Calculates natural cubic spline basis functions for use e.g. in REML
(S.J. Welham).
Options
INKNOTS = variate |
Defines a set of knots to use to construct the spline |
---|---|
METHOD = string token |
Whether to produce a basis suitable for use with independent or correlated random effects; (independent , correlated ); default inde |
ORTHOGONALIZETO = variate |
Variate to use to get an orthogonalized basis; default * i.e. orthogonalization with respect to KNOTS |
Parameters
X = variates |
Values for which the basis functions are calculated |
---|---|
BASIS = pointers |
Non-linear part of spline basis for use as design matrix for random effects in REML analysis |
DBASIS = pointers |
First derivative of BASIS functions |
D2BASIS = pointers |
Second derivative of BASIS functions |
INVCOVARIANCE = symmetric matrices |
Inverse covariance matrix for use with correlated spline random effects |
SECONDDIFFERENCES = matrices |
Scaled second divided difference matrix associated with KNOTS |
KNOTS = variates |
Knots used in construction of basis |
DISTANCES = variates |
Inter-knot distances used in construction of basis |
SCALE = scalars |
Saves the appropriate value for scaling design matrix |
Description
This procedure generates the non-linear part of a basis for natural cubic splines with specified knots, evaluated at the variate X
. The primary purpose of the procedure is to generate bases that can be used to specify and fit cubic smoothing splines within the mixed model framework (see Method section below). The explanatory covariate values at which the basis functions are to be calculated are specified in a variate using parameter X
.
The INKNOTS
option can be used to specify the knot points to use to construct the natural cubic spline basis. The set of distinct values found in this variate are then used as the knots. This option can be used to produce a low-rank approximation to the smoothing spline (but see Methods section below) which uses smaller matrices and hence is faster to fit. If INKNOTS
is unset, the distinct values of X
are used. The knots that are used can be saved using the parameter KNOTS
. The METHOD
option specifies whether the design matrix of basis functions is to be generated for independent random effects, or for correlated random effects. The ORTHOGONALIZETO
option is used to specify a variate to be used in orthogonalization. The set of basis functions produced will then have mean zero and be orthogonal to the specified variate. Setting of this option to the data variate X
is recommended, as then the fitted values corresponding to the fixed model will represent the whole of the linear trend in the fitted spline.
The basis that is generated can be saved in a pointer using the parameter BASIS
. As the scale of this matrix is highly dependent on the inter-knot distances, it is recommended that scaling is used to keep the spline variance component within a reasonable range. If B
is a matrix containing the basis values (as columns), then the recommended scaling provided is such that
TRACE(B *+ T(B)) = NROWS(B),
i.e. so that the average contribution of the spline random term to the variance of a unit is equal to the spline variance component. The SCALE
parameter can be used to save the scalar value
c = SQRT(TRACE(B*+T(B))/NROWS(B))
so that the required scaling can be applied via the calculation
CALCULATE B[]=B[]/c
The first and second derivatives of the basis functions can be saved using parameters DBASIS
and D2BASIS
. Parameter INVCOVARIANCE
can be used to save the inverse variance matrix for random effects when METHOD=correlated
is used. (This corresponds to matrix R defined by Green & Silverman 1994.) Parameters SECONDDIFFERENCES
and DISTANCES
can save the divided second difference matrix and inter-knot distances used in construction of the basis. (These correspond to the matrix Q and vector h of Green & Silverman 1994.)
Options: INKNOTS
, METHOD
, ORTHOGONALIZETO
.
Parameters: X
, BASIS
, DBASIS
, D2BASIS
, INVCOVARIANCE
, SECONDDIFFERENCES
, KNOTS
, DISTANCES
, SCALE
.
Method
Within the mixed model framework, the natural cubic spline g(x
) with r knots evaluated at variate x
can be written as
g(x
) = X τ + B δ
where X=[1 x
] is a design matrix containing 2 linear basis functions, and B is a design matrix containing r-2 non-linear basis functions.
The cubic smoothing spline can be fitted as a natural cubic spline with knots at the distinct covariate values via a linear mixed model
y
= X τ + B δ + ε
with
var(δ) = λ R-1,
where R is a banded symmetric matrix with r-2 rows defined by Green & Silverman (1994) and λ is a function of the smoothing parameter, which can also be estimated using REML
. The matrix Z is the basis saved in a pointer using the BASIS
parameter when METHOD=correlated
is set. The mixed model cubic spline can be fitted using the following commands for data Y
with covariate X
:
NCSPLINE [INKNOTS=X; METHOD=correlated; ORTHOGONAL=X]\
X=X; BASIS=B; INVCOVARIANCE=R; SCALE=scB
CALCULATE B[]=B[]/scB
VCOMPONENTS [FIXED=X] RANDOM=B
VSTRUCTURE [TERM=B] FACTOR=B; MODEL=FIXED; INVERSE=R
REML Y
It is more straightforward to transform the basis B to
Z = B R-0.5
and fit the model
y = X τ + Z u + ε
with
var(u) = λ I,
i.e. a model with independent random effects. The transformed basis is obtained by using the default setting METHOD=independent
, and the basis is saved by the using the BASIS
parameter. The mixed model spline can then simply be fitted using the commands:
NCSPLINE [INKNOTS=X; ORTHOG=X] X=X; BASIS=Z; SCALE=scZ
CALCULATE Z[]=Z[]/scZ
VCOMPONENTS [FIXED=X] RANDOM=Z
REML Y
See Verbyla et al. (1999) for further details of mixed model cubic smoothing splines.
This procedure can also be used to generate bases using a set of knots that differs from the set of distinct covariate values. This can be used as an approximation to reduce the computing load associated with cubic splines with many knots, but the fitted spline no longer has the optimality properties associated with cubic smoothing splines. The goodness of the approximation depends on the number and position of knots chosen. Ruppert et al. (2003) recommend using
r = min(n/4, 35)
and placing the r knots at the i/(r+1) quantiles of the covariate for i=1…r.
Action with RESTRICT
The variates contained in BASIS
, DBASIS
and D2BASIS
pointers are restricted in the same way as the X
parameter. Values in the units excluded by the restriction are set to missing. If there is any restriction on the KNOTS
option, knot values are calculated only from the included subset.
References
Green, P.J. & Silverman, B.W. (1994). Non-parametric Regression and Generalised Linear Models. London: Chapman & Hall.
Ruppert, D., Wand, M.P. & Carroll, R.J. (2003). Semi-parametric Regression. Cambridge: Cambridge University Press.
Verbyla, A.P., Cullis, B.R., Kenward, M.G. & Welham, S.J. (1999). The analysis of designed experiments and longitudinal data using smoothing splines (with discussion). Applied Statistics, 48, 269-311.
See also
Directive: VCOMPONENTS
.
Procedures: SPLINE
, LSPLINE
, PENSPLINE
, PSPLINE
, RADIALSPLINE
, TENSORSPLINE
.
Function: SSPLINE
.
Commands for: Calculations and manipulation, Regression analysis, REML analysis of linear mixed models.
Example
CAPTION 'NCSPLINE example',\ !t('Fit mixed model cubic spline curves for grouped data',\ 'A rotational experiment with plots having a range of soil',\ 'phosphate levels provides measurements of weight of beet,',\ '%sugar in the beet and soil phosphate level in each of',\ 'four successive years.'); STYLE=meta,plain READ Beetwt,%sugar,SoilP 7.23 18.5 5.4 7.69 18.0 5.4 24.64 20.1 7.8 26.67 19.8 8.0 39.78 19.5 18.0 44.98 19.3 15.6 41.59 19.7 30.4 44.08 19.8 33.8 48.37 19.4 50.4 44.76 19.0 51.0 49.73 18.6 44.0 51.54 18.5 40.2 47.69 19.0 57.2 45.66 19.4 65.0 50.18 18.6 27.0 47.69 18.7 30.0 8.82 13.8 5.6 1.81 13.9 4.8 15.82 14.5 10.2 9.04 14.0 8.6 24.41 15.0 21.6 22.60 14.1 17.2 26.45 15.2 36.4 20.80 15.3 37.2 28.30 14.2 44.4 22.60 14.7 44.4 14.24 13.5 41.0 35.94 15.6 30.2 25.54 15.8 60.8 27.13 15.6 47.0 31.42 15.6 27.0 34.13 15.4 29.0 19.90 16.1 3.0 20.60 16.0 2.0 34.70 16.7 6.2 35.40 16.4 6.2 46.80 17.1 19.8 40.50 16.9 17.2 43.00 16.9 29.6 48.60 17.1 28.0 47.30 17.0 42.8 41.30 17.1 46.2 44.30 17.0 36.6 47.60 16.6 40.0 45.60 17.0 42.2 44.60 17.0 52.0 44.00 17.2 23.4 40.10 16.6 28.0 14.35 16.1 4.0 14.35 15.5 3.8 26.71 16.6 8.0 25.12 16.4 6.4 33.39 17.2 18.2 33.79 16.2 14.8 36.68 17.0 35.0 33.69 16.8 29.6 34.98 17.0 37.2 35.78 17.0 40.0 42.06 17.2 39.6 38.77 17.3 36.8 40.66 17.3 52.4 37.28 17.2 45.6 34.68 17.3 22.0 32.59 17.2 26.0 : FACTOR [LEVELS=4; VALUES=16(1...4)] Year CALCULATE Sugar = Beetwt * %sugar / 100 " Get basis for independent random effects " NCSPLINE [ORTHOG=SoilP] X=SoilP; BASIS=Z; SCALE=scZ " Scale pointer so that variance component interpretable " CALCULATE Z[]=Z[]/scZ " Fit common spline " VCOMPONENTS [FIXED=Year*SoilP] RANDOM=Z; CON=positive REML [PRINT=mon,#] Sugar; FITTED=S1 " Get basis for correlated random effects " NCSPLINE [METHOD=corr; ORTHOG=SoilP] X=SoilP; BASIS=B; INVCOV=R; SCALE=scB " Fit common spline " VCOMPONENTS [FIXED=Year*SoilP] RANDOM=B; CONSTRAINTS=positive VSTRUCTURE [TERM=B] FACTOR=B; MODEL=fixed; INVERSE=R REML [PRINT=mon,#] Sugar; FITTED=S2 " Compare with spline model using in-built functions " VCOMPONENTS [FIXED=Year*SoilP; SPLINE=SoilP] REML [PRINT=mon,#] Sugar; FITTED=S3 " Verify that fitted splines are all the same " PRINT S1,S2,S3 " Reduce knots " NCSPLINE [INKNOTS=!(0,5,10...65)] X=SoilP; BASIS=rZ; SCALE=scrZ " Scale design matrix " CALCULATE rZ[] = rZ[] / scrZ " Re-fit common spline " VCOMPONENTS [FIXED=Year*SoilP] RANDOM=rZ; CON=positive REML [PRINT=mon,#] Sugar; FITTED=S4 " Get basis functions for PREDICT " NCSPLINE [INKNOTS=!(0,5,10...65)] X=!(0...65); BASIS=Zpred " Important: must use same scaling (and orthogonalisation) as for original design matrix " CALCULATE Zpred[] = Zpred[] / scrZ " Get predictions " VPREDICT [PREDICTIONS=P1] SoilP,rZ,Year; LEVELS=!(0...65),Zpred,*;\ PARALLEL=*,SoilP,* PRINT P1; DECIMALS=2; FIELD=6 MATRIX [ROWS=66; COLUMNS=4] MP1; VALUES=P1 VARIATE [NVALUES=66] Pred[1,2,3,4] CALCULATE Pred[] = MP1$[*;1,2,3,4] " Set up for graphics " PEN 5...8; METHOD=line; COLOUR='black','red','limegreen','blue';\ SYMBOL=0 YAXIS 1; LOWER=-3; UPPER=11 " Examine predicted interpolated curve and data " DGRAPH [TITLE='Interpolated curves with data: common spline']\ Sugar,Pred[]; SoilP,4(!(0...65)); PEN=Year,5,6,7,8 " Examine fitted curve and data " DGRAPH [TITLE='Fitted curves with data: common spline']\ Sugar,S3; SoilP; PEN=Year+0,4 " Verify fitted points with interpolated curve " DGRAPH [TITLE='Fitted & interpolated splines: common spline model']\ S3,Pred[]; SoilP,4(!(0...65)); PEN=Year,5,6,7,8 " Investigate adding spline interaction with Year " VCOMPONENTS [FIXED=Year*SoilP] RANDOM=rZ+rZ.Year; CONSTRAINTS=positive REML [PRINT=mon,#; MAX=50] Sugar; FITTED=S6 " Get predictions " VPREDICT [PREDICTIONS=P2] SoilP,rZ,Year; LEVELS=!(0...65),Zpred,*;\ PARALLEL=*,SoilP,* PRINT P2; DECIMALS=2; FIELD=6 MATRIX [ROWS=66; COLUMNS=4] MP2; VALUES=P2 VARIATE [NVALUES=66] Pred2[1,2,3,4] CALCULATE Pred2[] = MP2$[*;1,2,3,4] " Examine predicted interpolated curve and data " DGRAPH [TITLE='Interpolated curves with data: separate splines']\ Sugar,Pred2[]; SoilP,4(!(0...65)); PEN=Year,5,6,7,8