Calculates natural cubic spline basis functions for use e.g. in `REML`

(S.J. Welham).

### Options

`INKNOTS` = variate |
Defines a set of knots to use to construct the spline |
---|---|

`METHOD` = string token |
Whether to produce a basis suitable for use with independent or correlated random effects; (`independent` , `correlated` ); default `inde` |

`ORTHOGONALIZETO` = variate |
Variate to use to get an orthogonalized basis; default `*` i.e. orthogonalization with respect to `KNOTS` |

### Parameters

`X` = variates |
Values for which the basis functions are calculated |
---|---|

`BASIS` = pointers |
Non-linear part of spline basis for use as design matrix for random effects in `REML` analysis |

`DBASIS` = pointers |
First derivative of `BASIS` functions |

`D2BASIS` = pointers |
Second derivative of `BASIS` functions |

`INVCOVARIANCE` = symmetric matrices |
Inverse covariance matrix for use with correlated spline random effects |

`SECONDDIFFERENCES` = matrices |
Scaled second divided difference matrix associated with `KNOTS` |

`KNOTS` = variates |
Knots used in construction of basis |

`DISTANCES` = variates |
Inter-knot distances used in construction of basis |

`SCALE` = scalars |
Saves the appropriate value for scaling design matrix |

### Description

This procedure generates the non-linear part of a basis for natural cubic splines with specified knots, evaluated at the variate `X`

. The primary purpose of the procedure is to generate bases that can be used to specify and fit cubic smoothing splines within the mixed model framework (see Method section below). The explanatory covariate values at which the basis functions are to be calculated are specified in a variate using parameter `X`

.

The `INKNOTS`

option can be used to specify the knot points to use to construct the natural cubic spline basis. The set of distinct values found in this variate are then used as the knots. This option can be used to produce a low-rank approximation to the smoothing spline (but see Methods section below) which uses smaller matrices and hence is faster to fit. If `INKNOTS`

is unset, the distinct values of `X`

are used. The knots that are used can be saved using the parameter `KNOTS`

. The `METHOD`

option specifies whether the design matrix of basis functions is to be generated for independent random effects, or for correlated random effects. The `ORTHOGONALIZETO`

option is used to specify a variate to be used in orthogonalization. The set of basis functions produced will then have mean zero and be orthogonal to the specified variate. Setting of this option to the data variate `X`

is recommended, as then the fitted values corresponding to the fixed model will represent the whole of the linear trend in the fitted spline.

The basis that is generated can be saved in a pointer using the parameter `BASIS`

. As the scale of this matrix is highly dependent on the inter-knot distances, it is recommended that scaling is used to keep the spline variance component within a reasonable range. If `B`

is a matrix containing the basis values (as columns), then the recommended scaling provided is such that

`TRACE(B *+ T(B)) = NROWS(B),`

i.e. so that the average contribution of the spline random term to the variance of a unit is equal to the spline variance component. The `SCALE`

parameter can be used to save the scalar value

`c = SQRT(TRACE(B*+T(B))/NROWS(B))`

so that the required scaling can be applied via the calculation

`CALCULATE B[]=B[]/c`

The first and second derivatives of the basis functions can be saved using parameters `DBASIS`

and `D2BASIS`

. Parameter `INVCOVARIANCE`

can be used to save the inverse variance matrix for random effects when `METHOD=correlated`

is used. (This corresponds to matrix *R* defined by Green & Silverman 1994.) Parameters `SECONDDIFFERENCES`

and `DISTANCES`

can save the divided second difference matrix and inter-knot distances used in construction of the basis. (These correspond to the matrix *Q* and vector h of Green & Silverman 1994.)

Options: `INKNOTS`

, `METHOD`

, `ORTHOGONALIZETO`

.

Parameters: `X`

, `BASIS`

, `DBASIS`

, `D2BASIS`

, `INVCOVARIANCE`

, `SECONDDIFFERENCES`

, `KNOTS`

, `DISTANCES`

, `SCALE`

.

### Method

Within the mixed model framework, the natural cubic spline g(`x`

) with *r* knots evaluated at variate `x`

can be written as

g(`x`

) = X τ + B δ

where X=[1 `x`

] is a design matrix containing 2 linear basis functions, and B is a design matrix containing *r*-2 non-linear basis functions.

The cubic smoothing spline can be fitted as a natural cubic spline with knots at the distinct covariate values via a linear mixed model

`y`

= X τ + B δ + ε

with

var(δ) = λ R^{-1},

where R is a banded symmetric matrix with *r*-2 rows defined by Green & Silverman (1994) and λ is a function of the smoothing parameter, which can also be estimated using `REML`

. The matrix Z is the basis saved in a pointer using the `BASIS`

parameter when `METHOD=correlated`

is set. The mixed model cubic spline can be fitted using the following commands for data `Y`

with covariate `X`

:

`NCSPLINE [INKNOTS=X; METHOD=correlated; ORTHOGONAL=X]\`

` X=X; BASIS=B; INVCOVARIANCE=R; SCALE=scB`

`CALCULATE B[]=B[]/scB`

`VCOMPONENTS [FIXED=X] RANDOM=B`

`VSTRUCTURE [TERM=B] FACTOR=B; MODEL=FIXED; INVERSE=R`

`REML Y`

It is more straightforward to transform the basis B to

Z = B R^{-0.5}

and fit the model

y = X τ + Z u + ε

with

var(u) = λ I,

i.e. a model with independent random effects. The transformed basis is obtained by using the default setting `METHOD=independent`

, and the basis is saved by the using the `BASIS`

parameter. The mixed model spline can then simply be fitted using the commands:

`NCSPLINE [INKNOTS=X; ORTHOG=X] X=X; BASIS=Z; SCALE=scZ`

`CALCULATE Z[]=Z[]/scZ`

`VCOMPONENTS [FIXED=X] RANDOM=Z`

`REML Y`

See Verbyla *et al*. (1999) for further details of mixed model cubic smoothing splines.

This procedure can also be used to generate bases using a set of knots that differs from the set of distinct covariate values. This can be used as an approximation to reduce the computing load associated with cubic splines with many knots, but the fitted spline no longer has the optimality properties associated with cubic smoothing splines. The goodness of the approximation depends on the number and position of knots chosen. Ruppert *et al*. (2003) recommend using

*r* = min(*n*/4, 35)

and placing the *r* knots at the *i*/(*r*+1) quantiles of the covariate for *i*=1…*r*.

### Action with `RESTRICT`

The variates contained in `BASIS`

, `DBASIS`

and `D2BASIS`

pointers are restricted in the same way as the `X`

parameter. Values in the units excluded by the restriction are set to missing. If there is any restriction on the `KNOTS`

option, knot values are calculated only from the included subset.

### References

Green, P.J. & Silverman, B.W. (1994). *Non-parametric Regression and Generalised Linear Models*. London: Chapman & Hall.

Ruppert, D., Wand, M.P. & Carroll, R.J. (2003). *Semi-parametric Regression*. Cambridge: Cambridge University Press.

Verbyla, A.P., Cullis, B.R., Kenward, M.G. & Welham, S.J. (1999). The analysis of designed experiments and longitudinal data using smoothing splines (with discussion). *Applied Statistics*, 48, 269-311.

### See also

Directive: `VCOMPONENTS`

.

Procedures: `SPLINE`

, `LSPLINE`

, `PENSPLINE`

, `PSPLINE`

, `RADIALSPLINE`

, `TENSORSPLINE`

.

Function: `SSPLINE`

.

Commands for: Calculations and manipulation, Regression analysis, REML analysis of linear mixed models.

### Example

CAPTION 'NCSPLINE example',\ !t('Fit mixed model cubic spline curves for grouped data',\ 'A rotational experiment with plots having a range of soil',\ 'phosphate levels provides measurements of weight of beet,',\ '%sugar in the beet and soil phosphate level in each of',\ 'four successive years.'); STYLE=meta,plain READ Beetwt,%sugar,SoilP 7.23 18.5 5.4 7.69 18.0 5.4 24.64 20.1 7.8 26.67 19.8 8.0 39.78 19.5 18.0 44.98 19.3 15.6 41.59 19.7 30.4 44.08 19.8 33.8 48.37 19.4 50.4 44.76 19.0 51.0 49.73 18.6 44.0 51.54 18.5 40.2 47.69 19.0 57.2 45.66 19.4 65.0 50.18 18.6 27.0 47.69 18.7 30.0 8.82 13.8 5.6 1.81 13.9 4.8 15.82 14.5 10.2 9.04 14.0 8.6 24.41 15.0 21.6 22.60 14.1 17.2 26.45 15.2 36.4 20.80 15.3 37.2 28.30 14.2 44.4 22.60 14.7 44.4 14.24 13.5 41.0 35.94 15.6 30.2 25.54 15.8 60.8 27.13 15.6 47.0 31.42 15.6 27.0 34.13 15.4 29.0 19.90 16.1 3.0 20.60 16.0 2.0 34.70 16.7 6.2 35.40 16.4 6.2 46.80 17.1 19.8 40.50 16.9 17.2 43.00 16.9 29.6 48.60 17.1 28.0 47.30 17.0 42.8 41.30 17.1 46.2 44.30 17.0 36.6 47.60 16.6 40.0 45.60 17.0 42.2 44.60 17.0 52.0 44.00 17.2 23.4 40.10 16.6 28.0 14.35 16.1 4.0 14.35 15.5 3.8 26.71 16.6 8.0 25.12 16.4 6.4 33.39 17.2 18.2 33.79 16.2 14.8 36.68 17.0 35.0 33.69 16.8 29.6 34.98 17.0 37.2 35.78 17.0 40.0 42.06 17.2 39.6 38.77 17.3 36.8 40.66 17.3 52.4 37.28 17.2 45.6 34.68 17.3 22.0 32.59 17.2 26.0 : FACTOR [LEVELS=4; VALUES=16(1...4)] Year CALCULATE Sugar = Beetwt * %sugar / 100 " Get basis for independent random effects " NCSPLINE [ORTHOG=SoilP] X=SoilP; BASIS=Z; SCALE=scZ " Scale pointer so that variance component interpretable " CALCULATE Z[]=Z[]/scZ " Fit common spline " VCOMPONENTS [FIXED=Year*SoilP] RANDOM=Z; CON=positive REML [PRINT=mon,#] Sugar; FITTED=S1 " Get basis for correlated random effects " NCSPLINE [METHOD=corr; ORTHOG=SoilP] X=SoilP; BASIS=B; INVCOV=R; SCALE=scB " Fit common spline " VCOMPONENTS [FIXED=Year*SoilP] RANDOM=B; CONSTRAINTS=positive VSTRUCTURE [TERM=B] FACTOR=B; MODEL=fixed; INVERSE=R REML [PRINT=mon,#] Sugar; FITTED=S2 " Compare with spline model using in-built functions " VCOMPONENTS [FIXED=Year*SoilP; SPLINE=SoilP] REML [PRINT=mon,#] Sugar; FITTED=S3 " Verify that fitted splines are all the same " PRINT S1,S2,S3 " Reduce knots " NCSPLINE [INKNOTS=!(0,5,10...65)] X=SoilP; BASIS=rZ; SCALE=scrZ " Scale design matrix " CALCULATE rZ[] = rZ[] / scrZ " Re-fit common spline " VCOMPONENTS [FIXED=Year*SoilP] RANDOM=rZ; CON=positive REML [PRINT=mon,#] Sugar; FITTED=S4 " Get basis functions for PREDICT " NCSPLINE [INKNOTS=!(0,5,10...65)] X=!(0...65); BASIS=Zpred " Important: must use same scaling (and orthogonalisation) as for original design matrix " CALCULATE Zpred[] = Zpred[] / scrZ " Get predictions " VPREDICT [PREDICTIONS=P1] SoilP,rZ,Year; LEVELS=!(0...65),Zpred,*;\ PARALLEL=*,SoilP,* PRINT P1; DECIMALS=2; FIELD=6 MATRIX [ROWS=66; COLUMNS=4] MP1; VALUES=P1 VARIATE [NVALUES=66] Pred[1,2,3,4] CALCULATE Pred[] = MP1$[*;1,2,3,4] " Set up for graphics " PEN 5...8; METHOD=line; COLOUR='black','red','limegreen','blue';\ SYMBOL=0 YAXIS 1; LOWER=-3; UPPER=11 " Examine predicted interpolated curve and data " DGRAPH [TITLE='Interpolated curves with data: common spline']\ Sugar,Pred[]; SoilP,4(!(0...65)); PEN=Year,5,6,7,8 " Examine fitted curve and data " DGRAPH [TITLE='Fitted curves with data: common spline']\ Sugar,S3; SoilP; PEN=Year+0,4 " Verify fitted points with interpolated curve " DGRAPH [TITLE='Fitted & interpolated splines: common spline model']\ S3,Pred[]; SoilP,4(!(0...65)); PEN=Year,5,6,7,8 " Investigate adding spline interaction with Year " VCOMPONENTS [FIXED=Year*SoilP] RANDOM=rZ+rZ.Year; CONSTRAINTS=positive REML [PRINT=mon,#; MAX=50] Sugar; FITTED=S6 " Get predictions " VPREDICT [PREDICTIONS=P2] SoilP,rZ,Year; LEVELS=!(0...65),Zpred,*;\ PARALLEL=*,SoilP,* PRINT P2; DECIMALS=2; FIELD=6 MATRIX [ROWS=66; COLUMNS=4] MP2; VALUES=P2 VARIATE [NVALUES=66] Pred2[1,2,3,4] CALCULATE Pred2[] = MP2$[*;1,2,3,4] " Examine predicted interpolated curve and data " DGRAPH [TITLE='Interpolated curves with data: separate splines']\ Sugar,Pred2[]; SoilP,4(!(0...65)); PEN=Year,5,6,7,8