NCSPLINE procedure

Calculates natural cubic spline basis functions for use e.g. in REML (S.J. Welham).

Options

INKNOTS = variate Defines a set of knots to use to construct the spline
METHOD = string token Whether to produce a basis suitable for use with independent or correlated random effects; (independent, correlated); default inde
ORTHOGONALIZETO = variate Variate to use to get an orthogonalized basis; default * i.e. orthogonalization with respect to KNOTS

Parameters

X = variates Values for which the basis functions are calculated
BASIS = pointers Non-linear part of spline basis for use as design matrix for random effects in REML analysis
DBASIS = pointers First derivative of BASIS functions
D2BASIS = pointers Second derivative of BASIS functions
INVCOVARIANCE = symmetric matrices Inverse covariance matrix for use with correlated spline random effects
SECONDDIFFERENCES = matrices Scaled second divided difference matrix associated with KNOTS
KNOTS = variates Knots used in construction of basis
DISTANCES = variates Inter-knot distances used in construction of basis
SCALE = scalars Saves the appropriate value for scaling design matrix

Description

This procedure generates the non-linear part of a basis for natural cubic splines with specified knots, evaluated at the variate X. The primary purpose of the procedure is to generate bases that can be used to specify and fit cubic smoothing splines within the mixed model framework (see Method section below). The explanatory covariate values at which the basis functions are to be calculated are specified in a variate using parameter X.

The INKNOTS option can be used to specify the knot points to use to construct the natural cubic spline basis. The set of distinct values found in this variate are then used as the knots. This option can be used to produce a low-rank approximation to the smoothing spline (but see Methods section below) which uses smaller matrices and hence is faster to fit. If INKNOTS is unset, the distinct values of X are used. The knots that are used can be saved using the parameter KNOTS. The METHOD option specifies whether the design matrix of basis functions is to be generated for independent random effects, or for correlated random effects. The ORTHOGONALIZETO option is used to specify a variate to be used in orthogonalization. The set of basis functions produced will then have mean zero and be orthogonal to the specified variate. Setting of this option to the data variate X is recommended, as then the fitted values corresponding to the fixed model will represent the whole of the linear trend in the fitted spline.

The basis that is generated can be saved in a pointer using the parameter BASIS. As the scale of this matrix is highly dependent on the inter-knot distances, it is recommended that scaling is used to keep the spline variance component within a reasonable range. If B is a matrix containing the basis values (as columns), then the recommended scaling provided is such that

TRACE(B *+ T(B)) = NROWS(B),

i.e. so that the average contribution of the spline random term to the variance of a unit is equal to the spline variance component. The SCALE parameter can be used to save the scalar value

c = SQRT(TRACE(B*+T(B))/NROWS(B))

so that the required scaling can be applied via the calculation

CALCULATE B[]=B[]/c

The first and second derivatives of the basis functions can be saved using parameters DBASIS and D2BASIS. Parameter INVCOVARIANCE can be used to save the inverse variance matrix for random effects when METHOD=correlated is used. (This corresponds to matrix R defined by Green & Silverman 1994.) Parameters SECONDDIFFERENCES and DISTANCES can save the divided second difference matrix and inter-knot distances used in construction of the basis. (These correspond to the matrix Q and vector h of Green & Silverman 1994.)

Options: INKNOTS, METHOD, ORTHOGONALIZETO.

Parameters: X, BASIS, DBASIS, D2BASIS, INVCOVARIANCE, SECONDDIFFERENCES, KNOTS, DISTANCES, SCALE.

Method

Within the mixed model framework, the natural cubic spline g(x) with r knots evaluated at variate x can be written as

g(x) = X τ + B δ

where X=[1 x] is a design matrix containing 2 linear basis functions, and B is a design matrix containing r-2 non-linear basis functions.

The cubic smoothing spline can be fitted as a natural cubic spline with knots at the distinct covariate values via a linear mixed model

y = X τ + B δ + ε

with

var(δ) = λ R-1,

where R is a banded symmetric matrix with r-2 rows defined by Green & Silverman (1994) and λ is a function of the smoothing parameter, which can also be estimated using REML. The matrix Z is the basis saved in a pointer using the BASIS parameter when METHOD=correlated is set. The mixed model cubic spline can be fitted using the following commands for data Y with covariate X:

NCSPLINE [INKNOTS=X; METHOD=correlated; ORTHOGONAL=X]\

            X=X; BASIS=B; INVCOVARIANCE=R; SCALE=scB

CALCULATE B[]=B[]/scB

VCOMPONENTS [FIXED=X] RANDOM=B

VSTRUCTURE [TERM=B] FACTOR=B; MODEL=FIXED; INVERSE=R

REML Y

It is more straightforward to transform the basis B to

Z = B R-0.5

and fit the model

y = X τ + Z u + ε

with

var(u) = λ I,

i.e. a model with independent random effects. The transformed basis is obtained by using the default setting METHOD=independent, and the basis is saved by the using the BASIS parameter. The mixed model spline can then simply be fitted using the commands:

NCSPLINE [INKNOTS=X; ORTHOG=X] X=X; BASIS=Z; SCALE=scZ

CALCULATE Z[]=Z[]/scZ

VCOMPONENTS [FIXED=X] RANDOM=Z

REML Y

See Verbyla et al. (1999) for further details of mixed model cubic smoothing splines.

This procedure can also be used to generate bases using a set of knots that differs from the set of distinct covariate values. This can be used as an approximation to reduce the computing load associated with cubic splines with many knots, but the fitted spline no longer has the optimality properties associated with cubic smoothing splines. The goodness of the approximation depends on the number and position of knots chosen. Ruppert et al. (2003) recommend using

r = min(n/4, 35)

and placing the r knots at the i/(r+1) quantiles of the covariate for i=1…r.

Action with RESTRICT

The variates contained in BASIS, DBASIS and D2BASIS pointers are restricted in the same way as the X parameter. Values in the units excluded by the restriction are set to missing. If there is any restriction on the KNOTS option, knot values are calculated only from the included subset.

References

Green, P.J. & Silverman, B.W. (1994). Non-parametric Regression and Generalised Linear Models. London: Chapman & Hall.

Ruppert, D., Wand, M.P. & Carroll, R.J. (2003). Semi-parametric Regression. Cambridge: Cambridge University Press.

Verbyla, A.P., Cullis, B.R., Kenward, M.G. & Welham, S.J. (1999). The analysis of designed experiments and longitudinal data using smoothing splines (with discussion). Applied Statistics, 48, 269-311.

See also

Directive: VCOMPONENTS.

Procedures: SPLINE, LSPLINE, PENSPLINE, PSPLINE, RADIALSPLINE, TENSORSPLINE.

Function: SSPLINE.

Commands for: Calculations and manipulation, Regression analysis, REML analysis of linear mixed models.

Example

CAPTION     'NCSPLINE example',\
         !t('Fit mixed model cubic spline curves for grouped data',\
            'A rotational experiment with plots having a range of soil',\
            'phosphate levels provides measurements of weight of beet,',\
            '%sugar in the beet and soil phosphate level in each of',\
            'four successive years.'); STYLE=meta,plain
READ        Beetwt,%sugar,SoilP
 7.23 18.5  5.4   7.69 18.0  5.4  24.64 20.1  7.8  26.67 19.8  8.0
39.78 19.5 18.0  44.98 19.3 15.6  41.59 19.7 30.4  44.08 19.8 33.8
48.37 19.4 50.4  44.76 19.0 51.0  49.73 18.6 44.0  51.54 18.5 40.2
47.69 19.0 57.2  45.66 19.4 65.0  50.18 18.6 27.0  47.69 18.7 30.0
 8.82 13.8  5.6   1.81 13.9  4.8  15.82 14.5 10.2   9.04 14.0  8.6
24.41 15.0 21.6  22.60 14.1 17.2  26.45 15.2 36.4  20.80 15.3 37.2
28.30 14.2 44.4  22.60 14.7 44.4  14.24 13.5 41.0  35.94 15.6 30.2
25.54 15.8 60.8  27.13 15.6 47.0  31.42 15.6 27.0  34.13 15.4 29.0
19.90 16.1  3.0  20.60 16.0  2.0  34.70 16.7  6.2  35.40 16.4  6.2
46.80 17.1 19.8  40.50 16.9 17.2  43.00 16.9 29.6  48.60 17.1 28.0
47.30 17.0 42.8  41.30 17.1 46.2  44.30 17.0 36.6  47.60 16.6 40.0
45.60 17.0 42.2  44.60 17.0 52.0  44.00 17.2 23.4  40.10 16.6 28.0
14.35 16.1  4.0  14.35 15.5  3.8  26.71 16.6  8.0  25.12 16.4  6.4
33.39 17.2 18.2  33.79 16.2 14.8  36.68 17.0 35.0  33.69 16.8 29.6
34.98 17.0 37.2  35.78 17.0 40.0  42.06 17.2 39.6  38.77 17.3 36.8
40.66 17.3 52.4  37.28 17.2 45.6  34.68 17.3 22.0  32.59 17.2 26.0 :
FACTOR      [LEVELS=4; VALUES=16(1...4)] Year
CALCULATE   Sugar = Beetwt * %sugar / 100
" Get basis for independent random effects "
NCSPLINE    [ORTHOG=SoilP] X=SoilP; BASIS=Z; SCALE=scZ
" Scale pointer so that variance component interpretable "
CALCULATE   Z[]=Z[]/scZ
" Fit common spline "
VCOMPONENTS [FIXED=Year*SoilP] RANDOM=Z; CON=positive
REML        [PRINT=mon,#] Sugar; FITTED=S1
" Get basis for correlated random effects "
NCSPLINE    [METHOD=corr; ORTHOG=SoilP] X=SoilP; BASIS=B; INVCOV=R; SCALE=scB
" Fit common spline "
VCOMPONENTS [FIXED=Year*SoilP] RANDOM=B; CONSTRAINTS=positive
VSTRUCTURE  [TERM=B] FACTOR=B; MODEL=fixed; INVERSE=R
REML        [PRINT=mon,#] Sugar; FITTED=S2
" Compare with spline model using in-built functions "
VCOMPONENTS [FIXED=Year*SoilP; SPLINE=SoilP]
REML        [PRINT=mon,#] Sugar; FITTED=S3
" Verify that fitted splines are all the same "
PRINT       S1,S2,S3
" Reduce knots "
NCSPLINE    [INKNOTS=!(0,5,10...65)] X=SoilP; BASIS=rZ; SCALE=scrZ
" Scale design matrix "
CALCULATE   rZ[] = rZ[] / scrZ
" Re-fit common spline "
VCOMPONENTS [FIXED=Year*SoilP] RANDOM=rZ; CON=positive
REML        [PRINT=mon,#] Sugar; FITTED=S4
" Get basis functions for PREDICT "
NCSPLINE    [INKNOTS=!(0,5,10...65)] X=!(0...65); BASIS=Zpred
" Important: must use same scaling (and orthogonalisation)
  as for original design matrix "
CALCULATE   Zpred[] = Zpred[] / scrZ
" Get predictions "
VPREDICT    [PREDICTIONS=P1] SoilP,rZ,Year; LEVELS=!(0...65),Zpred,*;\
            PARALLEL=*,SoilP,*
PRINT       P1; DECIMALS=2; FIELD=6
MATRIX      [ROWS=66; COLUMNS=4] MP1; VALUES=P1
VARIATE     [NVALUES=66] Pred[1,2,3,4]
CALCULATE   Pred[] = MP1$[*;1,2,3,4]
" Set up for graphics "
PEN         5...8; METHOD=line; COLOUR='black','red','limegreen','blue';\
            SYMBOL=0
YAXIS       1; LOWER=-3; UPPER=11
" Examine predicted interpolated curve and data "
DGRAPH      [TITLE='Interpolated curves with data: common spline']\ 
            Sugar,Pred[]; SoilP,4(!(0...65)); PEN=Year,5,6,7,8
" Examine fitted curve and data "
DGRAPH      [TITLE='Fitted curves with data: common spline']\ 
            Sugar,S3; SoilP; PEN=Year+0,4
" Verify fitted points with interpolated curve "
DGRAPH      [TITLE='Fitted & interpolated splines: common spline model']\
            S3,Pred[]; SoilP,4(!(0...65)); PEN=Year,5,6,7,8
" Investigate adding spline interaction with Year "
VCOMPONENTS [FIXED=Year*SoilP] RANDOM=rZ+rZ.Year; CONSTRAINTS=positive
REML        [PRINT=mon,#; MAX=50] Sugar; FITTED=S6
" Get predictions "
VPREDICT    [PREDICTIONS=P2] SoilP,rZ,Year; LEVELS=!(0...65),Zpred,*;\
            PARALLEL=*,SoilP,*
PRINT       P2; DECIMALS=2; FIELD=6
MATRIX      [ROWS=66; COLUMNS=4] MP2; VALUES=P2
VARIATE     [NVALUES=66] Pred2[1,2,3,4]
CALCULATE   Pred2[] = MP2$[*;1,2,3,4]
" Examine predicted interpolated curve and data "
DGRAPH      [TITLE='Interpolated curves with data: separate splines']\ 
            Sugar,Pred2[]; SoilP,4(!(0...65)); PEN=Year,5,6,7,8
Updated on March 7, 2019

Was this article helpful?