Calculates design matrices to fit a natural polynomial or trigonometric L-spline as a linear mixed model (S.J. Welham).
||Method for constructing the set of knots (
||Specifies the number of segments between boundaries; default
||Provides the set of internal knots when
||The form of core function to use; (
||Defines the period for trigonometric functions (not required for polynomial splines)|
||Specifies the lower boundary when
||Specifies the upper boundary when
||Variate to use to get an orthogonalized basis; default
||Scaling of the
||The explanatory variate for which the spline values are required|
||Saves the design matrix to define the fixed terms (excluding the constant) for fitting the L-spline|
||Saves the design matrix to define the random terms for fitting the L-spline|
||Saves the internal knots and boundaries used to form the basis for the spline|
||Specifies x-values at which predictions are required|
||Saves the design matrix for the fixed terms (excluding the constant) for the spline at the prediction points|
||Saves the design matrix for the random terms for the spline at the prediction points|
This procedure generates the fixed and random terms required to fit a polynomial or trigonometric L-spline as a linear mixed model, using
REML estimation of the smoothing parameter (Welham et al., 2006). The explanatory variate values at which the spline is to be calculated are specified, in a variate, by the
KMETHOD option specifies how to choose the set of knots for the penalized spline, using settings:
||splits the range of
||defines the set of knots in terms of equally-spaced quantiles of
||indicates that the knots will be supplied, in a variate, by the
The number of segments or quantiles for the
quantile settings is specified by the
NSEGMENTS option. If this is unset, the number is determined automatically as
min([p/4], 35) + 1
(Ruppert 2002) where p is the number of unique values of the variate
X and [r] denotes the integer part of the number r. The lower and upper boundaries for equal segments are specified by the
UPPER options, respectively, taking the minimum and maximum values of
X as their defaults. The set of knots used to form the spline basis can be saved using the
The form of the core functions used to generate the L-spline, i.e. the form that will remain unpenalized, is specified using the
CORE option. The settings
lincossin define trigonometric L-splines, with a (known) period defined by the
PERIOD option. These have the following forms:
||a1 cos(kx) + a2 sin(kx)|
||a0 + a1 cos(kx) + a2 sin(kx)|
|lincossin||a0 + a1 + a2 cos(kx) + a3 sin(kx)|
k = 2 π /
and the ai are unknown coefficients.
quadratic define polynomial splines, as follows:
||natural linear spline with a single constant core function,|
||natural cubic spline with linear core functions, and|
||natural quintic spline with quadratic core functions.|
If the distinct values of
X are used as knots, the function will produce a basis for a polynomial smoothing spline; with fewer knots, the spline is a low-rank approximation equivalent to the O-splines described by Ormerod & Wand (2008). A different parameterization of the basis is used here, but the fitted spline will be the same.
ORTHOGONALIZETO option specifies a variate to use in orthogonalization. The set of random spline terms will then be orthogonal to the fixed terms when evaluated at the specified values. For most data sets, it is recommended to set
ORTHOGONALIZETO to the variate
X (the default). The random terms will then be orthogonal to the fixed terms, and fitted values corresponding to the fixed model will represent the whole of the polynomial trend in the fitted spline. For very large data sets, this calculation can be onerous and can be approximated by making the two bases orthogonal at the knots. No orthogonalization is carried out if
ORTHOGONALIZETO is set to a scalar value (eg.
The L-spline terms are saved as two matrices. The terms required to be fitted as fixed terms can be saved using the
XFIXED parameter. For
CORE=cossin, the constant should be omitted from the model by setting option
CORE=intercept, there are no additional fixed terms, and so
XFIXED is ignored. The terms to be fitted as random can be saved using the
The random terms can be scaled so that, for a random spline matrix
TRACE(Z *+ T(Z)) = NROWS(Z)
This ensures that the average contribution of
Z to the variance of an observation is equal to one, and hence the overall contribution from the term is equal to the spline variance component. This is highly recommended as it removes computational instabilities due to the intrinsic scale of the matrix
K (see Method), and improves interpretability of the spline variance component. This scaling is imposed by default, but can be avoided by setting option
SCALING=none. For L-splines, use of some scaling is strongly recommended, as the unscaled matrix can be so large or small that the associated variance component appears aliased and cannot be estimated.
The L-spline terms required for prediction can be saved using the
PXRANDOM parameters. The
PX parameter defines the set of x-values at which the predictions are to be made.
The L-spline with core functions ∑ τi di(x) and r knots, evaluated on variate
X, minimizes the penalized sum of squares
(y – X τ – Kx c)ʹ R-1 (y – X τ – Kx c) + λ cʹ K c
X is a design matrix containing the core functions evaluated at x, with associated unknown parameters τ;
Kx is a design matrix containing r L-spline basis functions evaluated at the knots, with associated unknown effects c; and
K is an r × r symmetric matrix of L-spline basis functions evaluated at the knots (for details see references in Welham et al. 2006)
The matrix K can be transformed to full rank as
H = Cʹ K C
where the matrix C contains the eigenvectors of XXʹ corresponding to zero eigenvalues, with corresponding transformation of the matrix functions as
Ku = Kx C
This is translated to a set of independent random effects via post-multiplication by H-1/2.
The penalized sum of squares is reformulated as the estimating equations from a mixed model of the form
y = X τ + Z u + e
Z = Kx C Hm-1/2
u is a set of r independently and identically distributed random effects with var(u) = σs2 I
e is a vector of residual errors with var(e) = σ2 R
Fitting this mixed model with known λ set equal to σ2/σs2 produces estimates that minimize the penalized sum of squares. In addition, we can estimate the smoothing parameter using
REML via the variance component σs2. This can be generalized straightforwardly to mixed models with additional fixed and random terms.
The implementation in this procedure allows the random design matrix to be orthogonalized with respect to the fixed design matrix at a given variate. For orthogonalization with respect to the variate
X, this is achieved by using random design matrix
Z* = (I – X(XʹX)-1Xʹ)Z
The entirety of the polynomial trend is then captured by the fixed model. Orthogonalization with respect to a variate
t is calculated as
Z* = Z – X(TʹT)-1Tʹ Kt C Hm-1/2
where T is a matrix holding the core functions evaluated at
t, and Kt is a matrix of L-spline basis functions evaluated at
t. No orthogonalization is carried out if
ORTHOGONALIZETO is set to a scalar value (e.g.
When the random matrix is scaled so that trace(Z*Z*ʹ) is equal to the number of row of Z*,
the average contribution of the spline term to the variance of each unit (σs2 × diag(Z*Z*ʹ)) is equal to σs2. This makes the spline variance component value directly comparable with the residual variance.
Note that the constant function is not included in the fixed design matrix generated by
PENSPLINE, as this term is added automatically to the linear mixed model by the default option setting,
CONSTANT=estimate, in the
The design matrices for use in prediction are calculated by evaluating the same set of basis functions at the predict points.
Input structures must not be restricted.
Ruppert, D. (2002). Selecting the number of knots for penalised splines. Computational & Graphical Statistics, 11, 735-757.
Wand, M.P. & Ormerod, J.T. (2008). On semiparametric regression with O’Sullivan penalised splines. Australian & New Zealand Journal of Statistics, 50, 179-198.
Welham, S.J., Cullis, B.R., Kenward, M.G., Thompson, R. (2006). The analysis of longitudinal data using mixed model L-splines. Biometrics, 62, 392-401.
CAPTION 'LSPLINE examples'; STYLE=meta " generate data " CALCULATE [SEED=23826] x = GRUNIFORM(30; 0; 100) & true = 10 - 0.005*(x-27)**2 - 2*SIN(2*C('pi')*x/40) & [SEED=0] y = true + GRNORMAL(NVAL(x); 0; 1) " pens for plotting lines " PEN 2,3; METH=LINE; SYMBOL=0; LINE=1 " plot data " DGRAPH [TITLE='Observed data'] y; x " generate L-spline basis for core functions 1, x, cos(kx), sin(kx) where k (=2*c('pi')/40) gives period 40 for trigonometric functions and design matrices to predict at 1 unit intervals from 0 to 100 " VARIATE [VALUES=0...100] xpred LSPLINE [NSEGMENTS=10; LOWER=0; UPPER=100; PERIOD=40]\ x; XFIXED=X; XRANDOM=Z; KNOTS=knots;\ PX=xpred; PFIXED=PX; PRANDOM=PZ " plot random basis functions (held in columns of Z) " DGRAPH [TITLE='Random basis functions for L-spline'; KEY=0]\ Z$[*;1...7]; x; PEN=2 " check set of knots used " PRINT knots " fit L-spline as a linear mixed model " VCOMPONENTS [FIXED=X] RANDOM=Z; CONSTRAINT=positive REML y " save fitted values " VKEEP [FITTED=fv] " plot data with fitted values - curve not smooth due to gaps between observations and linear interpolation on plot " DGRAPH ['Data with interpolated L-spline'; KEY=0] y,fv; x; PEN=1,2 " predict at 1-unit intervals (using PX and PZ generated earlier) " VPREDICT [PREDICT=Tp] X,Z; LEVELS=PX,PZ; PARALLEL=*,X " extract and plot predicted curve with data " VTABLE Tp; VARIATE=Vp FRAME [GRID=xy] 1 XAXIS 1; MARKS=knots DGRAPH ['Data with fitted L-spline'; KEY=0] Vp,y; xpred,x; PEN=2,1