Defines a variance structure for random effects in a REML
model.
Options
TERMS = formula |
Model terms for which the covariance structure is to be defined |
---|---|
FORMATION = string token |
Whether the structure is formed by direct product construction or by definition of the whole matrix (direct , whole ); default dire |
CORRELATE = string token |
Whether to impose correlation across the model terms if several are specified (none , positivedefinite , unrestricted ); default none |
CINITIAL = variate or symmetric matrix |
Initial values for covariance matrix across terms |
COORDINATES = matrix or variates |
Coordinates of the data points to be used in calculating distance-based models |
Parameters
MODELTYPE = string tokens |
Type of covariance model associated with the term(s), or with individual factors in the term(s) if FORMATION=direct (identity , fixed , AR , MA , ARMA , power , boundedlinear , circular , spherical , linearvariance , banded , correlation , antedependence , unstructured , diagonal , uniform , FA , FAequal ) default iden |
---|---|
ORDER = scalar |
Order of model |
HETEROGENEITY = string token |
Heterogeneity for correlation matrices (none , outside ); default none |
METRIC = string token |
How to calculate distances when MODELTYPE=power (cityblock , squared , euclidean ); default city |
FACTOR = factors |
Factors over which to form direct products |
MATRIX = symmetric matrices, diagonal matrices or pointers |
Defines matrix values for a term or the factors when MODELTYPE=fixed |
INVERSE = symmetric matrices, diagonal matrices or pointers |
Define values for matrix inverses (instead of the fixed matrices themselves) when MODELTYPE=fixed |
DISTANCES = symmetric matrices |
Symmetric matrix of pre-formed distances to be used in distance-based models of order one |
COORDINATES = matrices, variates or pointers |
Specifies coordinates of each factor level to be used in calculating distance-based models |
INITIAL = scalars, variates, matrices, symmetric matrices or pointers |
Initial parameter values for each correlation matrix (supplied in the structures appropriate for the model concerned) |
CONSTRAINTS = texts |
Texts containing strings none , fix or positive to define constraints for the parameters in each model |
EQUALITYCONSTRAINTS = variates |
Non-zero values in the variate indicate groups of parameters whose values are to be constrained to be equal |
Description
VSTRUCTURE
can be used to define the form of covariance structure for any term in the random model defined for REML
by VCOMPONENTS
. By default, the effects for each random term are assumed to be independent with common variance σj2 for term j, that is, the random term has covariance matrix σj2I. VSTRUCTURE
is used to define correlation between random effects within terms, to allow a changing variance within a term, and to define correlations between different random terms. These models are particularly useful when fitting linear models to repeated measurements or spatial data and for random coefficient regression.
VSTRUCTURE
can only be used after VCOMPONENTS
has been used to define the fixed and random models. It can be used more than once to define different structures for different random terms. The information is accumulated within Genstat, and it will all be used by subsequent REML
commands. You can check on the model and covariance structures defined at any time by using the VSTATUS
directive. To cancel a covariance structure for a term you simply need to use VSTRUCTURE
to change the model back to the scaled identity matrix σj2I. To cancel all covariance structures you can give a new VCOMPONENTS
command.
For a random term constructed from more than one factor, the covariance matrix can be formed either as a single matrix for the whole term, or as the direct product of several matrices corresponding to the factors. Consider an analysis of repeated measurements where data has been taken weekly from each subject, and one of several different treatments has been applied to each subject. It is likely that data taken from the same subject will be correlated, with correlation decreasing over time, but that subjects will be independent. This corresponds to an I ⊗ C covariance structure, where the identity matrix I corresponds to the independent subjects, and the covariance matrix C corresponds to the correlated measurements over time within subjects. If we take C to be an auto-regressive process of order 1, this can be defined and fitted as follows:
VCOMPONENTS [FIXED=Tmt] RANDOM=Subject.Week
VSTRUCTURE [TERM=Subject.Week] MODELTYPE=I,AR; ORDER=1;\
FACTOR=Subject,Week
REML Y
The TERM
option is used to specify the term to which the covariance structure is to be applied. For each factor in the term you can then specify the covariance model to be applied (see below for list of available models). However, it is not necessary to specify factors for which the default identity model is required, so the following is an equivalent specification:
VCOMPONENTS [FIXED=Tmt] RANDOM=Subject.Week
VSTRUCTURE [TERM=Subject.Week] MODELTYPE=AR; ORDER=1; FACTOR=Week
To cancel the covariance structure for the term, a null setting is sufficient:
VSTRUCTURE [TERM=Subject.Week]
Although the covariance structure for each term here is of the form Gj = I, the variance matrix for the data is of the form
V = σ2 ( Σj γjZjGjZj′ + I )
In this case the random subject term generates correlations that are equal across all the times within subjects. It is important to remember that including a random term in the model will generate uniform correlations between units with the same values of the random factor(s). Retaining these terms in the model as well as specifying a correlated structure may be appropriate for some data sets, but can sometimes lead to difficulties in parameter estimation.
The possible settings for the MODELTYPE
parameter, generating symmetric covariance matrices C (Ci, j = Cj, i for all i, j), are listed below. Where more than one model order can be used, the default is shown in bold and can be changed by using the ORDER
option. For the AR
, MA
, ARMA
, power
and banded
models, the order is the same as the number of parameters to be fitted. For the banded
, correlation
, ante-dependence
and unstructured
models, the order is the number of non-zero off-diagonal bands in the matrix. For the FAequal and FA models, the order is the number of columns in the matrix Λ.
identity |
identity matrix | Ci, i = 1, Ci,j = 0, for i≠j |
fixed |
fixed matrix | Ci, j specified |
AR |
auto-regressive order 1 or 2 |
Ci, i = 1 |
MA |
moving average order 1 or 2 |
Ci, i = 1 |
ARMA |
auto-regressive moving-average order 1 |
Ci, i = 1 |
power |
based on distance order 1 or 2 |
Ci, i = 1 |
boundedlinear |
based on distance order 1 |
Ci, j = 1 – d/φ for d ≤ φ, |
circular |
based on distance order 1 |
Ci, j = 1 – (2/π) {(d/φ)√(1-(d/φ)2) + sin-1(d/φ)} for d ≤ φ, |
spherical |
based on distance order 1 |
Ci, j = 1 – 1.5 (d/φ) + 0.5 (d/φ)3 for d ≤ φ, |
linearvariance |
based on distance order 1 |
Ci, j = 1 – 2φ d / max(d) |
banded |
equal bands |
Ci, i = 1 |
correlation |
general correlation matrix |
Ci, i = 1 |
uniform |
uniform matrix | Ci, j = θ for all i,j |
diagonal |
diagonal matrix |
Ci, i = θi |
antedependence |
ante-dependence model |
C-1 = UD-1U′ |
unstructured |
general covariance matrix |
Ci, j = θij , |
FA |
factor analytic |
C = ΛΛ′ + Ψ |
FAequal |
factor analytic with common variance |
C = ΛΛ′ + Ψ |
Initial parameter values can be specified using the INITIAL
parameter. For most models, the number of initial values required is the number of parameters, and default values will be generated. However, for unstructured
models, a full covariance matrix of initial values must be given, and for the correlation
model a full correlation matrix must be provided. For the ante-dependence
model, either a full covariance matrix can be provided, or a pointer to a U and a D-1 matrix of the correct forms. For the FA
and FAequal
models, a pointer must be used to give the initial Λ and Ψ matrices, otherwise default initial values are generated. The FAequal
model can be used to get initial values for the FA
model. Initial values are required for these models because the algorithm may not converge when many parameters are fitted if the starting values are not realistic. Initial values might be generated from covariance matrices estimated by fitting simpler models, or from residuals from a null variance model.
A missing value in the initial values is taken to mean that the value is inestimable and it will be fixed at a small value for the analysis. Alternatively, a parameter can be fixed at its initial value using the CONSTRAINTS
parameter. The codes (not case sensitive and able to be abbreviated) may take value fix
to indicate the parameter is to be fixed at its initial value, positive
to indicate it is to remain positive or none
to indicate no constraints. The default is a positive constraint or no constraint depending on context; for example scaling parameters are always constrained to remain positive. The EQUALITYCONSTRAINTS
parameter allows you to constrain some of the parameters to have the same value. The variate that it specifies contains a zero value if there is no constraint, and an identical integer value for any set of parameters whose values are to be equal. So, a variate containing the values (0,1,2,1,2) would constrain the second parameter to be equal to the fourth parameter, and the third parameter to be equal to the fifth parameter.
It may sometimes be desirable to allow for unequal variances for the models defined in terms of correlation matrices: that is, for the AR
, MA
, ARMA
, uniform
, power
, boundedlinear
, circular
, spherical
, linearvariance
, banded
and correlation
models. This can be done by setting option HETEROGENEITY=outside
. This means a diagonal matrix D of standard errors will be applied to the correlation matrix C to generate a matrix D½CD½. In this case, a number of extra parameters (equal to the number of effects in the factor or term) should be added to the vector of initial values. These models allow investigation of a structured correlation pattern for changing variances and are particularly useful in the analysis of repeated measurements data when variance increases over time. For example, to allow for changing variance over time in our example above, we can specify
VCOMPONENTS [FIXED=Tmt] RANDOM=Subject.Week
VSTRUCTURE [TERM=Subject.Week] MODELTYPE=AR; ORDER=1;\
FACTOR=Week; HETEROGENEITY=outside
REML Y
In some circumstances, you may wish to define a single model to apply to the whole term, instead of using the direct product form illustrated above. In this case, you should set option FORM=whole
. Note that, when a term consists of a single factor, it is not necessary to set the FACTOR
option.
With MODELTYPE=fixed
, you must either use the MATRIX
option to specify the values of the covariance matrix C, or the INVERSE
option to specify the inverse matrix. The values of the matrix or its inverse can be supplied as diagonal matrices or symmetric matrices. In addition, values for the inverse matrix can be supplied in sparse form as a pointer. The output from VPEDIGREE
is designed for input here, but you can also define the inverse matrix explicitly. The second element of the pointer should then be a variate containing the non-zero values of the inverse in lower triangular order. The first element should be a factor, with number of levels equal to the number of rows n(n+1)/2 of the matrix. This has firstly a block of n values giving the position in the variate of the first value stored for each row. There is then a block of values for each row in turn, giving the columns in which each non-zero value appears.
When MODELTYPE=power
is used to define a distance-based model, the model can be of order 1 (isotropic) or 2 (anisotropic). For models with ORDER=1
, a single set of distances must be formed. The necessary information can be supplied using either the COORDINATES
option, or the COORDINATES
parameter, or the DISTANCES
parameter. With the COORDINATES
option you can specify either a matrix, or a list of variates, to define multi-dimensional coordinates for each unit of the data. The length of the variates, or the number of rows of the matrix, must be equal to the number of data values. The number of variates, or the number of columns of the matrix, is equal to the number of dimensions. The coordinates for the levels of each FACTOR
are then calculated as the mean values of the coordinates of the units included in the analysis with those levels. Alternatively, you can use the COORDINATES
parameter to specify a single variate, a pointer to several variates or a matrix to define multi-dimensional coordinates for each level of the FACTOR
. This parameter takes precedence over the COORDINATES
option. The length of the variates, or the number of rows of the matrix, must be equal to the number of levels of the FACTOR
. The number of variates, or the number of columns of the matrix, is again equal to the number of dimensions.
The distance calculation is defined by the METRIC
option. For levels i and j with n-dimensional coordinates {cik: k=1…n} and {cjk: k=1…n} the distance dij is defined as
dij = Σk |cik – cjk| | for METRIC=cityblock (the default); |
---|---|
dij = Σk (cik – cjk)2 | for METRIC=squared ; and |
dij = {Σk (cik – cjk)2}1/2 | for METRIC=euclidean . |
Finally, you can supply a symmetric matrix of pre-calculated distances, using the DISTANCES
parameter, and this takes precedence over the COORDINATES
parameter and option. The number of rows of the DISTANCES
matrix must be equal to the number of levels of the FACTOR
.
When MODELTYPE=power
and ORDER=2
, the DISTANCES
parameter cannot be used, and only two-dimensional coordinates are allowed. The coordinates must be specified using either the COORDINATES
option or parameter, as described above. The distances are calculated within each dimension separately, according to the setting of the METRIC
option. In this case the Euclidean and city-block distances are equivalent.
The spherical family of geostatistical models correspond to the MODELTYPE
settings boundedlinear
(for one-dimensional distances), circular
(for one or two dimensions) and spherical
(for one or two dimensions). For further details, see Webster & Oliver (2007). These models are based on distances, and require coordinates to be supplied using either the COORDINATES
option (to give coordinates for each data value), or the COORDINATES
parameter (to give coordinates for each factor level), as described for MODELTYPE=power
above. The parameter φ is interpreted as the range at which the correlation is considered to have decayed to zero. A small value therefore indicates weak correlation, and a large value indicates stronger correlation. These models do not have continuous second derivatives, and their log-likelihood may be multi-modal. To detect this potential problem, it is therefore important to start their estimation from several different initial values; this can be done using the INITIAL
parameter as described above. To ensure that the estimated correlation matrix differs from the identity matrix, it is necessary for the range parameter to be larger than the minimum distance specified by the coordinates; any initial value smaller than this will be adjusted.
The setting MODELTYPE=linearvariance
specifies the linear variance model of Williams (1986), extended by Piepho & Williams (2010). This model is parameterized so that the parameter φ lies in the range [0,1], which allows correlations in the range [-1,1]. Values of φ close to one indicate weak correlation and values close to zero indicate strong correlation between neighbouring observations.
The CORRELATE
option allows you to specify correlations between model terms which have equal numbers of effects. A common correlation will then be fitted between parallel effects. For example, consider a random coefficient regression model where the fixed model contains common response to covariate X
and the random model allows for deviations in the intercept and slope about this line for each subject. The random intercept and slope for each subject may be correlated, but subjects are independent. This correlation across terms is defined using the CORRELATE
option as follows:
VCOMPONENTS [FIXED=X] RANDOM=SUBJECT+SUBJECT.X
VSTRUCTURE [SUBJECT+SUBJECT.X; CORRELATE=positivedefinite;\
CINITIAL=!(1,0.1,0.3); FORM=whole]
The CORRELATE
option setting positivedefinite
is used to ensure that the correlation matrix between the terms remains positive definite. This constraint can be relaxed using the setting unrestricted
(an unstructured covariance matrix is then used to describe covariance across the terms). The model fitting is done here in terms of a covariance matrix, where the diagonal elements are the gammas for the correlated terms. The CINITIAL
option can be set to a variate or symmetric matrix to provide initial values for this matrix. If no initial values are given, the initial values are taken from initial gamma values given in VCOMPONENTS
when the model is declared. When correlations are declared between terms, you must set FORMATION=whole
. In the random coefficient regression model above, no correlation structure is declared within terms since the subjects are independent. However, it is possible to declare correlation/covariance models within terms as usual. For example, an animal breeding model might use VPEDIGREE
to set up covariances within terms as follows:
VPEDIGREE animal; FEMALE=dam; MALE=sire; INVERSE=Ainv
VCOMPONENTS [FIXED=Trt] RANDOM=animal+dam+env
VSTRUCTURE [animal+dam; CORRELATE=unr; FORM=whole]\
MODELTYPE=fixed; INVERSE=Ainv
These declarations set up random terms with covariance structures of the form:
cov(animal
) = σa2 A, cov(dam
) = σd2 A, cov(animal
, dam
) = σad A.
Direct Products
Although the direct product construction used to build the covariance structures does not generally constrain the models that can be fitted to any data set, you should be aware of the implications that arise when defining covariance structures for the residual term. The REML algorithm used by Genstat detects the presence of the residual term in the model by searching for terms with number of levels equal to the number of data values, n. When no covariance structures are specified, the first term with number of levels > n is used as the residual. However, when covariance structures are defined, the form of the variance model is
V = σ2 ( Σj γjZjGjZj′ + R )
where matrix R corresponds to the residual term and has n rows. For this reason, any term found with > n rows will not be used as the residual if it has a covariance matrix. If no valid residual term is found, a residual term will automatically be added to the model. This may result in an extra error term being fitted unintentionally. An example where this may happen is in repeated measurements data where unequal numbers of measurements have been taken on subjects. If direct product construction is used, the matrix generated will have more rows than the data and cannot be used as R. A work-around is to put missing values in the data set to give equal replication and use REML
option MVINCLUDE=yvariate
to retain the missing values in the analysis. Alternatively, you could fix the residual component at a small value.
Note that in the repeated measurements example above, if measurements are taken at different times for each subject, the direct product structure is not appropriate. In this case, a power model may be fitted over the whole term, constraining the between subject correlation to zero:
VSTRUCTURE [TERM=Subject.Week; FORM=whole;\
COORD=subject,week] MODELTYPE=power; ORDER=2;\
INITIAL=!(0,0.1); CONSTRAIN=!T(Fix,None)
Note that the parameters run in the order of the coordinates vectors (which are variate forms of the model factors).
Options: TERMS
, FORMATION
, CORRELATE
, CINITIAL
, COORDINATES
.
Parameters: MODELTYPE
, ORDER
, HETEROGENEITY
, METRIC
, FACTOR
, MATRIX
, INVERSE
, DISTANCES
, COORDINATES
, INITIAL
, CONSTRAINTS
, EQUALITYCONSTRAINTS
.
References
Piepho, H.P. & Williams, E.R. (2010). Linear variance models for plant breeding trials. Plant Breeding, 129, 1-8.
Webster, R. & Oliver, M.A. (2007). Geostatistics for Environmental Scientists, 2nd edition. Wiley, Chichester.
Williams, E.R. (1986). A neighbour model for field experiments. Biometrika, 73, 279-87.
See also
Directives: REML
, VCOMPONENTS
, VRESIDUAL
, VSTATUS
.
Procedure: VFSTRUCTURE
, VNEARESTNEIGHBOUR
.
Commands for: REML analysis of linear mixed models, Repeated measurements.
Example
" Examples 2:5.4.3a-e, 2:5.4.5a-b, 2:5.7a-c " " Repeated measurements analysis: growth of 14 plants measured after 1,3,5,7,10 weeks." FACTOR Plant & [LABELS=!T(HC,MAV)] Treatment OPEN '%GENDIR%/Examples/GuidePart2/Plant.dat'; CHANNEL=2 READ [CHANNEL=2] Plant,Treatment,Time,Height;\ FREPRESENTATION=levels,labels,levels,levels CLOSE 2 GROUPS Time; FACTOR=Week DREPMEASURES [GROUPS=Plant; TIMEPOINTS=Week] Height " Anova split-plot analysis." VCOMPONENTS [FIXED=Treatment*Week] Plant/Week REML Height " Equivalent analysis using uniform correlation structure" VCOMPONENTS [FIXED=Treatment*Week] Plant.Week VSTRUCTURE [TERM=Plant.Week] MODELTYPE=uniform; FACTOR=Week REML [PRINT=model,components] Height " Power model: - correlation decreases as time between measurements increase - takes account of unequally spaced measurements - co-ordinates must be specified as a list of variates or a matrix." VCOMPONENTS [FIXED=Treatment*Week] Plant.Week VSTRUCTURE [TERM=Plant.Week; COORDINATES=Time] MODELTYPE=power; FACTOR=Week REML [PRINT=model,components,wald,deviance] Height " Heterogeneous power model - correlations follow power model, variance allowed to change over time." VSTRUCTURE [TERM=Plant.Week; COORDINATES=Time] MODELTYPE=power;\ ORDER=1; FACTOR=Week; HETEROGENEITY=outside REML [PRINT=model,components,wald,deviance] Height " Save fitted covariance model as initial values for unstructured model." VKEEP TERM=Plant.Week; COVARIANCEMODEL=Covpower PRINT Covpower['Week'] " Unstructured model." VSTRUCTURE [TERM=Plant.Week] MODELTYPE=unstructured; FACTOR=Week;\ INITIAL=Covpower['Week'] REML [PRINT=model,components,wald,deviance] Height " Alternatively, generate initial values using residuals generated after fitting with no variance model." VCOMPONENTS [FIXED=Treatment*Week] REML [PRINT=*] Height; RESIDUALS=r " Residuals are in order plants within weeks, so matrix rows correspond to weeks and columns to plants." MATRIX [ROWS=5; COLUMNS=14; VALUES=#r] mres SYMMETRIC [ROWS=5] vcov CALCULATE vcov = mres *+ TRANSPOSE(mres) " Dividing by number of replicates within each week gives an easy but conservative estimate as it takes no account of treatment d.f." CALCULATE vcov = vcov / 14 " Unstructured model from new initial values." VCOMPONENTS [FIXED=Treatment*Week] Plant.Week VSTRUCTURE [TERM=Plant.Week] MODELTYPE=unstructured; FACTOR=Week;\ INITIAL=!(#vcov) REML [PRINT=deviance] Height " Ante-dependence model order 1 - also requires initial values." VSTRUCTURE [TERM=Plant.Week] antedependence; FACTOR=Week;\ INITIAL=Covpower['Week']; ORDER=1 REML [PRINT=model,components,deviance] Height " Ante-dependence model order 2 - use initial values from power model again ." VSTRUCTURE [TERM=Plant.Week] antedependence; FACTOR=Week; INITIAL=Covpower['Week']; ORDER=2 REML [PRINT=deviance] Height " Random coefficient regression Growth of 14 plants measured after 1,3,5,7,10 weeks. Profiles suggest use of quadratic functions: fit random intercept and slope for plants - no correlation." CALCULATE Timesqrd = Time * Time VCOMPONENTS [FIXED=Treatment*(Time+Timesqrd)] Plant+Plant.Time REML [PRINT=model,components,deviance] Height " Fit random intercept and slope (with correlation) for plants, using previous estimates as initial values." VCOMPONENTS [FIXED=Treatment*(Time+Timesqrd)] Plant+Plant.Time VSTRUCTURE [TERMS=Plant+Plant.Time; FORMATION=whole;\ CORRELATE=unrestricted; CINITIAL=!(3,0.1,0.1)] REML [PRINT=#,deviance] Height " Save random coefficient regression matrix for use as initial values " VKEEP TERM=Plant; COVARIANCEMODEL=CovRCR PRINT CovRCR['Across terms'] " Random cubic spline models: growth of 14 plants measured after 1,3,5,7,10 weeks. Baseline model without spline terms." VCOMPONENTS [FIXED=Treatment*Time] Plant+Plant.Time VSTRUCTURE [TERMS=Plant+Plant.Time; FORMATION=whole;\ CORRELATE=unrestricted; CINITIAL=CovRCR['Across terms']] REML [PRINT=components,deviance] Height " Include a random cubic spline term over time." VCOMPONENTS [FIXED=Treatment*Time; SPLINE=Time] Plant+Plant.Time VSTRUCTURE [TERMS=Plant+Plant.Time; FORMATION=whole;\ CORRELATE=unrestricted; CINITIAL=CovRCR['Across terms']] REML [PRINT=model,components,deviance] Height " Plot the spline term." VKEEP Time; SPLBLUP=Tblup; SPLDESIGN=Tdes; SPLX=Tknot CALCULATE Tspline = Tdes[1] *+ Tblup[1] YAXIS 1; LOWER=-50; UPPER=50 PEN 1,2; METHOD=line DGRAPH [TITLE='Common spline effect over time'] Tspline; Tknot[1] " Fit separate splines for each treatment." VCOMPONENTS [FIXED=Treatment*Time; SPLINE=Treatment.Time]\ Plant+Plant.Time VSTRUCTURE [TERMS=Plant+Plant.Time; FORMATION=whole;\ CORRELATE=unrestricted; CINITIAL=CovRCR['Across terms']] REML [PRINT=components,deviance] Height " Plot the splines." VKEEP [RMETHOD=all; RESIDUALS=R1] VKEEP [RMETHOD=notspline; RESIDUALS=R2] CALCULATE TTspline = R1 - R2 DGRAPH [TITLE='Separate treatment splines'] TTspline; Time;\ PEN=Treatment