1. Home
  2. VSTRUCTURE directive

VSTRUCTURE directive

Defines a variance structure for random effects in a REML model.


TERMS = formula Model terms for which the covariance structure is to be defined
FORMATION = string token Whether the structure is formed by direct product construction or by definition of the whole matrix (direct, whole); default dire
CORRELATE = string token Whether to impose correlation across the model terms if several are specified (none, positivedefinite, unrestricted); default none
CINITIAL = variate or symmetric matrix Initial values for covariance matrix across terms
COORDINATES = matrix or variates Coordinates of the data points to be used in calculating distance-based models


MODELTYPE = string tokens Type of covariance model associated with the term(s), or with individual factors in the term(s) if FORMATION=direct (identity, fixed, AR, MA, ARMA, power, boundedlinear, circular, spherical, linearvariance, banded, correlation, antedependence, unstructured, diagonal, uniform, FA, FAequal) default iden
ORDER = scalar Order of model
HETEROGENEITY = string token Heterogeneity for correlation matrices (none, outside); default none
METRIC = string token How to calculate distances when MODELTYPE=power (cityblock, squared, euclidean); default city
FACTOR = factors Factors over which to form direct products
MATRIX = symmetric matrices, diagonal matrices or pointers Defines matrix values for a term or the factors when MODELTYPE=fixed
INVERSE = symmetric matrices, diagonal matrices or pointers Define values for matrix inverses (instead of the fixed matrices themselves) when MODELTYPE=fixed
DISTANCES = symmetric matrices Symmetric matrix of pre-formed distances to be used in distance-based models of order one
COORDINATES = matrices, variates or pointers Specifies coordinates of each factor level to be used in calculating distance-based models
INITIAL = scalars, variates, matrices, symmetric matrices or pointers Initial parameter values for each correlation matrix (supplied in the structures appropriate for the model concerned)
CONSTRAINTS = texts Texts containing strings none, fix or positive to define constraints for the parameters in each model
EQUALITYCONSTRAINTS = variates Non-zero values in the variate indicate groups of parameters whose values are to be constrained to be equal


VSTRUCTURE can be used to define the form of covariance structure for any term in the random model defined for REML by VCOMPONENTS. By default, the effects for each random term are assumed to be independent with common variance σj2 for term j, that is, the random term has covariance matrix σj2I. VSTRUCTURE is used to define correlation between random effects within terms, to allow a changing variance within a term, and to define correlations between different random terms. These models are particularly useful when fitting linear models to repeated measurements or spatial data and for random coefficient regression.

VSTRUCTURE can only be used after VCOMPONENTS has been used to define the fixed and random models. It can be used more than once to define different structures for different random terms. The information is accumulated within Genstat, and it will all be used by subsequent REML commands. You can check on the model and covariance structures defined at any time by using the VSTATUS directive. To cancel a covariance structure for a term you simply need to use VSTRUCTURE to change the model back to the scaled identity matrix σj2I. To cancel all covariance structures you can give a new VCOMPONENTS command.

For a random term constructed from more than one factor, the covariance matrix can be formed either as a single matrix for the whole term, or as the direct product of several matrices corresponding to the factors. Consider an analysis of repeated measurements where data has been taken weekly from each subject, and one of several different treatments has been applied to each subject. It is likely that data taken from the same subject will be correlated, with correlation decreasing over time, but that subjects will be independent. This corresponds to an IC covariance structure, where the identity matrix I corresponds to the independent subjects, and the covariance matrix C corresponds to the correlated measurements over time within subjects. If we take C to be an auto-regressive process of order 1, this can be defined and fitted as follows:





The TERM option is used to specify the term to which the covariance structure is to be applied. For each factor in the term you can then specify the covariance model to be applied (see below for list of available models). However, it is not necessary to specify factors for which the default identity model is required, so the following is an equivalent specification:



To cancel the covariance structure for the term, a null setting is sufficient:


Although the covariance structure for each term here is of the form Gj = I, the variance matrix for the data is of the form

V = σ2 ( Σj γjZjGjZj′ + I )

In this case the random subject term generates correlations that are equal across all the times within subjects. It is important to remember that including a random term in the model will generate uniform correlations between units with the same values of the random factor(s). Retaining these terms in the model as well as specifying a correlated structure may be appropriate for some data sets, but can sometimes lead to difficulties in parameter estimation.

The possible settings for the MODELTYPE parameter, generating symmetric covariance matrices C (Ci, j = Cj, i for all i, j), are listed below. Where more than one model order can be used, the default is shown in bold and can be changed by using the ORDER option. For the AR, MA, ARMA, power and banded models, the order is the same as the number of parameters to be fitted. For the banded, correlation, ante-dependence and unstructured models, the order is the number of non-zero off-diagonal bands in the matrix. For the FAequal and FA models, the order is the number of columns in the matrix Λ.

identity identity matrix Ci, i = 1, Ci,j = 0, for ij
fixed fixed matrix Ci, j specified

auto-regressive order 1 or 2
2=0 for order 1)

Ci, i = 1
Ci+1, i = φ1 / (1-φ2)
Ci, j = φ1 Ci1, j + φ2 Ci2, j,
i > j+1, -1 < φ1, φ2 < 1,
12|<1, φ21<1, φ2>-1


moving average order 1 or 2
2=0 for order 1)

Ci, i = 1
Ci+1, i = -θ1(1-θ2)/(1+θ1222)
Ci+2, i = -θ2 / (1+θ1222)
Ci, j = 0, i>j+2
-1 < θ1, θ2 < 1, θ21 < 1


auto-regressive moving-average order 1

Ci, i = 1
Ci+1, i = (θ-φ)(1-φθ)/(1+θ2-2φθ)
Ci, j = φCi1, j , i>j+1
-1 < φ, θ < 1


based on distance order 1 or 2
1 = φ2 for order 1)

Ci, i = 1
Ci, j = φ1d1φ2d2
d1, d2 = distance in 1st and 2nd dimensions
0 < φ1, φ2 < 1

boundedlinear based on distance order 1

Ci, j = 1 – d/φ for d ≤ φ,
Ci, j = 0 for d > φ
0 < φ

circular based on distance order 1

Ci, j = 1 – (2/π) {(d/φ)√(1-(d/φ)2) + sin-1(d/φ)} for d ≤ φ,
Ci, j = 0 for d > φ
0 < φ

spherical based on distance order 1

Ci, j = 1 – 1.5 (d/φ) + 0.5 (d/φ)3 for d  ≤ φ,
Ci, j = 0 for d > φ
0 < φ

linearvariance based on distance order 1

Ci, j = 1 – 2φ d / max(d)
0 < φ < 1


equal bands
1 < order < nrows-1

Ci, i = 1
Ci+k, i = θk , 1 < k < order
-1 < θk < 1
Ci+k, i = 0, otherwise


general correlation matrix
1 < order < nrows-1

Ci, i = 1
Ci, j = θij ,
1 < |ij| ≤ order
Ci, j = 0, |ij| > order
-1 < θij < 1

uniform uniform matrix Ci, j = θ for all i,j
diagonal diagonal matrix

Ci, i = θi
Ci, j = 0, ij


ante-dependence model
1 < order < nrows-1

C-1 = UD-1U
Di, i-1 = di-1,
Di, j = 0 for ij
Ui, i = 1,
Ui, j = uij ,
1 ≤ ji ≤ order
Ui, j = 0, for i>j


general covariance matrix
1<order< nrows-1 

Ci, j = θij ,
0 < |ij| ≤ order
Ci, j = 0, |ij| > order


factor analytic
order = 1 or 2

C = ΛΛ′ + Ψ
Λ is an nrows × q matrix
Ψi = ψi for i=1…nrows


factor analytic with common variance
order = 1 or 2

C = ΛΛ′ + Ψ
Λ is an nrows × q matrix
Ψi = ψ for i=1…nrows

Initial parameter values can be specified using the INITIAL parameter. For most models, the number of initial values required is the number of parameters, and default values will be generated. However, for unstructured models, a full covariance matrix of initial values must be given, and for the correlation model a full correlation matrix must be provided. For the ante-dependence model, either a full covariance matrix can be provided, or a pointer to a U and a D-1 matrix of the correct forms. For the FA and FAequal models, a pointer must be used to give the initial Λ and Ψ matrices, otherwise default initial values are generated. The FAequal model can be used to get initial values for the FA model. Initial values are required for these models because the algorithm may not converge when many parameters are fitted if the starting values are not realistic. Initial values might be generated from covariance matrices estimated by fitting simpler models, or from residuals from a null variance model.

A missing value in the initial values is taken to mean that the value is inestimable and it will be fixed at a small value for the analysis. Alternatively, a parameter can be fixed at its initial value using the CONSTRAINTS parameter. The codes (not case sensitive and able to be abbreviated) may take value fix to indicate the parameter is to be fixed at its initial value, positive to indicate it is to remain positive or none to indicate no constraints. The default is a positive constraint or no constraint depending on context; for example scaling parameters are always constrained to remain positive. The EQUALITYCONSTRAINTS parameter allows you to constrain some of the parameters to have the same value. The variate that it specifies contains a zero value if there is no constraint, and an identical integer value for any set of parameters whose values are to be equal. So, a variate containing the values (0,1,2,1,2) would constrain the second parameter to be equal to the fourth parameter, and the third parameter to be equal to the fifth parameter.

It may sometimes be desirable to allow for unequal variances for the models defined in terms of correlation matrices: that is, for the AR, MA, ARMA, uniform, power, boundedlinear, circular, spherical, linearvariance, banded and correlation models. This can be done by setting option HETEROGENEITY=outside. This means a diagonal matrix D of standard errors will be applied to the correlation matrix C to generate a matrix D½CD½. In this case, a number of extra parameters (equal to the number of effects in the factor or term) should be added to the vector of initial values. These models allow investigation of a structured correlation pattern for changing variances and are particularly useful in the analysis of repeated measurements data when variance increases over time. For example, to allow for changing variance over time in our example above, we can specify





In some circumstances, you may wish to define a single model to apply to the whole term, instead of using the direct product form illustrated above. In this case, you should set option FORM=whole. Note that, when a term consists of a single factor, it is not necessary to set the FACTOR option.

With MODELTYPE=fixed, you must either use the MATRIX option to specify the values of the covariance matrix C, or the INVERSE option to specify the inverse matrix. The values of the matrix or its inverse can be supplied as diagonal matrices or symmetric matrices. In addition, values for the inverse matrix can be supplied in sparse form as a pointer. The output from VPEDIGREE is designed for input here, but you can also define the inverse matrix explicitly. The second element of the pointer should then be a variate containing the non-zero values of the inverse in lower triangular order. The first element should be a factor, with number of levels equal to the number of rows n(n+1)/2 of the matrix. This has firstly a block of n values giving the position in the variate of the first value stored for each row. There is then a block of values for each row in turn, giving the columns in which each non-zero value appears.

When MODELTYPE=power is used to define a distance-based model, the model can be of order 1 (isotropic) or 2 (anisotropic). For models with ORDER=1, a single set of distances must be formed. The necessary information can be supplied using either the COORDINATES option, or the COORDINATES parameter, or the DISTANCES parameter. With the COORDINATES option you can specify either a matrix, or a list of variates, to define multi-dimensional coordinates for each unit of the data. The length of the variates, or the number of rows of the matrix, must be equal to the number of data values. The number of variates, or the number of columns of the matrix, is equal to the number of dimensions. The coordinates for the levels of each FACTOR are then calculated as the mean values of the coordinates of the units included in the analysis with those levels. Alternatively, you can use the COORDINATES parameter to specify a single variate, a pointer to several variates or a matrix to define multi-dimensional coordinates for each level of the FACTOR. This parameter takes precedence over the COORDINATES option. The length of the variates, or the number of rows of the matrix, must be equal to the number of levels of the FACTOR. The number of variates, or the number of columns of the matrix, is again equal to the number of dimensions.

The distance calculation is defined by the METRIC option. For levels i and j with n-dimensional coordinates {cik: k=1…n} and {cjk: k=1…n} the distance dij is defined as

    dij = Σk |cikcjk| for METRIC=cityblock (the default);
    dij = Σk (cikcjk)2 for METRIC=squared; and
    dij = {Σk (cikcjk)2}1/2 for METRIC=euclidean.

Finally, you can supply a symmetric matrix of pre-calculated distances, using the DISTANCES parameter, and this takes precedence over the COORDINATES parameter and option. The number of rows of the DISTANCES matrix must be equal to the number of levels of the FACTOR.

When MODELTYPE=power and ORDER=2, the DISTANCES parameter cannot be used, and only two-dimensional coordinates are allowed. The coordinates must be specified using either the COORDINATES option or parameter, as described above. The distances are calculated within each dimension separately, according to the setting of the METRIC option. In this case the Euclidean and city-block distances are equivalent.

The spherical family of geostatistical models correspond to the MODELTYPE settings boundedlinear (for one-dimensional distances), circular (for one or two dimensions) and spherical (for one or two dimensions). For further details, see Webster & Oliver (2007). These models are based on distances, and require coordinates to be supplied using either the COORDINATES option (to give coordinates for each data value), or the COORDINATES parameter (to give coordinates for each factor level), as described for MODELTYPE=power above. The parameter φ is interpreted as the range at which the correlation is considered to have decayed to zero. A small value therefore indicates weak correlation, and a large value indicates stronger correlation. These models do not have continuous second derivatives, and their log-likelihood may be multi-modal. To detect this potential problem, it is therefore important to start their estimation from several different initial values; this can be done using the INITIAL parameter as described above. To ensure that the estimated correlation matrix differs from the identity matrix, it is necessary for the range parameter to be larger than the minimum distance specified by the coordinates; any initial value smaller than this will be adjusted.

The setting MODELTYPE=linearvariance specifies the linear variance model of Williams (1986), extended by Piepho & Williams (2010). This model is parameterized so that the parameter φ lies in the range [0,1], which allows correlations in the range [-1,1]. Values of φ close to one indicate weak correlation and values close to zero indicate strong correlation between neighbouring observations.

The CORRELATE option allows you to specify correlations between model terms which have equal numbers of effects. A common correlation will then be fitted between parallel effects. For example, consider a random coefficient regression model where the fixed model contains common response to covariate X and the random model allows for deviations in the intercept and slope about this line for each subject. The random intercept and slope for each subject may be correlated, but subjects are independent. This correlation across terms is defined using the CORRELATE option as follows:



  CINITIAL=!(1,0.1,0.3); FORM=whole]

The CORRELATE option setting positivedefinite is used to ensure that the correlation matrix between the terms remains positive definite. This constraint can be relaxed using the setting unrestricted (an unstructured covariance matrix is then used to describe covariance across the terms). The model fitting is done here in terms of a covariance matrix, where the diagonal elements are the gammas for the correlated terms. The CINITIAL option can be set to a variate or symmetric matrix to provide initial values for this matrix. If no initial values are given, the initial values are taken from initial gamma values given in VCOMPONENTS when the model is declared. When correlations are declared between terms, you must set FORMATION=whole. In the random coefficient regression model above, no correlation structure is declared within terms since the subjects are independent. However, it is possible to declare correlation/covariance models within terms as usual. For example, an animal breeding model might use VPEDIGREE to set up covariances within terms as follows:



VSTRUCTURE [animal+dam; CORRELATE=unr; FORM=whole]\


These declarations set up random terms with covariance structures of the form:

cov(animal) = σa2 A, cov(dam) = σd2 A, cov(animal, dam) = σad A.

Direct Products

Although the direct product construction used to build the covariance structures does not generally constrain the models that can be fitted to any data set, you should be aware of the implications that arise when defining covariance structures for the residual term. The REML algorithm used by Genstat detects the presence of the residual term in the model by searching for terms with number of levels equal to the number of data values, n. When no covariance structures are specified, the first term with number of levels > n is used as the residual. However, when covariance structures are defined, the form of the variance model is

V = σ2 ( Σj γjZjGjZj′ + R )

where matrix R corresponds to the residual term and has n rows. For this reason, any term found with > n rows will not be used as the residual if it has a covariance matrix. If no valid residual term is found, a residual term will automatically be added to the model. This may result in an extra error term being fitted unintentionally. An example where this may happen is in repeated measurements data where unequal numbers of measurements have been taken on subjects. If direct product construction is used, the matrix generated will have more rows than the data and cannot be used as R. A work-around is to put missing values in the data set to give equal replication and use REML option MVINCLUDE=yvariate to retain the missing values in the analysis. Alternatively, you could fix the residual component at a small value.

Note that in the repeated measurements example above, if measurements are taken at different times for each subject, the direct product structure is not appropriate. In this case, a power model may be fitted over the whole term, constraining the between subject correlation to zero:

VSTRUCTURE [TERM=Subject.Week; FORM=whole;\

  COORD=subject,week] MODELTYPE=power; ORDER=2;\

  INITIAL=!(0,0.1); CONSTRAIN=!T(Fix,None)

Note that the parameters run in the order of the coordinates vectors (which are variate forms of the model factors).




Piepho, H.P. & Williams, E.R. (2010). Linear variance models for plant breeding trials. Plant Breeding, 129, 1-8.

Webster, R. & Oliver, M.A. (2007). Geostatistics for Environmental Scientists, 2nd edition. Wiley, Chichester.

Williams, E.R. (1986). A neighbour model for field experiments. Biometrika, 73, 279-87.

See also



Commands for: REML analysis of linear mixed models, Repeated measurements.


" Examples 2:5.4.3a-e, 2:5.4.5a-b, 2:5.7a-c "
" Repeated measurements analysis:
  growth of 14 plants measured after 1,3,5,7,10 weeks."
FACTOR       Plant
&            [LABELS=!T(HC,MAV)] Treatment
OPEN         '%GENDIR%/Examples/GuidePart2/Plant.dat'; CHANNEL=2
READ         [CHANNEL=2] Plant,Treatment,Time,Height;\ 
CLOSE        2
GROUPS       Time; FACTOR=Week
" Anova split-plot analysis."
VCOMPONENTS [FIXED=Treatment*Week] Plant/Week
REML        Height
" Equivalent analysis using uniform correlation structure"
VCOMPONENTS [FIXED=Treatment*Week] Plant.Week
REML        [PRINT=model,components] Height
" Power model:
  - correlation decreases as time between measurements increase
  - takes account of unequally spaced measurements
  - co-ordinates must be specified as a list of variates or a matrix."
VCOMPONENTS [FIXED=Treatment*Week] Plant.Week
REML        [PRINT=model,components,wald,deviance] Height
" Heterogeneous power model - correlations follow power model,
  variance allowed to change over time."
            ORDER=1; FACTOR=Week; HETEROGENEITY=outside
REML        [PRINT=model,components,wald,deviance] Height
" Save fitted covariance model as initial values for unstructured model."
PRINT       Covpower['Week']
" Unstructured model."
VSTRUCTURE  [TERM=Plant.Week] MODELTYPE=unstructured; FACTOR=Week;\ 
REML        [PRINT=model,components,wald,deviance] Height
" Alternatively, generate initial values using residuals generated after
  fitting with no variance model."
REML        [PRINT=*] Height; RESIDUALS=r
" Residuals are in order plants within weeks, so matrix rows correspond
  to weeks and columns to plants."
MATRIX      [ROWS=5; COLUMNS=14; VALUES=#r] mres
CALCULATE   vcov = mres *+ TRANSPOSE(mres)
" Dividing by number of replicates within each week gives an easy but
  conservative estimate as it takes no account of treatment d.f."
CALCULATE   vcov = vcov / 14
" Unstructured model from new initial values."
VCOMPONENTS [FIXED=Treatment*Week] Plant.Week
VSTRUCTURE  [TERM=Plant.Week] MODELTYPE=unstructured; FACTOR=Week;\ 
REML        [PRINT=deviance] Height
" Ante-dependence model order 1 - also requires initial values."
VSTRUCTURE  [TERM=Plant.Week] antedependence; FACTOR=Week;\ 
            INITIAL=Covpower['Week']; ORDER=1
REML        [PRINT=model,components,deviance] Height
" Ante-dependence model order 2
  - use initial values from power model again ."
VSTRUCTURE  [TERM=Plant.Week] antedependence; FACTOR=Week; 
            INITIAL=Covpower['Week']; ORDER=2
REML        [PRINT=deviance] Height
" Random coefficient regression
  Growth of 14 plants measured after 1,3,5,7,10 weeks.
  Profiles suggest use of quadratic functions:
  fit random intercept and slope for plants - no correlation."
CALCULATE   Timesqrd = Time * Time
VCOMPONENTS [FIXED=Treatment*(Time+Timesqrd)] Plant+Plant.Time
REML        [PRINT=model,components,deviance] Height
" Fit random intercept and slope (with correlation) for plants,
  using previous estimates as initial values."
VCOMPONENTS [FIXED=Treatment*(Time+Timesqrd)] Plant+Plant.Time
VSTRUCTURE  [TERMS=Plant+Plant.Time; FORMATION=whole;\ 
            CORRELATE=unrestricted; CINITIAL=!(3,0.1,0.1)]
REML        [PRINT=#,deviance] Height
" Save random coefficient regression matrix for use as initial values "
PRINT       CovRCR['Across terms']
" Random cubic spline models:
  growth of 14 plants measured after 1,3,5,7,10 weeks.
  Baseline model without spline terms."
VCOMPONENTS [FIXED=Treatment*Time] Plant+Plant.Time
VSTRUCTURE  [TERMS=Plant+Plant.Time; FORMATION=whole;\ 
            CORRELATE=unrestricted; CINITIAL=CovRCR['Across terms']]
REML        [PRINT=components,deviance] Height
" Include a random cubic spline term over time."
VCOMPONENTS [FIXED=Treatment*Time; SPLINE=Time] Plant+Plant.Time
VSTRUCTURE  [TERMS=Plant+Plant.Time; FORMATION=whole;\ 
            CORRELATE=unrestricted; CINITIAL=CovRCR['Across terms']]
REML        [PRINT=model,components,deviance] Height
" Plot the spline term."
VKEEP       Time; SPLBLUP=Tblup; SPLDESIGN=Tdes; SPLX=Tknot
CALCULATE   Tspline = Tdes[1] *+ Tblup[1]
YAXIS       1; LOWER=-50; UPPER=50
PEN         1,2; METHOD=line
DGRAPH      [TITLE='Common spline effect over time'] Tspline; Tknot[1]
" Fit separate splines for each treatment."
VCOMPONENTS [FIXED=Treatment*Time; SPLINE=Treatment.Time]\ 
VSTRUCTURE  [TERMS=Plant+Plant.Time; FORMATION=whole;\ 
            CORRELATE=unrestricted; CINITIAL=CovRCR['Across terms']]
REML        [PRINT=components,deviance] Height
" Plot the splines."
VKEEP       [RMETHOD=notspline; RESIDUALS=R2]
CALCULATE   TTspline = R1 - R2
DGRAPH      [TITLE='Separate treatment splines'] TTspline; Time;\ 
Updated on March 27, 2024

Was this article helpful?