Fits a variance-components model by residual (or restricted) maximum likelihood.
|What output to present (
||Terms (fixed or random) for which effects or means are to be printed; default
||Standard errors to be printed with tables of effects and means (
||Weights for the analysis; default
||Whether to include units with missing values in the explanatory factors and variates and/or the y-variates (
||Defines a submodel of the fixed model to be assessed against the full model (for
||Whether to reuse the results from the estimation when printing or assessing a submodel (
||Which random terms to use when calculating
||Indicates whether to use the standard Fisher-scoring algorithm or the new AI algorithm with sparse matrix methods (
||Limit on the number of iterations; default 30|
||Tolerances for matrix inversion; default
||Parameterization to use for the variance component estimation (
||Whether printed output for covariance models gives the variance matrices or the parameters (
||Controls whether and how to calculate F-statistics for fixed terms (
||Number of blocks of internal memory to be set up for use by the algorithm|
||Variates to be analysed|
||Residuals from each analysis|
||Fitted values from each analysis|
||Exit status of the fit (0 if successful)|
||Saves the details of each analysis for use in subsequent
REML estimates the treatment effects and variance components in a linear mixed model: that is, a linear model with both fixed and random effects. The model to be fitted is specified using the
VCOMPONENTS directive, covariance models for the random effects can be defined using the
VSTRUCTURE directive, and the
VRESIDUAL directive can define covariance models for the residual term or specify the residual term for the individual experiments in a meta-analysis. Further output can be produced following
VDISPLAY, and output can be saved in Genstat data structures using
REML can be used in situations where you would normally use
ANOVA but have unbalanced or correlated data, or where you would normally use linear regression, but have more than one source of variation or correlation in the data.
REML can be used to analyse data from a wide variety of applications. It can obtain information on sources and sizes of variability in data sets. This can be of interest where the relative size of different sources of variability must be assessed, for example to identify the least reliable stages in an industrial process, or to design more effective experiments.
REML also provides efficient estimates of treatment effects in unbalanced designs with more than one source of error. It can be used to provide estimates of treatment effects that combine information from all the strata of a partially balanced design, or to combine information over similar experiments conducted at different times or in different places. You can thus obtain estimates that make use of the information from all the experiments, as well as the separate estimates from each individual experiment. Examples from several different areas of application can be found in Robinson (1987). The facilities for estimation of covariance models allow estimates of treatment effects and standard errors to be obtained using an appropriate variance model and taking account of the correlation structure of the data.
The method of residual maximum likelihood (REML) was introduced by Patterson & Thompson (1971). It was developed in order to avoid the biased variance component estimates that are produced by ordinary maximum likelihood estimation: because maximum likelihood estimates of variance components take no account of the degrees of freedom used in estimating treatment effects, they have a downwards bias which increases with the number of fixed effects in the model. This in turn leads to under-estimates of standard errors for fixed effects, which may lead to incorrect inferences being drawn from the data. Estimates of variance parameters which take account of the degrees of freedom used in estimating fixed effects, like those generated by
ANOVA in balanced data sets, are more desirable.
Once a mixed model has been specified (using
VCOMPONENTS) and any covariance structures have been defined (using
VRESIDUAL) you can fit the model to the data (the y-variates) using the
Y parameter lists the variates that are to be modelled. For example, given appropriate factor definitions, the following command sets up a model and analyses the data held in variate
VCOMPONENTS [FIXED=Nitrogen*Variety] RANDOM=Block/Wplot/Splot
REML Yield; FITTED=Fit; RESIDUALS=Res
RESIDUALS parameters allow you to store the fitted values and residuals from the fitted model – above they are stored in variates
EXIT parameter saves the “exit status” of each analysis. This is set to zero if it was completed successfully; for details of the other codes, see
SAVE parameter can be used to name the
REML save structure for use with later
The default setting of
PRINT=model,components,Wald,cova, gives a description of the model and covariance models that have been fitted, plus estimates of the variance components and the Wald tests. By default if tables of means and effects are requested, tables for all terms in the fixed model are given together with a summary of the standard error of differences between effects/means. Options
PSE can be used to change the terms or obtain different types of standard error. For example,
VCOMPONENTS [FIXED=Nitrogen*Variety] RANDOM=Block/Wplot/Splot
REML [PRINT=means; PTERMS=Nitrogen.Variety;\
means that a
Variety table of predicted means will be produced with a standard error for each cell.
FMETHOD option controls whether to accompany the Wald tests for fixed effects with approximate F statistics and corresponding numbers of residual degrees of freedom. The computations, using the method devised by Kenward & Roger (1997), can be time consuming with large or complicated models. So, with the default setting
FMETHOD=automatic, Genstat assesses the model itself and decides automatically whether to do the computations and which method to use. The other settings allow you to control what to do yourself:
||no F statistics are produced;|
||F statistics are calculated using algebraic derivatives (which may involve large matrix calculations);|
||F statistics are calculated using numerical derivatives (which require an extra evaluation of the mixed model equations for every variance parameter).|
CFORMAT option controls the type of output produced for the estimated covariance models. The default setting,
variancematrices, produces the variance-covariance matrices for the components, whereas the setting
parameters prints their parameters.
MVINCLUDE option allows the inclusion of units with missing values. By default, units where there is a missing value in the y-variate or in any of the factors or variates in the model terms are excluded. The setting
explanatory allows units with missing values in factors or variates in the model to be included. For missing covariate values, this is equivalent to substituting the mean value. The setting
yvariate includes units with missing values in the y-variate. This can be useful to retain the balanced structure of the data for use with direct product covariance matrices (see
VSTRUCTURE), or to produce predictions of data values for given values of explanatory factors and/or variates.
WEIGHTS option can be used to specify a weight for each unit in the analysis. This is useful when it is suspected that the size of the random error varies between units. For example, if the random error for unit i is known to have variance viσ2, a weight variate should be used containing values wi=1/vi.
RMETHOD option controls the way in which residuals and fitted values are formed. For the default setting
RMETHOD=final, the fitted values are calculated from all the fixed and random effects. The residuals are the difference between the data and the fitted values and, in this case, are estimates of the
*units* random error and can be used to check the Normality and variance homogeneity assumptions for the random error. To get fitted values constructed from the fixed terms alone, omitting all random terms, the setting
RMETHOD=all must be used. The setting
RMETHOD=notspline means that the residuals will be formed from all the random effects, excluding spline terms.
SUBMODEL is used to specify a submodel of the fixed model (but only applies when
METHOD=Fisher). This model will be fitted as well as the full fixed model, using a slightly modified version of the algorithm, and the difference in deviances between the full and submodel can be used as a likelihood-based test to assess the importance of the fixed terms dropped from the full model, as described by Welham & Thompson (1997). Once the full model has been fitted, the
RECYCLE option can be used to test a series of submodels of the fixed model. If option
RECYCLE=yes is set, then only the estimation for the submodel is performed. Information for the full fixed model is picked up from the corresponding save structure. When the
RECYCLE option is set, only the
model settings of
METHOD option specifies whether to use the AI (Average Information) algorithm (Gilmour et al. 1995) with sparse matrix methods to maximize the residual likelihood, or Fisher scoring with full matrix manipulation. By default, the sparse Average Information algorithm is used. The AI algorithm generally runs faster per iteration than Fisher scoring and uses much less workspace, but it may require slightly more iterations to reach convergence. When sparse matrix methods are used, standard errors of differences will not be available for random effects, although standard errors are available. Note that when
RECYCLE options do not apply. The
WORKSPACE option (default 1) specifies the number of blocks of internal memory to be allocated for use by the estimation algorithm when
MAXCYCLE can be used to change the maximum number of iterations performed by the algorithm from the default of 30.
TOLERANCES option gives tolerances for three matrix inversions. The first two values are matrix inversion tolerances for the information matrix and the mixed model equations respectively and take the value 10-5 by default. The third value is used to detect zero frequency counts for factor combinations in the mixed model equations: 10-6 is used by default.
Any of the y-variates or any of the factors or variates in the fixed and random models (defined by
VCOMPONENTS) may be restricted to indicate that only a subset of the units is to be used in the analysis. However, if more than one of these vectors is restricted, all must be restricted to the same set of units. Any restrictions on the variates supplied to save residuals or fitted values are ignored.
Gilmour, A.R., Thompson, R. & Cullis, B. (1995). AIREML, an efficient algorithm for variance parameter estimation in linear mixed models. Biometrics, 51, 1440-1450.
Kenward, M.G. & Roger, J.H. (1997). Small sample inference for fixed effects from restricted maximum likelihood. Biometrics, 53, 983-997.
Patterson, H.D. & Thompson, R. (1971). Recovery of inter-block information when block sizes are unequal. Biometrika, 58, 545-554.
Robinson, D.L. (1987). Estimation and use of variance components. The Statistician, 36, 3-14.
Welham, S.J. & Thompson, R. (1997). Likelihood ratio tests for fixed model terms using residual maximum likelihood. Journal of the Royal Statistical Society, Series B, 59, 701-714.
" Example REML-1: Incomplete block analysis (from Cochran & Cox, Table 10.3, p. 406) " FILEREAD [NAME='%gendir%/examples/REML-1.DAT']\ Reps,Blocks,Treats,Yield; FGROUPS=3(yes),no VCOMPONENTS [FIXED=Treats] RANDOM=Reps/Blocks CALCULATE Log = LOG(Yield) REML [PRINT=components,effects,means,stratum,vcov,monitor; PSE=diff;\ PTERMS=Reps/Blocks; METHOD=Fisher; MAXCYCLE=2] Yield,Log; SAVE=Syield,Slog VAIC " Plot residuals for each analysis" VPLOT [GRAPHICS=high; SAVE=Syield] fitted VPLOT [GRAPHICS=high; SAVE=Slog] fitted