Fits a variance-components model by residual (or restricted) maximum likelihood.

### Options

`PRINT` = string tokens |
What output to present (`model` , `components` , `effects` , `means` , `stratumvariances` , `monitoring` , `vcovariance` , `deviance` , `Waldtests` , `missingvalues` , `covariancemodels` ); default `mode` , `comp` , `Wald` , `cova` |
---|---|

`PTERMS` = formula |
Terms (fixed or random) for which effects or means are to be printed; default `*` implies all the fixed terms |

`PSE` = string token |
Standard errors to be printed with tables of effects and means (`differences` , `estimates` , `alldifferences` , `allestimates` , `none` ); default `diff` |

`WEIGHTS` = variate |
Weights for the analysis; default `*` implies all weights 1 |

`MVINCLUDE` = string tokens |
Whether to include units with missing values in the explanatory factors and variates and/or the y-variates (`explanatory` , `yvariate` ); default `*` i.e. omit units with missing values in either explanatory factors or variates or y-variates |

`SUBMODEL` = formula |
Defines a submodel of the fixed model to be assessed against the full model (for `METHOD=Fisher` only) |

`RECYCLE` = string token |
Whether to reuse the results from the estimation when printing or assessing a submodel (`yes, no` ); default `no` |

`RMETHOD` = string token |
Which random terms to use when calculating `RESIDUALS` (`final` , `all` , `notspline` ); default `fina` |

`METHOD` = string token |
Indicates whether to use the standard Fisher-scoring algorithm or the new AI algorithm with sparse matrix methods (`Fisher` , `AI` ); default `AI` |

`MAXCYCLE` = scalar |
Limit on the number of iterations; default 30 |

`TOLERANCES` = variate |
Tolerances for matrix inversion; default `*` i.e. appropriate default values |

`PARAMETERIZATION` = string token |
Parameterization to use for the variance component estimation (`gammas` , `sigmas` ); default `*` i.e. use whichever is most appropriate for model |

`CFORMAT` = string token |
Whether printed output for covariance models gives the variance matrices or the parameters (`variancematrices` , `parameters` ); default `vari` |

`FMETHOD` = string token |
Controls whether and how to calculate F-statistics for fixed terms (`automatic` , `none` , `algebraic` , `numerical` ); default `auto` |

`WORKSPACE` = scalar |
Number of blocks of internal memory to be set up for use by the algorithm |

### Parameters

`Y` = variates |
Variates to be analysed |
---|---|

`RESIDUALS` = variates |
Residuals from each analysis |

`FITTEDVALUES` = variates |
Fitted values from each analysis |

`EXIT` = scalar |
Exit status of the fit (0 if successful) |

`SAVE` = REML save structures |
Saves the details of each analysis for use in subsequent `VDISPLAY` and `VKEEP` directives |

### Description

`REML`

estimates the treatment effects and variance components in a linear mixed model: that is, a linear model with both fixed and random effects. The model to be fitted is specified using the `VCOMPONENTS`

directive, covariance models for the random effects can be defined using the `VSTRUCTURE`

directive, and the `VRESIDUAL`

directive can define covariance models for the residual term or specify the residual term for the individual experiments in a meta-analysis. Further output can be produced following `REML`

using `VDISPLAY`

, and output can be saved in Genstat data structures using `VKEEP`

. `REML`

can be used in situations where you would normally use `ANOVA`

but have unbalanced or correlated data, or where you would normally use linear regression, but have more than one source of variation or correlation in the data.

`REML`

can be used to analyse data from a wide variety of applications. It can obtain information on sources and sizes of variability in data sets. This can be of interest where the relative size of different sources of variability must be assessed, for example to identify the least reliable stages in an industrial process, or to design more effective experiments. `REML`

also provides efficient estimates of treatment effects in unbalanced designs with more than one source of error. It can be used to provide estimates of treatment effects that combine information from all the strata of a partially balanced design, or to combine information over similar experiments conducted at different times or in different places. You can thus obtain estimates that make use of the information from all the experiments, as well as the separate estimates from each individual experiment. Examples from several different areas of application can be found in Robinson (1987). The facilities for estimation of covariance models allow estimates of treatment effects and standard errors to be obtained using an appropriate variance model and taking account of the correlation structure of the data.

The method of residual maximum likelihood (REML) was introduced by Patterson & Thompson (1971). It was developed in order to avoid the biased variance component estimates that are produced by ordinary maximum likelihood estimation: because maximum likelihood estimates of variance components take no account of the degrees of freedom used in estimating treatment effects, they have a downwards bias which increases with the number of fixed effects in the model. This in turn leads to under-estimates of standard errors for fixed effects, which may lead to incorrect inferences being drawn from the data. Estimates of variance parameters which take account of the degrees of freedom used in estimating fixed effects, like those generated by `ANOVA`

in balanced data sets, are more desirable.

Once a mixed model has been specified (using `VCOMPONENTS`

) and any covariance structures have been defined (using `VSTRUCTURE`

and `VRESIDUAL`

) you can fit the model to the data (the y-variates) using the `REML`

directive.

The `Y`

parameter lists the variates that are to be modelled. For example, given appropriate factor definitions, the following command sets up a model and analyses the data held in variate `Yield`

:

`VCOMPONENTS [FIXED=Nitrogen*Variety] RANDOM=Block/Wplot/Splot`

`REML Yield; FITTED=Fit; RESIDUALS=Res`

The `FITTEDVALUES`

and `RESIDUALS `

parameters allow you to store the fitted values and residuals from the fitted model – above they are stored in variates `Fit`

and `Res`

. The `EXIT`

parameter saves the “exit status” of each analysis. This is set to zero if it was completed successfully; for details of the other codes, see `VKEEP`

. The `SAVE`

parameter can be used to name the `REML`

save structure for use with later `VKEEP`

and `VDISPLAY`

directives.

The three options `PRINT`

, `PTERMS`

and `PSE`

all control the printed output. The `PRINT`

option selects the output to be displayed:

The default setting of `PRINT=model,components,Wald,cova`

, gives a description of the model and covariance models that have been fitted, plus estimates of the variance components and the Wald tests. By default if tables of means and effects are requested, tables for all terms in the fixed model are given together with a summary of the standard error of differences between effects/means. Options `PTERMS`

and `PSE`

can be used to change the terms or obtain different types of standard error. For example,

`VCOMPONENTS [FIXED=Nitrogen*Variety] RANDOM=Block/Wplot/Splot`

`REML [PRINT=means; PTERMS=Nitrogen.Variety;\`

` PSE=allestimates] Yield`

means that a `Nitrogen`

by `Variety`

table of predicted means will be produced with a standard error for each cell.

The `FMETHOD`

option controls whether to accompany the Wald tests for fixed effects with approximate F statistics and corresponding numbers of residual degrees of freedom. The computations, using the method devised by Kenward & Roger (1997), can be time consuming with large or complicated models. So, with the default setting `FMETHOD=automatic`

, Genstat assesses the model itself and decides automatically whether to do the computations and which method to use. The other settings allow you to control what to do yourself:

`none` |
no F statistics are produced; |
---|---|

`algebraic` |
F statistics are calculated using algebraic derivatives (which may involve large matrix calculations); |

`numerical` |
F statistics are calculated using numerical derivatives (which require an extra evaluation of the mixed model equations for every variance parameter). |

The `CFORMAT`

option controls the type of output produced for the estimated covariance models. The default setting, `variancematrices`

, produces the variance-covariance matrices for the components, whereas the setting `parameters`

prints their parameters.

The `MVINCLUDE`

option allows the inclusion of units with missing values. By default, units where there is a missing value in the y-variate or in any of the factors or variates in the model terms are excluded. The setting `explanatory`

allows units with missing values in factors or variates in the model to be included. For missing covariate values, this is equivalent to substituting the mean value. The setting `yvariate`

includes units with missing values in the y-variate. This can be useful to retain the balanced structure of the data for use with direct product covariance matrices (see `VSTRUCTURE`

), or to produce predictions of data values for given values of explanatory factors and/or variates.

The `WEIGHTS`

option can be used to specify a weight for each unit in the analysis. This is useful when it is suspected that the size of the random error varies between units. For example, if the random error for unit *i* is known to have variance *v _{i}*σ

^{2}, a weight variate should be used containing values

*w*=1/

_{i}*v*.

_{i}The `RMETHOD`

option controls the way in which residuals and fitted values are formed. For the default setting `RMETHOD=final`

, the fitted values are calculated from all the fixed and random effects. The residuals are the difference between the data and the fitted values and, in this case, are estimates of the `*units*`

random error and can be used to check the Normality and variance homogeneity assumptions for the random error. To get fitted values constructed from the fixed terms alone, omitting all random terms, the setting `RMETHOD=all`

must be used. The setting `RMETHOD=notspline`

means that the residuals will be formed from all the random effects, excluding spline terms.

Option `SUBMODEL`

is used to specify a submodel of the fixed model (but only applies when `METHOD=Fisher`

). This model will be fitted as well as the full fixed model, using a slightly modified version of the algorithm, and the difference in deviances between the full and submodel can be used as a likelihood-based test to assess the importance of the fixed terms dropped from the full model, as described by Welham & Thompson (1997). Once the full model has been fitted, the `RECYCLE`

option can be used to test a series of submodels of the fixed model. If option `RECYCLE=yes`

is set, then only the estimation for the submodel is performed. Information for the full fixed model is picked up from the corresponding save structure. When the `RECYCLE`

option is set, only the `deviance`

and `model`

settings of `PRINT`

can be used.

The `METHOD`

option specifies whether to use the AI (Average Information) algorithm (Gilmour *et al*. 1995) with sparse matrix methods to maximize the residual likelihood, or Fisher scoring with full matrix manipulation. By default, the sparse Average Information algorithm is used. The AI algorithm generally runs faster per iteration than Fisher scoring and uses much less workspace, but it may require slightly more iterations to reach convergence. When sparse matrix methods are used, standard errors of differences will not be available for random effects, although standard errors are available. Note that when `METHOD=AI`

, the `SUBMODEL`

and `RECYCLE`

options do not apply. The `WORKSPACE`

option (default 1) specifies the number of blocks of internal memory to be allocated for use by the estimation algorithm when `METHOD=AI`

.

Option `MAXCYCLE`

can be used to change the maximum number of iterations performed by the algorithm from the default of 30.

The `TOLERANCES`

option gives tolerances for three matrix inversions. The first two values are matrix inversion tolerances for the information matrix and the mixed model equations respectively and take the value 10^{-5} by default. The third value is used to detect zero frequency counts for factor combinations in the mixed model equations: 10^{-6} is used by default.

Options: `PRINT`

, `PTERMS`

, `PSE`

, `WEIGHTS`

, `MVINCLUDE`

, `SUBMODEL`

, `RECYCLE`

, `RMETHOD`

, `METHOD`

, `MAXCYCLE`

, `TOLERANCES`

, `PARAMETERIZATION`

, `CFORMAT`

, `WORKSPACE`

.

Parameters: `Y`

, `RESIDUALS`

, `FITTEDVALUES`

, `EXIT`

, `SAVE`

.

### Action with `RESTRICT`

Any of the y-variates or any of the factors or variates in the fixed and random models (defined by `VCOMPONENTS`

) may be restricted to indicate that only a subset of the units is to be used in the analysis. However, if more than one of these vectors is restricted, all must be restricted to the same set of units. Any restrictions on the variates supplied to save residuals or fitted values are ignored.

### References

Gilmour, A.R., Thompson, R. & Cullis, B. (1995). AIREML, an efficient algorithm for variance parameter estimation in linear mixed models. *Biometrics*, 51, 1440-1450.

Kenward, M.G. & Roger, J.H. (1997). Small sample inference for fixed effects from restricted maximum likelihood. *Biometrics*, 53, 983-997.

Patterson, H.D. & Thompson, R. (1971). Recovery of inter-block information when block sizes are unequal. *Biometrika*, 58, 545-554.

Robinson, D.L. (1987). Estimation and use of variance components. *The Statistician*, 36, 3-14.

Welham, S.J. & Thompson, R. (1997). Likelihood ratio tests for fixed model terms using residual maximum likelihood. *Journal of the Royal Statistical Society, Series B*, 59, 701-714.

### See also

Directives: `VCOMPONENTS`

, `VSTRUCTURE`

, `VRESIDUAL`

, `VDISPLAY`

, `VPREDICT`

, `VKEEP`

, `VPEDIGREE`

, `VCYCLE`

, `VSTATUS`

.

Procedures: `VAIC`

, `VALLSUBSETS`

, `VBOOTSTRAP`

, `VCHECK`

, `VCRITICAL`

, `VFRESIDUALS`

, `VGRAPH`

, `VPLOT`

, `VDEFFECTS`

, `VDFIELDRESIDUALS`

, `VFIXEDTESTS`

, `VFLC`

, `VFPEDIGREE`

, `VFUNCTION`

, `VHERITABILITY`

, `VLSD`

, `VMETA`

, `VMCOMPARISON`

, `VPERMTEST`

, `VPOWER`

, `VRACCUMULATE`

, `VRCHECK`

, `VRFIT`

, `VRMETAMODEL`

, `VRPERMTEST`

, `VSAMPLESIZE`

, `VSCREEN`

, `VSOM`

, `VSPREADSHEET`

, `VSURFACE`

, `VTCOMPARISONS`

, `VABLOCKDESIGN`

, `VAMETA`

, `VAROWCOLUMNDESIGN`

, `VASERIES`

, `VALINEBYTESTER`

, `VLINEBYTESTER`

, `FCONTRASTS`

, `FDIALLEL`

, `AOVANYHOW`

.

Commands for: REML analysis of linear mixed models, Analysis of variance.

### Example

" Example REML-1: Incomplete block analysis (from Cochran & Cox, Table 10.3, p. 406) " FILEREAD [NAME='%gendir%/examples/REML-1.DAT']\ Reps,Blocks,Treats,Yield; FGROUPS=3(yes),no VCOMPONENTS [FIXED=Treats] RANDOM=Reps/Blocks CALCULATE Log = LOG(Yield) REML [PRINT=components,effects,means,stratum,vcov,monitor; PSE=diff;\ PTERMS=Reps/Blocks; METHOD=Fisher; MAXCYCLE=2] Yield,Log; SAVE=Syield,Slog VAIC " Plot residuals for each analysis" VPLOT [GRAPHICS=high; SAVE=Syield] fitted VPLOT [GRAPHICS=high; SAVE=Slog] fitted