Fits a generalized linear mixed model (S.J. Welham).

### Options

`PRINT` = string token |
What output to display (`model` , `components` , `effects` , `fittedvalues` , `means` , `backmeans` , `monitoring` , `vcovariance` , `waldtests` , `missing values` , `covariancemodels` , `deviance` ); default `mode` , `comp` , `effe` , `mean` , `back` , `moni` , `vcov` , `cova` |
---|---|

`DISTRIBUTION` = string token |
Error distribution (`binomial` , `poisson` , `normal` , `gamma` , `negativebinomial` ); default `bino` |

`LINK` = string token |
Link function (`identity` , `logarithm` , `logit` , `reciprocal` , `probit` , `complementaryloglog` , `logratio` ); default `*` gives the canonical link |

`DISPERSION` = scalar |
Value at which to fix the residual variance, if missing the variance is estimated; default 1 for binomial, Poisson and negative binomial distributions, a missing value otherwise |

`RANDOM` = formula |
Random model excluding bottom stratum; this must be set |

`FIXED` = formula |
Fixed model; default `*` |

`ABSORB` = factor |
Absorbing factor to be used at the `REML` step of the iterations |

`CONSTANT` = string token |
Whether to estimate or omit constant term in fixed model (`omit` , `estimate` ); default `esti` |

`FACTORIAL` = scalar |
Limit on number of factors/covariates in a model term; default 3 |

`PTERMS` = formula |
Formula specifying fixed terms for which means or back-transformed means are to be printed; default `*` prints all the fixed model terms |

`PSE` = string token |
Standard errors to print with tables of means (se, `sesummary` , `sed` , `sedsummary` , `vcovariance` , `differences` , `estimates` , `alldifferences` , `allestimates` ); default `seds` |

`MVINCLUDE` = string tokens |
Whether to include units with missing values in the explanatory factors and variates and/or the y-variates (`explanatory` , `yvariate` ); default `*` i.e. omit units with missing values in either explanatory factors or variates or y-variates |

`MAXCYCLE` = scalar |
Maximum number of iterations of the `GLMM` algorithm; default 20 |

`TOLERANCE` = scalar |
Convergence criterion for iterative procedure; default 0.0001 |

`FMETHODGLMM` = string token |
Specifies fitting method (`all` , `fixed` ): `all` indicates the method of Schall (1991); `fixed` indicates the marginal method of Breslow & Clayton (1993) ; default `all` |

`OFFSET` = variate |
Variate holding values to be used as an offset on the linear predictor scale; default `*` |

`CADJUST` = string token |
What adjustment to make to covariates for the `REML` analysis (`mean` , `none` ); default `mean` |

`AGGREGATION` = scalar |
Fixed parameter for negative binomial distribution (parameter k as in variance function var = mean + mean^{2}/k); default 1 |

`KLOGRATIO` = scalar |
Parameter k for logratio link, in form log(mean / (mean + k)); default as set in `AGGREGATION` option |

`OWNDIST` = text |
For non-standard distributions only: text specifying the variance function to be used with dummy variable `DUM` , e.g. `OWNDIST='DUM'` |

`OWNLINK` = text |
For non-standard link functions only: text specifying 3 functions using dummy variable `DUM` – the link function, its inverse and its derivative, e.g. `OWNLINK = !T('log(DUM)','exp(DUM)','1/DUM')` |

`CDEFINITIONS` = text |
Statements to execute to define correlation models; default `*` i.e. none |

`CVECTORS` = pointer |
Data structures involved in the correlation models |

`WORKSPACE` = scalar |
Number of blocks of internal memory to be set up for use by the `REML` algorithm; default 1 |

`VCONSTRAINTS` = string token |
Whether to constrain variance components to be positive (`none` , `positive` ); default `posi` |

`VMETHOD` = string token |
Indicates whether to use the standard Fisher-scoring algorithm or the new AI algorithm with sparse matrix methods (`Fisher` , `AI)` ; default `AI` |

`VMAXCYCLE` = scalar |
Limit on the number of iterations; default 30 |

### Parameters

`Y` = variates |
Dependent variates |
---|---|

`NBINOMIAL` = scalars or variates |
Number of binomial trials for each unit (must be set if `DISTRIBUTION=binomial` ) |

`FITTEDVALUES` = variates |
Variates to save fitted values |

`COMPONENTS` = variates |
Variate to save estimated variance components |

`VCOVARIANCE` = symmetric matrices |
Variance-covariance matrix for the variance components |

`MEANS` = pointers |
Pointer to save tables of means for each `Y` variate |

`VARMEANS` = pointers |
Pointer to save covariance matrices of tables of means for each `Y` variate |

`BACKMEANS` = pointers |
Pointer to save tables of back-transformed means for each `Y` variate |

`ITERATIVEWEIGHTS` = variates |
Saves the iterative weights from the generalized linear model fitting |

`INITIALFITTEDVALUES` = variates |
Defines initial values for the fitted values; if unset, these are formed automatically |

`EXIT` = scalar |
Exit status for the fit of the `GLMM` (0 if successful) |

`SAVE` = `REML` save structures |
Saves details of the `REML` analysis used to fit the model |

`GLSAVE` = pointer |
Saves details of the `GLMM` analysis |

### Description

Procedure `GLMM`

estimates the parameters of a generalized linear mixed model using either the method of Schall (1991) or the marginal method of Breslow & Clayton (1993), as described in the Methods Section.

The procedure assumes a generalized linear mixed model, that is a generalized linear model with both fixed and Normally-distributed random effects on the scale of the linear predictor. The procedure estimates the fixed effects together with the variance components associated with the random effects.

The `DISTRIBUTION`

option sets the error distribution; the default is to assume a binomial distribution but the Poisson, gamma and negative-binomial distributions are also available. Other distributions can be used via the `OWNDIST`

option; this should be set to a text containing the formula for calculating the variance function for the required distribution, in terms of dummy variable `DUM`

. The link can be set using the `LINK`

option; the default takes the canonical link. Identity, logarithm, logit, reciprocal, probit, complementaryloglog or logratio link functions are also provided, and alternative link functions can be used via the `OWNLINK`

option. In this case, `OWNLINK`

must be set to a text with three values containing formulae (in terms of dummy variable `DUM`

) for calculating the link function, its inverse and its first derivative. For example, instead of specifying a Poisson distribution with log link, the `OWNDIST`

and `OWNLINK`

options could be set as

`OWNDIST='DUM'; OWNLINK=!T(LOG(DUM),EXP(DUM),'1/DUM')`

Where necessary, these expressions should be constructed so that invalid results (eg. divide by zero or log(zero)) are avoided.

The `AGGREGATION`

option supplies the aggregation parameter for the negative-binomial distribution; default 1. The `KLOGRATIO`

option supplies the parameter *k* to be used in the logratio link, and takes its default from `AGGREGATION`

.

The `DISPERSION`

option specifies the dispersion parameter. The default is 1 for binomial, Poisson and negative binomial distributions, a missing value otherwise (indicating that the dispersion parameter is to be estimated).

The fixed and random models are specified by the `FIXED`

and `RANDOM`

options. The number of factors in the terms of the fixed model can be limited using the `FACTORIAL`

option. By default the variance components are constrained to be positive, but you can set option `VCONSTRAINTS`

to `none`

to allow them to become negative.

The `VMETHOD`

option specifies the algorithm to use in the `REML`

steps of the `GLMM`

algorithm: either `Fisher`

or `AI`

(default).

The `ABSORB`

option can specify an absorbing factor for use with the `Fisher`

algorithm. However, if the absorbing factor appears in any of the terms of the `FIXED`

model, no estimates of error will be available for these terms (see the *Guide to the Genstat Command Language*, Part 2, Sections 5.3.3 and 5.3.7). The `VMAXCYCLE`

option controls the number of iterations used by the `REML`

algorithm.

By default, a constant term is included in the model; this can be suppressed by setting option `CONSTANT=omit`

. An offset can be included in the linear predictor by setting option `OFFSET`

. By default any covariates are centred for the `REML`

fitting by subtracting their means, weighted according to the iterative weights of the generalized linear model. Alternatively you can set option `CADJUST=none`

to request that the uncentred covariates are used instead. You can save the iterative weights using the `ITERATIVEWEIGHTS`

parameter.

It is also possible to define correlation models on the random terms, although the results should be used with caution as their properties are not yet well understood. To do this, you should set the `CDEFINITIONS`

option to a text containing the Genstat statements required to define the models (e.g. using `VSTRUCTURE`

). You also need to set the `CVECTORS`

option to a pointer containing the data structures involved in the statements. Then, in the statements themselves, you should refer to each of these as `CVECTORS[n]`

, where `n`

is the position of the relevant data structure in the pointer. For example:

`TEXT cdef; VALUE=\`

`'VSTRUCTURE [CVECTORS[1].CVECTORS[2]] ar,ar; FACTOR=CVECTORS[1,2]; ORDER=1'`

`GLMM [DISTRIBUTION=gamma; LINK=log; FIXED=variety;\`

`RANDOM=fieldrow*fieldcolumn; CDEFINITION=cdef;\`

`CVECTORS=!p(fieldrow,fieldcolumn)] yield`

The `MVINCLUDE`

option allows the inclusion of units with missing values, as in the `REML`

directive. By default, units where there is a missing value in the y-variate or in any of the factors or variates in the model terms are excluded. The setting `explanatory`

allows units with missing values in factors or variates in the model to be included. For missing covariate values, this is equivalent to substituting the mean value. The setting `yvariate`

includes units with missing values in the y-variate. This can be useful to retain the balanced structure of the data for use with direct product covariance matrices (see `VSTRUCTURE`

), or to produce predictions of data values for given values of explanatory factors and/or variates.

The `FMETHODGLMM`

option specifies the method used to form the fitted values and therefore determines the fitting method to be used. The default setting `all`

specifies that both fixed and random terms should be used to form fitted values which gives the method of Schall (1991); setting `fixed`

indicates that only fixed terms are used to form fitted values which gives the marginal method of Breslow & Clayton (1993).

The `PRINT`

option selects the output to be displayed:

`model`

description of model fitted

`components`

estimates of variance components, and estimated parameters of covariance models,

`effects`

estimates of parameters α and β, the fixed and random effects,

`fittedvalues`

table containing the y-variate, fitted values, residuals on the natural scale and standardized residuals on the scale of the linear predictor,

`means`

predicted means for factor combinations,

`backmeans`

back-transformed means,

`monitoring`

monitoring information at each iteration,

`vcovariance`

variance-covariance matrix of the estimated components,

`waldtests`

Wald tests for fixed terms,

`missingvalue`

estimates of missing values,

`covariancemodels`

estimated covariance models and

`deviance`

deviance from the generalized linear model.

The default is `PRINT=mode`

, `comp`

, `effe`

, `mean`

, `back`

, `moni`

, `vcov`

, `cova`

.

The deviance represents the variation remaining after fitting the fixed terms and all the random terms. It thus assesses how well those terms explain the random variation in the data.

To avoid problems with 0 and 100% observations, the standardized residuals on the linear-predictor scale are calculated as differences between the adjusted dependent variate and the fitted values on that scale (and then standardized by their standard errors). The fitted values include the random as well as the fixed terms. The `GLDISPLAY`

procedure can print residuals and fitted values where the fitted values are calculated only from the fixed terms.

The `PTERMS`

option can specify which tables of means are printed; by default, tables of means are produced for all the terms in the fixed model.

The `PSE`

option controls the standard errors that are printed with tables of means and effects:

`se`

standard errors,

`sesummary`

summary of the standard errors (default),

`sed`

standard errors of differences between pairs of means,

`sedsummary`

summary of the standard errors of differences,

`vcovariance`

variance-covariance matrix for the means,

`allestimates`

synonym of `se`

,

`estimates`

synonym of `sesummary`

,

`alldifferences`

synonym of `sed`

,

`differences`

synonym of `sedsummary`

.

Some control over the iterative `GLMM`

algorithm is provided by option `MAXCYCLE`

which sets the maximum number of iterations (default 20), and by option `TOLERANCE`

which specifies the criterion for determining convergence of the algorithm (default 0.0001). Convergence is judged to have been attained once the maximum change in the ratio (variance component)/(residual variance) and the change in the residual variance are less than the specified `TOLERANCE`

.

The dependent variate is specified using the `Y`

parameter. The `NBINOMIAL`

parameter must be set when `DISTRIBUTION=binomial`

to specify the total number of trials on each unit, as a variate if the number varies from unit to unit or as a scalar if it is constant over all the units.

The other parameters are used to save results. The variance components and residual variance can be saved in a variate using parameter `VCOMPONENTS`

, with their variance-covariance matrix stored in a symmetric matrix specified by parameter `VCOVARIANCE`

. The tables of means to be saved are determined by the setting of `PTERMS`

. The tables are stored in a pointer specified by parameter `MEANS`

, in the order in which they appear in the `FIXED`

model. Their variance matrices and tables of back-transformed means are stored similarly in pointers specified by parameters `VARMEANS`

and `BACKMEANS`

.

The `EXIT`

parameter saves a scalar indicating the exit status for the fit of the `GLMM`

(0 if successful, 1 otherwise).

You can display further output from the analysis using the `GLDISPLAY`

procedure, and use the `GLKEEP`

procedure to save information in Genstat data structures. The `GLPREDICT`

procedure can form predictions. You can use the `GLRTEST`

procedure to assess the random model, and the `GLPERMTEST`

procedure to do permutation tests to assess the fixed model. By default these procedures take the most recent `GLMM`

analysis, but you can use the `GLSAVE`

to save the results of the analysis, to use instead in future calls of these procedures.

Alternatively `VDISPLAY`

and `VKEEP`

can be used to redisplay or store other results from the internal `REML`

estimation provided `REML`

has not been used in the interim. You can use the `SAVE`

parameter to save the `REML`

save structure and use that as input to these directives if `REML`

may be used for another analysis in the interim.

Options: `PRINT`

, `DISTRIBUTION`

, `LINK`

, `DISPERSION`

, `RANDOM`

, `FIXED`

, `ABSORB`

, `CONSTANT`

, `FACTORIAL`

, `PTERMS`

, `PSE`

, `MVINCLUDE`

, `MAXCYCLE`

, `TOLERANCE`

, `FMETHODGLMM`

, `OFFSET`

, `CADJUST`

, `AGGREGATION`

, `KLOGRATIO`

, `OWNDIST`

, `OWNLINK`

, `CDEFINITIONS`

, `CVECTORS`

, `WORKSPACE`

, `VCONSTRAINTS`

, `VMETHOD`

, `VMAXCYCLE`

.

Parameters: `Y`

, `NBINOMIAL`

, `FITTEDVALUES`

, `COMPONENTS`

, `VCOVARIANCE`

, `MEANS`

, `VARMEANS`

, `BACKMEANS`

, `ITERATIVEWEIGHTS`

, `INITIALFITTEDVALUES`

, `EXIT`

, `SAVE`

, `GLSAVE`

.

### Method

`GLMM`

estimates the parameters of the Generalized Linear Mixed Model using either the method of Schall (1991) or the marginal method of Breslow & Clayton (1993). The method used is determined by the setting of option `FMETHODGLMM`

.

The data *y* arises from some specified distribution with variance function *sV* and expected value μ. The link function g (with inverse h) is such that

g(μ) = η = *X a* + *Z b*

where *X* is the design matrix for the vector *a* of fixed effects and *Z* is the design matrix for the vector *b* of random effects. The random effects *b* can be attributed to *c* random factors which are assumed to have zero mean and to be uncorrelated with each other and with *e*:

Cov(*b*) = *D* = Diag{ σ_{1}^{2} *I*_{1} … σ_{c}^{2} *I _{c}* }

The method used by Schall (1991) develops the algorithm by analogy with the algorithm for estimating conventional generalized linear models. The link function applied to the data is linearized about μ to give the adjusted dependent variate *z*,

*z* = *X a* + *Z b* + *e* g′(μ)

where e=*y*-μ and g′ = dg/dμ.

Then

E(*z*) = *X a*; Cov(*b*) = *D*;

Cov(*e* g′(μ)) = *sV*(μ) × (dη/dμ) × (dη/dμ) = *s* × W(μ)^{-1}

where *s* is the dispersion parameter. Hence

Cov(*z*) = *s* × W(μ)^{-1} + *Z D Z*′.

This has the same form as the general linear mixed model, and the fixed effects and variance components can be estimated by `REML`

with (iterative) weights *W*.

This leads to the following algorithm:

Step 1) Using initial estimates of the variance components and of μ, calculate the adjusted variate *z* and weights *W*.

Step 2) Get new estimates of the variance components and of μ by `REML`

on adjusted variate *z* with weights *W*.

Step 3) Convergence in estimates ⇒ exit algorithm.

Step 4) Use new estimates to update adjusted variate *z* and weights vector *W*.

Step 5) Go to Step 2.

The marginal model used by Breslow and Clayton is derived from a first order approximation (linearisation about *Xa*) to give

*y* ∼ h(*Xa*) + h′(*Xa*)*Zb* + *e*

where ∼ indicates approximation, h is the inverse of the link fuction g and *e* is *y*-μ. They then work in terms of the marginal mean, *M*=h(*Xa*). Quasi-likelihood estimation leads to an algorithm similar to the one above, but the working variate becomes

*z* = *Xa* + (*y*–*M*)g′(*M*) = *Xa* + *E*g′(*M*)

where *E*=*y*–*M*. The working variate *z* then has variance

Cov(*z*) = *s* × W(*M*)^{-1} + *Z D Z*′.

The same algorithm is used to fit the model, replacing μ by *M* and *e* by *E*.

The only difference between the two algorithms is then in the method used to form the mean μ or *M* and the “error” variate *e* or *E*. The option `RMETHOD`

of `REML`

controls the method of forming fitted values after `REML`

estimation (i.e. including just fixed terms, or all terms except the residual) and this option is used inside the procedure to determine which of the models is fitted.

Initial values for the variance components are calculated by `REML`

estimation using the fixed and random models on the data transformed by the link function. Initial values for the fixed effects are calculated by fitting the fixed model only to a generalized linear model with the specified link and error distribution. The `WORKSPACE`

option specifies the number of blocks of internal memory to be set up for use by the `REML`

algorithm; see the `REML`

directive for more details.

### Action with `RESTRICT`

If the Y-variate is restricted, only the units not excluded by the restriction will be analysed.

### References

Breslow, N.E. & Clayton, D.G. (1993). Approximate inference in generalized linear mixed models. *Journal of the American Statistical Association*, 88, 421, 9-25.

McCullagh, P. & Nelder, J.A. (1989). *Generalized Linear Models (second edition)*. Chapman & Hall, London.

Schall, R. (1991) Estimation in generalized linear models with random effects. *Biometrika*, 78, 719-727.

### See also

Procedures: `GEE`

, `HGANALYSE`

, `GLDISPLAY`

, `GLKEEP`

, `GLPERMTEST`

, `GLPLOT`

, `GLPREDICT`

, `GLRTEST`

, `GLTOBITPOISSON`

.

Commands for: Regression analysis.

### Example

CAPTION 'GLMM example',\ !t('Data from McCullagh & Nelder (1989, Table 14.4),',\ 'also see Schall (1991).'); STYLE=meta,plain FACTOR [NVALUES=120; LEVELS=20] Female, Male & [LEVELS=4; LABELS=!t(RR,RW,WR,WW)] Cross VARIATE [NVALUES=120] Mate1 READ Cross,Male,Female; FREPRESENTATION=labels,2(levels) RR 1 1 RW 14 1 RR 5 1 RW 11 1 RR 4 1 RW 15 1 RR 5 2 RW 15 2 RR 3 2 RW 13 2 RR 1 2 RW 12 2 RR 2 3 RW 11 3 RR 1 3 RW 14 3 RR 3 3 RW 13 3 RR 4 4 RW 12 4 RR 2 4 RW 15 4 RR 5 4 RW 14 4 RR 3 5 RW 13 5 RR 4 5 RW 12 5 RR 2 5 RW 11 5 RW 19 6 RR 9 6 RW 20 6 RR 7 6 RW 16 6 RR 8 6 RW 18 7 RR 8 7 RW 19 7 RR 9 7 RW 17 7 RR 6 7 RW 16 8 RR 6 8 RW 17 8 RR 10 8 RW 20 8 RR 9 8 RW 20 9 RR 7 9 RW 18 9 RR 6 9 RW 19 9 RR 10 9 RW 17 10 RR 10 10 RW 16 10 RR 8 10 RW 18 10 RR 7 10 WR 9 11 WW 19 11 WR 7 11 WW 20 11 WR 10 11 WW 18 11 WR 7 12 WW 16 12 WR 9 12 WW 17 12 WR 6 12 WW 20 12 WR 8 13 WW 17 13 WR 6 13 WW 19 13 WR 7 13 WW 16 13 WR 10 14 WW 20 14 WR 8 14 WW 18 14 WR 9 14 WW 19 14 WR 6 15 WW 18 15 WR 10 15 WW 16 15 WR 8 15 WW 17 15 WW 15 16 WR 2 16 WW 13 16 WR 4 16 WW 12 16 WR 1 16 WW 14 17 WR 1 17 WW 15 17 WR 2 17 WW 11 17 WR 5 17 WW 11 18 WR 4 18 WW 12 18 WR 5 18 WW 15 18 WR 3 18 WW 13 19 WR 3 19 WW 11 19 WR 1 19 WW 14 19 WR 4 19 WW 12 20 WR 5 20 WW 14 20 WR 3 20 WW 13 20 WR 2 20: READ Mate1 1 1 1 0 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 0 1 1 0 0 0 1 0 0 0 1 0 0 1 1 0 0 1 1 1 1 0 0 1 0 1 0 0 1 1 1 0 1 0 0 0 1 0 0 0 0 0 0 0 1 0 1 1 1 0 1 0 1 0 0 0 0 0 0 1 0 0 0 1 1 1 0 1 1 1 0 1 0 1 0 1 1 1 1 1 0 1 0 0 1 1 0 : GLMM [DISTRIBUTION=binomial; LINK=logit; FIXED=Cross; RANDOM=Female+Male]\ Mate1; NBINOMIAL=1