Fits the models of Williams (1982) to overdispersed proportions (M.S. Ridout & P.W. Goedhart).

### Options

| Option | Description |
|---|---|
| `PRINT` = string tokens | What to print if the iterative estimation process converges successfully, and whether to monitor the iterations (`model`, `summary`, `accumulated`, `estimates`, `correlations`, `fittedvalues`, `monitoring`); default `*` |
| `CONSTANT` = string token | How to treat the constant (`estimate`, `omit`); default `esti` |
| `FACTORIAL` = scalar | Limit for expansion of model terms; default 3 |
| `NOMESSAGE` = string tokens | Which warning messages to suppress (`dispersion`, `leverage`, `residual`, `aliasing`, `marginality`); default `*` |
| `METHOD` = string token | Which model to fit to take account of the extra variation (`II`, `III`); default `II` |
| `MODIFYMODEL` = string token | Whether to leave the modified `MODEL` settings (`WEIGHTS` and `DISPERSION`) or to restore the original situation (`yes`, `no`); default `no` |
| `WEIGHTS` = variate | To save the estimated weights |
| `PHI` = scalar | To save the estimated overdispersion parameter |
| `MAXCYCLE` = scalar | Maximum number of iterations; default 10 |
| `TOLERANCE` = scalar | Convergence criterion; default 0.01 |

### Parameter

| Parameter | Description |
|---|---|
| `TERMS` = formula | Model terms to be fitted; if unset, the model is assumed to consist only of a constant term |

### Description

In binomial regression models, residual variability is often larger than would be expected if the data were indeed binomially distributed. This may be due to a few outliers or a poor choice of link function but often it simply indicates that the data are from a distribution more variable than the binomial. Such data are said to be “overdispersed” or to exhibit “extra-binomial variation”.

Williams (1982) discusses two possible models to extend the usual binomial model (Model I). Model II assumes that the true variance exceeds the binomial variance by a factor

*V* = 1 + (`NBINOMIAL` − 1) × φ     (0 ≤ φ ≤ 1)

If the overdispersion parameter φ were known, the data could be analysed using a binomial model with prior weights 1/*V*. Procedure `EXTRABINOMIAL` estimates φ so that the residual chi-square statistic from this weighted analysis is (approximately) equal to the residual degrees of freedom (Moore 1987). If the binomial totals are all equal, Model II is equivalent to setting the `DISPERSION` option of `MODEL` equal to the residual chi-square statistic divided by its degrees of freedom.
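The Model II weights can be illustrated numerically outside Genstat. The following Python sketch simply evaluates 1/*V* for a given φ; the function name `model2_weights` is hypothetical and is not part of the procedure.

```python
def model2_weights(totals, phi):
    """Prior weights 1/V for Model II, with V = 1 + (n - 1) * phi.

    `totals` holds the binomial totals (the NBINOMIAL variate);
    `phi` is the overdispersion parameter, restricted to [0, 1].
    """
    if not 0.0 <= phi <= 1.0:
        raise ValueError("phi must lie between 0 and 1")
    return [1.0 / (1.0 + (n - 1) * phi) for n in totals]


# Illustrative totals and an illustrative value of phi (not estimated).
weights = model2_weights([1, 11, 21], phi=0.1)
```

With φ = 0 every weight is 1 and the analysis reduces to the ordinary binomial model (Model I); larger totals are downweighted more strongly as φ increases.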

Alternatively, Model III assumes that the linear predictor varies about its expectation with a constant variance. Usually this variation is assumed to follow a normal distribution; with a logit link, the error distribution is then logistic-normal. Extensions of Model III in which several normal distributions contribute to the variation on the linear predictor, similar to those that occur in stratified analysis of variance, form the basis of many methods suggested for analysing generalized linear mixed models. For Model III there is generally no simple expression for the exact variance, but the delta method can be used to show that, approximately, the variance exceeds the binomial variance by a factor

*V* = 1 + (`NBINOMIAL` − 1) × φ × *F*² / (*P* × (1 − *P*))

where φ is the variance on the scale of the linear predictor, *P* is the fitted probability and *F* is the derivative of the inverse of the link function, evaluated at the fitted value of the linear predictor.
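As a worked case, consider the logit link, for which the inverse link is *P* = 1/(1 + exp(−η)) and its derivative is *F* = *P* × (1 − *P*), so the factor simplifies to *V* = 1 + (`NBINOMIAL` − 1) × φ × *P* × (1 − *P*). The Python sketch below evaluates the general expression for this link; the function name is hypothetical and the code is an illustration only, not the procedure's implementation.

```python
import math


def model3_variance_factor(n, phi, eta):
    """Approximate Model III variance factor for a logit link.

    n   : binomial total
    phi : variance on the scale of the linear predictor
    eta : fitted value of the linear predictor
    """
    p = 1.0 / (1.0 + math.exp(-eta))   # inverse logit link
    f = p * (1.0 - p)                  # derivative of the inverse link
    return 1.0 + (n - 1) * phi * f * f / (p * (1.0 - p))


# At eta = 0 the fitted probability is 0.5, where the factor is largest.
v_mid = model3_variance_factor(5, 1.0, 0.0)
```

Unlike Model II, the factor depends on the fitted probability, so units with fitted values near 0 or 1 are inflated less than those near 0.5.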

Before using `EXTRABINOMIAL`, a `MODEL` statement must be given, in the usual way, to define the y-variate, the binomial totals, the link and any offset. The error distribution must, of course, also be set to `binomial`, but any settings of `WEIGHTS` or `DISPERSION` are ignored.

The form of `EXTRABINOMIAL` is similar in many ways to the `FIT` directive. There is a single parameter, `TERMS`, to define the model terms to be fitted, and the first four options, `PRINT`, `CONSTANT`, `FACTORIAL` and `NOMESSAGE`, all have the same syntax and purpose as in `FIT`. The remaining options are specific to `EXTRABINOMIAL`.

The `METHOD` option selects which model to use (`II` or `III`); by default `METHOD=II`. Both models involve the estimation of the weight variate (1/*V*) required to fit the model using the standard Genstat facilities for generalized linear models. If option `MODIFYMODEL=yes`, `EXTRABINOMIAL` will leave the `MODEL` statement in its modified form (provided the iterative estimation of φ converges), with the `WEIGHTS` option set to these weights and the `DISPERSION` option set to 1, so that directives like `DROP` can be used to study the effects of individual terms in the model in the usual way. The `TERMS` directive will also be left set to the model specified by the `TERMS` parameter of `EXTRABINOMIAL`, and this model will be the one most recently fitted, so further output can be obtained using `RDISPLAY`.

Options `WEIGHTS` and `PHI` allow the weights and the estimated value of φ, respectively, to be saved. The `MAXCYCLE` option specifies the maximum number of iterations in the estimation, and the `TOLERANCE` option defines the convergence criterion:

`ABS`(Chi-square − Residual d.f.) < `TOLERANCE` × Residual d.f.
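To make the criterion concrete: with 20 residual degrees of freedom and the default `TOLERANCE` of 0.01, the estimation stops once the residual chi-square lies within 0.2 of 20. A small Python check of the same inequality (the function name is hypothetical) might look like this:

```python
def has_converged(chi_square, residual_df, tolerance=0.01):
    """True when |chi-square - residual d.f.| < tolerance * residual d.f."""
    return abs(chi_square - residual_df) < tolerance * residual_df


close_enough = has_converged(20.1, 20)   # within 0.2 of 20
too_far = has_converged(20.5, 20)        # outside the tolerance band
```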

Options: `PRINT`, `CONSTANT`, `FACTORIAL`, `NOMESSAGE`, `METHOD`, `MODIFYMODEL`, `WEIGHTS`, `PHI`, `MAXCYCLE`, `TOLERANCE`.

Parameter: `TERMS`.

### Method

If the binomial totals are all equal, φ is determined (non-iteratively) from the residual chi-square statistic.
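For Model II the non-iterative case follows directly from the definitions in the Description. When every total equals *n*, the factor *V* is the same for all units, so the weighted chi-square is simply the unweighted chi-square divided by *V*; equating it to the residual degrees of freedom gives *V* = chi-square / d.f. and hence φ = (*V* − 1) / (*n* − 1). The Python sketch below applies this; the function name and the truncation of φ to the range 0 to 1 are assumptions made for illustration, since the source does not give the exact expression used by the procedure.

```python
def phi_equal_totals(chi_square, residual_df, n):
    """Model II estimate of phi when all binomial totals equal n (n > 1).

    chi_square is the residual Pearson chi-square from the ordinary
    (unweighted) binomial fit.
    """
    v = chi_square / residual_df          # common variance inflation factor
    phi = (v - 1.0) / (n - 1)
    # Assumption: keep phi inside the range stated for Model II.
    return min(max(phi, 0.0), 1.0)


phi_hat = phi_equal_totals(chi_square=40.0, residual_df=20, n=11)
```

In this illustration the chi-square is twice its degrees of freedom, so *V* = 2 and φ = 0.1.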

Otherwise, φ must be found iteratively, and the method used (Williams, 1982) involves nested iterations. Each outer iteration (involving a model fit) requires an inner iteration (which uses only `CALCULATE` statements) to get the updated estimate of φ. The option `MAXCYCLE` controls the maximum number of outer iterations. The maximum number of inner iterations is fixed at 10.

Very precise convergence is not important in practice; the default setting of the `TOLERANCE` option (1%) should give a perfectly adequate estimate of φ, usually within 3 iterations.
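The nested structure can be sketched in Python for the simplest possible case: Model II with a constant-only model, where the weighted fit has a closed form. This is a simplified illustration and not Williams' algorithm as implemented in the procedure: the inner step here is a plain bisection that matches the weighted chi-square to the residual degrees of freedom, and all function names are hypothetical. The data are the germination counts from the Example section, fitted here without the treatment factors.

```python
def fit_constant(r, n, w):
    """Outer step: weighted ML estimate of a common probability."""
    return sum(wi * ri for wi, ri in zip(w, r)) / sum(wi * ni for wi, ni in zip(w, n))


def chi_square(r, n, p, phi):
    """Pearson chi-square with prior weights 1/V, V = 1 + (n - 1) * phi."""
    return sum((ri - ni * p) ** 2 / (ni * p * (1 - p) * (1 + (ni - 1) * phi))
               for ri, ni in zip(r, n))


def solve_phi(r, n, p, df, steps=50):
    """Inner step: bisection on [0, 1] so that chi-square equals df at fixed p."""
    if chi_square(r, n, p, 0.0) <= df:
        return 0.0                      # no sign of overdispersion
    lo, hi = 0.0, 1.0
    for _ in range(steps):
        mid = (lo + hi) / 2
        if chi_square(r, n, p, mid) > df:
            lo = mid                    # still too much variation: raise phi
        else:
            hi = mid
    return (lo + hi) / 2


def estimate_phi(r, n, max_cycle=10, tolerance=0.01):
    """Alternate weighted fits and phi updates until the criterion is met."""
    df = len(r) - 1                     # one parameter: the constant
    phi = 0.0
    x2 = float("nan")
    for cycle in range(1, max_cycle + 1):
        w = [1 / (1 + (ni - 1) * phi) for ni in n]
        p = fit_constant(r, n, w)
        x2 = chi_square(r, n, p, phi)
        if abs(x2 - df) < tolerance * df:
            return phi, x2, cycle, True
        phi = solve_phi(r, n, p, df)
    return phi, x2, max_cycle, False    # not converged (e.g. underdispersed data)


n_germ = [10, 23, 23, 26, 17, 5, 53, 55, 32, 46, 10, 8, 10, 8, 23, 0, 3, 22, 15, 32, 3]
n_seeds = [39, 62, 81, 51, 39, 6, 74, 72, 51, 79, 13, 16, 30, 28, 45, 4, 12, 41, 30, 51, 7]
phi_hat, x2, cycles, converged = estimate_phi(n_germ, n_seeds)
```

Because the fitted probability changes only slightly when the weights are updated, a loose tolerance is reached after very few outer cycles, which is consistent with the remark above that precise convergence is unnecessary.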

### Action with `RESTRICT`

Any of the following structures may be restricted: the `Y` variate; the `NBINOMIAL` variate; the `WEIGHTS` variate; the `OFFSET` variate; any variate or factor appearing in the model formula. Restrictions on different structures must be compatible. Restricted units are excluded from the analysis.

### References

Moore, D.F. (1987). Modelling the extraneous variance in the presence of extra-binomial variation. *Applied Statistics*, 36, 8-14.

Williams, D.A. (1982). Extra-binomial variation in logistic linear models. *Applied Statistics*, 31, 144-148.

### See also

Procedures: `GLMM`, `HGANALYSE`, `RNEGBINOMIAL`, `R0INFLATED`.

Commands for: Regression analysis.

### Example

    CAPTION  'EXTRABIN example',\
             !t('A 2 x 2 factorial experiment comparing germination',\
             'of two types of seed and two root extracts (Crowder, M.J.,',\
             '1978, Appl. Statist., 27, 34-37).'); STYLE=meta,plain
    FACTOR   [LABELS=!T(O_75,O_73); VALUES=1,10(1,2)] Seed
    FACTOR   [LABELS=!T(Bean,Cucumber); VALUES=5(1,2),2,5(1,2)] RtExtrct
    VARIATE  NGerm,NSeeds ;\
             VALUES=!(10,23,23,26,17,5,53,55,32,46,10,8,10,8,23,0,3,22,15,32,3),\
             !(39,62,81,51,39,6,74,72,51,79,13,16,30,28,45,4,12,41,30,51,7)
    MODEL    [DISTRIBUTION=binomial; LINK=logit] NGerm; NBINOMIAL=NSeeds
    EXTRABIN [PRINT=estimates; PHI=Phi] Seed*RtExtrct
    PRINT    Phi