EXTRABINOMIAL procedure

Fits the models of Williams (1982) to overdispersed proportions (M.S. Ridout & P.W. Goedhart).

Options

`PRINT` = string tokens	What to print if the iterative estimation process converges successfully, and whether to monitor the iterations (`model`, `summary`, `accumulated`, `estimates`, `correlations`, `fittedvalues`, `monitoring`); default `*`
`CONSTANT` = string token	How to treat constant (`estimate`, `omit`); default `esti`
`FACTORIAL` = scalar	Limit for expansion of model terms; default 3
`NOMESSAGE` = string tokens	Which warning messages to suppress (`dispersion`, `leverage`, `residual`, `aliasing`, `marginality`); default `*`
`METHOD` = string token	Which model to fit to take account of the extra variation (`II`, `III`); default `II`
`MODIFYMODEL` = string token	Whether to leave the `MODEL` settings (`WEIGHTS` and `DISPERSION`) in their modified form or to restore the original settings (`yes`, `no`); default `no`
`WEIGHTS` = variate	To save estimated weights
`PHI` = scalar	To save estimated overdispersion parameter
`MAXCYCLE` = scalar	Maximum number of iterations; default 10
`TOLERANCE` = scalar	Convergence criterion; default 0.01

Parameter

`TERMS` = formula	Model terms to be fitted; if unset it is assumed that the model consists only of a constant term

Description

In binomial regression models, residual variability is often larger than would be expected if the data were indeed binomially distributed. This may be due to a few outliers or a poor choice of link function but often it simply indicates that the data are from a distribution more variable than the binomial. Such data are said to be “overdispersed” or to exhibit “extra-binomial variation”.

Williams (1982) discusses two possible models to extend the usual binomial model (Model I). Model II assumes that the true variance exceeds the binomial variance by a factor

V = 1 + (NBINOMIAL-1) × φ (0 ≤ φ ≤ 1)

If the overdispersion parameter φ were known, the data could be analysed using a binomial model with prior weights 1/V. Procedure EXTRABINOMIAL estimates φ so that the residual chi-square statistic from this weighted analysis is (approximately) equal to the residual degrees of freedom (Moore 1987). If the binomial totals are all equal, Model II is equivalent to setting the DISPERSION option of MODEL equal to the residual chi-square statistic divided by its degrees of freedom.
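A minimal Python sketch of the Model II quantities, for illustration only: the function names `model_ii_inflation` and `model_ii_weights` are hypothetical and are not part of Genstat. It computes the inflation factor V = 1 + (n - 1)φ and the prior weights 1/V for a given φ.

```python
def model_ii_inflation(n, phi):
    """Model II variance inflation factor V = 1 + (n - 1) * phi, for 0 <= phi <= 1."""
    if not 0.0 <= phi <= 1.0:
        raise ValueError("phi must lie in [0, 1]")
    return 1.0 + (n - 1) * phi


def model_ii_weights(totals, phi):
    """Prior weights 1/V, one per unit, for a known overdispersion parameter phi."""
    return [1.0 / model_ii_inflation(n, phi) for n in totals]


# A unit with a single trial is never inflated; larger totals are down-weighted more.
weights = model_ii_weights([1, 10, 40], phi=0.05)
```

If every total equals the same n, every weight is the same constant 1/V. The weighted fit then has the same fitted values as the unweighted one, and its chi-square is the unweighted value divided by V. This is why Model II reduces to a DISPERSION setting in that case.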

Alternatively, Model III assumes that the linear predictor varies about its expectation with a constant variance. Usually this variation is assumed to follow a normal distribution; with a logit link, the error distribution is then a logistic normal. Extensions of Model III, in which several normal distributions contribute to the variation of the linear predictor (similar to the strata of a stratified analysis of variance), form the basis of many methods suggested for analysing generalized linear mixed models. For Model III there is generally no simple expression for the exact variance, but the delta method can be used to show that, approximately, the variance exceeds the binomial variance by a factor

V = 1 + (NBINOMIAL-1) × φ × F² / (P × (1 – P))

where φ is the variance on the scale of the linear predictor, P is the fitted probability, and F is the derivative of the inverse of the link function, evaluated at the fitted value of the linear predictor.
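The Model III factor can be sketched in Python as below; the helper names are hypothetical. For the logit link F = P(1 - P), so the factor simplifies to 1 + (n - 1)φP(1 - P). The probit helper is included only as an assumed second link to show how F changes with the link function.

```python
import math


def model_iii_inflation(n, phi, p, f):
    """Delta-method inflation V = 1 + (n - 1) * phi * F**2 / (P * (1 - P)).

    n: binomial total; phi: variance on the linear-predictor scale;
    p: fitted probability; f: derivative of the inverse link at the fitted eta.
    """
    return 1.0 + (n - 1) * phi * f ** 2 / (p * (1.0 - p))


def logit_p_and_f(eta):
    """Fitted probability and d(mu)/d(eta) for the logit link."""
    p = 1.0 / (1.0 + math.exp(-eta))
    return p, p * (1.0 - p)


def probit_p_and_f(eta):
    """Fitted probability and d(mu)/d(eta) for the probit link."""
    p = 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))
    return p, math.exp(-0.5 * eta ** 2) / math.sqrt(2.0 * math.pi)


p, f = logit_p_and_f(0.0)
v_logit = model_iii_inflation(21, 0.4, p, f)    # 1 + 20 * 0.4 * 0.25
p, f = probit_p_and_f(0.0)
v_probit = model_iii_inflation(21, 0.4, p, f)   # 1 + 20 * 0.4 * 2 / pi
```

At the same linear predictor the probit link gives a larger factor than the logit link, because the scale of φ differs between the two links.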

Before using EXTRABINOMIAL, a MODEL statement must be given in the usual way to define the y-variate, the binomial totals, the link and any offset. The distribution must, of course, be set to binomial; any settings of WEIGHTS or DISPERSION are ignored.

The form of EXTRABINOMIAL is similar in many ways to the FIT directive. There is a single parameter TERMS to define the model terms to be fitted, and the first four options, PRINT, CONSTANT, FACTORIAL, and NOMESSAGE, all have the same syntax and purpose as in FIT. The remaining options are specific to EXTRABINOMIAL.

The METHOD option selects which model to use (II or III); by default METHOD=II. Both models involve the estimation of the weight variate (1/V) required to fit the model using the standard Genstat facilities for generalized linear models. If option MODIFYMODEL=yes, EXTRABINOMIAL will leave the MODEL statement in its modified form (provided the iterative estimation of φ converges), with the WEIGHTS option set to these weights and the DISPERSION option set to 1, so that directives like DROP can be used to study the effects of individual terms in the model in the usual way. The TERMS directive will also be left set to the model specified by the TERMS parameter of EXTRABINOMIAL, and this model will be the one most recently fitted, so further output can be obtained using RDISPLAY.

Options WEIGHTS and PHI allow the weights and the estimated value of φ, respectively, to be saved. The MAXCYCLE option specifies the maximum number of iterations in the estimation, and the TOLERANCE option defines the convergence criterion:

ABS(Chi-square – Residual d.f.) < TOLERANCE × Residual d.f.
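The stopping rule can be written as a one-line Python check; the function name `converged` is hypothetical. For the 21-unit example below, the Seed*RtExtrct model has four parameters and so 17 residual d.f. With the default tolerance, the chi-square must lie within 0.17 of 17.

```python
def converged(chi_square, residual_df, tolerance=0.01):
    """Stopping rule: |chi-square - residual d.f.| < tolerance * residual d.f."""
    return abs(chi_square - residual_df) < tolerance * residual_df


inside = converged(17.1, 17)    # within the 0.17 band
outside = converged(17.2, 17)   # outside the 0.17 band
```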

Options: PRINT, CONSTANT, FACTORIAL, NOMESSAGE, METHOD, MODIFYMODEL, WEIGHTS, PHI, MAXCYCLE, TOLERANCE.

Parameter: TERMS.

Method

If the binomial totals are all equal, φ is determined (non-iteratively) from the residual chi-square statistic.
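A sketch of the non-iterative calculation, assuming Model II and a common total n. The weights are then the constant 1/V, so the weighted chi-square is X²/V. Setting this equal to the residual d.f. gives V = X²/d.f. and φ = (V - 1)/(n - 1). Truncating φ to [0, 1] is an assumption of this sketch, not a documented Genstat rule, and the function name is hypothetical.

```python
def phi_equal_totals(chi_square, residual_df, n):
    """Model II estimate of phi when every binomial total equals n."""
    if n < 2:
        raise ValueError("phi is not identifiable when every total is 1")
    v = chi_square / residual_df        # required inflation factor
    phi = (v - 1.0) / (n - 1)
    return min(max(phi, 0.0), 1.0)      # assumed truncation to the range 0 <= phi <= 1


phi = phi_equal_totals(40.0, 20, 11)    # V = 2, so phi = 1 / 10
```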

Otherwise, φ must be found iteratively and the method used (Williams, 1982) involves nested iterations. Each outer iteration (involving a model fit) requires an inner iteration (which uses only CALCULATE statements) to get the updated estimate of φ. The option MAXCYCLE controls the maximum number of outer iterations. The maximum number of inner iterations is fixed at 10.
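The nested structure can be illustrated in Python under simplifying assumptions. This is a sketch, not Genstat's code or Williams's exact update. It fits Model II with a constant-only model, so the weighted binomial fit is just p = Σwy/Σwn. The inner step uses Newton's method on X²(φ) = d.f. with p held fixed, capped at 10 steps as described above. It reuses the counts from the example but ignores the Seed and RtExtrct factors. All function names are hypothetical.

```python
# Counts from the example: germinated seeds (NGerm) and seeds per plate (NSeeds).
NGERM = [10, 23, 23, 26, 17, 5, 53, 55, 32, 46, 10,
         8, 10, 8, 23, 0, 3, 22, 15, 32, 3]
NSEEDS = [39, 62, 81, 51, 39, 6, 74, 72, 51, 79, 13,
          16, 30, 28, 45, 4, 12, 41, 30, 51, 7]


def model_ii_weights(n, phi):
    """Prior weights 1/V with V = 1 + (n - 1) * phi (Model II)."""
    return [1.0 / (1.0 + (ni - 1) * phi) for ni in n]


def fit_constant(y, n, w):
    """Weighted binomial ML fit of a constant-only model: p = sum(w*y) / sum(w*n)."""
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(wi * ni for wi, ni in zip(w, n))


def pearson_chi_square(y, n, p, w):
    """Weighted Pearson chi-square for fitted probability p and prior weights w."""
    return sum(wi * (yi - ni * p) ** 2 / (ni * p * (1.0 - p))
               for yi, ni, wi in zip(y, n, w))


def solve_phi(y, n, p, df, phi, max_inner=10):
    """Inner iteration: Newton steps on X^2(phi) = df, holding p fixed."""
    r2 = [(yi - ni * p) ** 2 / (ni * p * (1.0 - p)) for yi, ni in zip(y, n)]
    for _ in range(max_inner):
        w = model_ii_weights(n, phi)
        g = sum(wi * ri for wi, ri in zip(w, r2)) - df
        # d(w_i)/d(phi) = -(n_i - 1) * w_i**2
        dg = -sum(ri * (ni - 1) * wi ** 2 for ri, ni, wi in zip(r2, n, w))
        if dg == 0.0:
            break
        phi = min(max(phi - g / dg, 0.0), 1.0)   # keep phi in [0, 1]
    return phi


def extrabinomial_constant(y, n, max_cycle=10, tolerance=0.01):
    """Outer iteration: refit, test |X^2 - df| < tolerance * df, then update phi."""
    df = len(y) - 1            # residual d.f. for a constant-only model
    phi = 0.0                  # start from the ordinary binomial model
    for cycle in range(1, max_cycle + 1):
        w = model_ii_weights(n, phi)
        p = fit_constant(y, n, w)               # outer step: refit the model
        x2 = pearson_chi_square(y, n, p, w)
        if abs(x2 - df) < tolerance * df:
            return phi, p, cycle, True
        phi = solve_phi(y, n, p, df, phi)       # inner step: update phi
    return phi, p, max_cycle, False


phi, p, cycles, ok = extrabinomial_constant(NGERM, NSEEDS)
```

Because this φ measures overdispersion about a single common probability, it will differ from the value the example below obtains for the Seed*RtExtrct model. If the data are underdispersed, the root sits at φ = 0 and this sketch simply reports non-convergence.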

Very precise convergence is not important in practice; the default setting of the TOLERANCE option (1%) should give a perfectly adequate estimate of φ, usually within 3 iterations.

Action with `RESTRICT`

Any of the following structures may be restricted: the Y variate; the NBINOMIAL variate; the WEIGHTS variate; the OFFSET variate; any variate or factor appearing in the model formula. Restrictions on different structures must be compatible. Restricted units are excluded from the analysis.

References

Moore, D.F. (1987). Modelling the extraneous variance in the presence of extra-binomial variation. Applied Statistics, 36, 8-14.

Williams, D.A. (1982). Extra-binomial variation in logistic linear models. Applied Statistics, 31, 144-148.

Example

CAPTION  'EXTRABIN example',\
         !t('A 2 x 2 factorial experiment comparing germination',\
         'of two types of seed and two root extracts (Crowder, M.J.,',\
         '1978, Appl. Statist., 27, 34-37).'); STYLE=meta,plain
FACTOR   [LABELS=!T(O_75,O_73); VALUES=1,(1,2)10] Seed
FACTOR   [LABELS=!T(Bean,Cucumber); VALUES=(1,2)5,2,(1,2)5] RtExtrct
VARIATE  NGerm,NSeeds ;\
  VALUES=!(10,23,23,26,17,5,53,55,32,46,10,8,10,8,23,0,3,22,15,32,3),\
         !(39,62,81,51,39,6,74,72,51,79,13,16,30,28,45,4,12,41,30,51,7)
MODEL    [DISTRIBUTION=binomial; LINK=logit] NGerm; NBINOMIAL=NSeeds
EXTRABIN [PRINT=estimates; PHI=Phi] Seed*RtExtrct
PRINT    Phi

Updated on March 8, 2019
