Fits a negative binomial generalized linear model estimating the aggregation parameter (R.M. Harbord & R.W. Payne).

### Options

`PRINT` = string tokens |
Printed output from the analysis (`model` , `deviance` , `summary` , `estimates` , `correlations` , `fittedvalues` , `accumulated` , `monitoring` , `confidence` , `aggregation` , `loglikelihood` ); default `mode` , `summ` , `esti` , `aggr` |
---|---|

`AGGREGATION` = scalar |
Saves the estimate of the aggregation parameter |

`_2LOGLIKELIHOOD` = scalar |
Saves the value of -2 × log-likelihood |

`CONSTANT` = string token |
How to treat the constant (`estimate` , `omit` ); default `esti` |

`FACTORIAL` = scalar |
Limit on number of factors in a treatment term; default 3 |

`POOL` = string token |
Whether to pool the deviance for the terms in the accumulated summary (`yes` , `no` ); default `no` |

`NOMESSAGE` = string tokens |
Warnings to suppress from `FIT` (`dispersion` , `leverage` , `residual` , `aliasing` , `marginality` , `vertical` , `df` , `inflation` ); default `*` |

`FPROBABILITY` = string token |
Printing of probabilities for variance ratios (`yes` , `no` ); default `no` |

`TPROBABILITY` = string token |
Printing of probabilities for t-statistics (`yes` , `no` ); default `no` |

`SELECTION` = string tokens |
Statistics to be displayed in the summary of analysis produced by `PRINT=summary` (`%variance` , `%ss` , `adjustedr2` , `r2` , `dispersion` , `%meandeviance` , `%deviance` , `aic` , `bic` , `sic` ); default `disp` |

`PROBABILITY` = scalar |
Probability level for confidence intervals for parameter estimates; default 0.95 |

`SEAGGREGATION` = scalar |
Saves the standard error of the estimated aggregation parameter |

`MAXCYCLE` = variate |
Maximum number of iteration for main and Newton-Raphson estimations; default `!(15,15)` |

`TOLERANCE` = variate |
Convergence criteria for deviance and k; default `!(1E-4,1E-4)` |

### Parameter

`TERMS` = formula |
List of explanatory variates and factors, or model formula (as for `FIT` ) |
---|

### Description

The negative binomial distribution can be fitted as a generalized linear model using `FIT`

only for a given value of the aggregation parameter *k*. `RNEGBINOMIAL`

extends the fitting to include estimation of *k* from the data.

The negative binomial distribution is a discrete distribution with the relationship between mean and variance given by

variance = mean + mean**2/*k*,

where *k* is a positive constant known as the aggregation parameter. It provides a possible model for count data that show apparent overdispersion when a Poisson model is fitted. (Another model is the simpler constant overdispersion model, obtained by setting option `DISPERSION=*`

in a `MODEL`

statement with option `DISTRIBUTION=poisson`

; see McCullough & Nelder 1989 and Hinde & Demetrio 1998.)

The call to `RNEGBINOMIAL`

must be preceded by a `MODEL`

statement with option `DISTRIBUTION=negativebinomial`

(otherwise an error message is printed). It is also necessary to specify the link function (e.g. by setting option `LINK=logarithm`

for a log-link), as the default is the canonical log-ratio link, which is unlikely to be useful in practice (for example it requires the linear predictor to be negative).

The `AGGREGATION`

and `SEAGGREGATION`

option allow the estimate of *k* and its standard error to be saved. The `_2LOGLIKELIHOOD`

option allows minus twice the maximized log-likelihood to be saved. This may be useful for comparing a sequence of nested models fitted by `RNEGBINOMIAL`

using likelihood ratio testing. (The deviance cannot be used to compare models unless the value of *k* is the same for all the models, as it is the difference between the log-likelihood of a given model and a saturated model with the same value of *k*.) Printed output is controlled by the `PRINT`

option, which has the same settings as for the `FIT`

directive but with the addition of `aggregation`

to control the printing of the estimate of *k* and its standard error (based on observed rather than expected information; see *Method*), and `loglikelihood`

to print minus two times the log-likelihood.

The `CONSTANT`

, `FACTORIAL`

, `NOMESSAGE`

, `FPROBABILITY`

, `TPROBABILITY`

, `SELECTION`

andn`PROBABILITY`

options operate in the usual way (as for example in the `FIT`

directive). The final two options, `MAXCYCLE`

and `TOLERANCE`

, can supply variates of length 2 that can be used to control the iterative process if required. The first element of `MAXCYCLE`

sets the maximum number of times that the model is fitted as a generalized linear model for fixed *k*, while the second element sets the maximum number of Newton-Raphson iterations used to maximise the likelihood with respect to *k* for fixed fitted values. The alternating cycle stops when successive values of the deviance are within a tolerance set by the first element of the `TOLERANCE`

option and successive values of the deviance are within a tolerance set by the second element.

Options: `PRINT`

, `AGGREGATION`

, `_2LOGLIKELIHOOD`

, `CONSTANT`

, `FACTORIAL`

, `POOL`

, `NOMESSAGE`

, `FPROBABILITY`

, `TPROBABILITY`

, `SELECTION`

, `PROBABILITY`

, `SEAGGREGATION`

, `MAXCYCLE`

, `TOLERANCE`

.

Parameter: `TERMS`

.

### Method

For fixed *k*, the negative binomial distribution is in the exponential family and the regression parameters determining the fitted values can be fitted as a generalized linear model using the `FIT`

directive. For a fixed set of fitted values, *k* can be estimated by using the Newton-Raphson method to solve the score equation for *k*. Alternating between the two processes until convergence yields joint maximum likelihood estimates of *k* and the regression parameters. As the estimate of *k* is asymptotically independent of the other regression parameters (Lawless 1987), their standard errors can be obtained separately from the two processes. The standard error for *k* uses observed rather than expected information due to the use of Newton-Raphson rather than Fisher scoring.

The starting value of *k* is taken from the `AGGREGATION`

option of the `MODEL`

statement, which defaults to 1. This default appears to be a satisfactory initial value in practice, but the user may wish to specify a different value if convergence problems are encountered, or if speed is an issue and an approximate value of *k* is known.

### Action with `RESTRICT`

Any restriction applied to vectors used in the regression model applies also to the results from `RNEGBINOMIAL`

.

### References

McCullagh, P. & Nelder, J.A. (1989). *Generalized Linear Models (second edition)*. Chapman & Hall, London.

Hinde, J. & Demetrio, C.G.B. (1998). Overdispersion: models and estimation. *Computational Statistics & Data Analysis*, 27, 151-170.

Lawless, J.F. (1987). Negative binomial and mixed Poisson regression. *Canadian Journal of Statistics,* 15, 209-225.

### See also

Procedures: `HGANALYSE`

, `R0INFLATED`

.

Commands for: Regression analysis.

### Example

CAPTION 'RNEGBINOMIAL example',\ !t('Pump failure data from Gaver & O''Muircheartaigh (1987,',\ 'Technometrics 29, 1-15). Analysis as in Hinde & Demetrio',\ '(1998, J. Comp. Stat. & Data Anal. 27, 151-170).');\ STYLE=meta,plain FACTOR [NVALUES=10; LEVELS=2; LABELS=!t('Continuous','Standby')] mode READ [SETNVALUES=yes] mode,events,time 1 5 94.320 2 1 15.720 1 5 62.880 1 14 125.760 2 3 5.240 1 19 31.440 2 1 1.048 2 1 1.048 2 4 2.096 2 22 10.480: CALCULATE logtime=LOG(time) MODEL [DISTRIBUTION=negativebinomial; LINK=logarithm; OFFSET=logtime]\ events RNEGBINOMIAL mode