# RNEGBINOMIAL procedure

Fits a negative binomial generalized linear model estimating the aggregation parameter (R.M. Harbord & R.W. Payne).

### Options

`PRINT` = string tokens Printed output from the analysis (`model`, `deviance`, `summary`, `estimates`, `correlations`, `fittedvalues`, `accumulated`, `monitoring`, `aggregation`, `loglikelihood`); default `mode`, `summ`, `esti`, `aggr`

`AGGREGATION` = scalar Saves the estimate of the aggregation parameter

`_2LOGLIKELIHOOD` = scalar Saves the value of -2 × log-likelihood

`CONSTANT` = string token How to treat the constant (`estimate`, `omit`); default `esti`

`FACTORIAL` = scalar Limit on number of factors in a treatment term; default 3

`NOMESSAGE` = string tokens Warnings to suppress from `FIT` (`dispersion`, `leverage`, `residual`, `aliasing`, `marginality`, `vertical`, `df`, `inflation`); default `*`

`FPROBABILITY` = string token Printing of probabilities for variance ratios (`yes`, `no`); default `no`

`TPROBABILITY` = string token Printing of probabilities for t-statistics (`yes`, `no`); default `no`

`SELECTION` = string tokens Statistics to be displayed in the summary of analysis produced by `PRINT=summary` (`%variance`, `%ss`, `adjustedr2`, `r2`, `dispersion`, `%meandeviance`, `%deviance`, `aic`, `bic`, `sic`); default `disp`

`SEAGGREGATION` = scalar Saves the standard error of the estimated aggregation parameter

`MAXCYCLE` = variate Maximum number of iterations for main and Newton-Raphson estimations; default `!(15,15)`

`TOLERANCE` = variate Convergence criteria for deviance and k; default `!(1E-4,1E-4)`

### Parameter

`TERMS` = formula List of explanatory variates and factors, or model formula (as for `FIT`)

### Description

The negative binomial distribution can be fitted as a generalized linear model using `FIT` only for a given value of the aggregation parameter k. `RNEGBINOMIAL` extends the fitting to include estimation of k from the data.

The negative binomial distribution is a discrete distribution with the relationship between mean and variance given by

variance = mean + mean**2/k,

where k is a positive constant known as the aggregation parameter. It provides a possible model for count data that show apparent overdispersion when a Poisson model is fitted. (Another model is the simpler constant overdispersion model, obtained by setting option `DISPERSION=*` in a `MODEL` statement with option `DISTRIBUTION=poisson`; see McCullagh & Nelder 1989 and Hinde & Demetrio 1998.)
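As a small worked illustration of the mean-variance relationship above (the numbers are chosen purely for illustration), a mean of 4 gives noticeably different variances for a small and a large value of k:

$$
\begin{aligned}
k = 2:&\quad \text{variance} = 4 + \frac{4^2}{2} = 12,\\
k = 20:&\quad \text{variance} = 4 + \frac{4^2}{20} = 4.8 .
\end{aligned}
$$

Small values of k therefore correspond to strong overdispersion, while the variance approaches the Poisson value (variance = mean) as k becomes large.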

The call to `RNEGBINOMIAL` must be preceded by a `MODEL` statement with option `DISTRIBUTION=negativebinomial` (otherwise an error message is printed). It is also necessary to specify the link function (e.g. by setting option `LINK=logarithm` for a log-link), as the default is the canonical log-ratio link, which is unlikely to be useful in practice (for example it requires the linear predictor to be negative).
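The required sequence of statements might look like the following minimal sketch. The variates `ncases` and `dose`, and their values, are hypothetical and serve only to make the fragment complete.

```
" Hypothetical counts and covariate, for illustration only "
VARIATE [VALUES=0,1,3,0,2,7,1,12,4,0,5,9] ncases
VARIATE [VALUES=1,1,2,2,3,3,4,4,5,5,6,6] dose
" Declare the negative binomial distribution with an explicit log link "
MODEL [DISTRIBUTION=negativebinomial; LINK=logarithm] ncases
" Fit the model, estimating k as well as the regression parameters "
RNEGBINOMIAL dose
```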

The `AGGREGATION` and `SEAGGREGATION` options allow the estimate of k and its standard error to be saved. The `_2LOGLIKELIHOOD` option allows minus twice the maximized log-likelihood to be saved. This may be useful for comparing a sequence of nested models fitted by `RNEGBINOMIAL` using likelihood ratio testing. (The deviance cannot be used to compare models unless the value of k is the same for all the models, as it is the difference between the log-likelihood of a given model and a saturated model with the same value of k.) Printed output is controlled by the `PRINT` option, which has the same settings as for the `FIT` directive but with the addition of `aggregation` to control the printing of the estimate of k and its standard error (based on observed rather than expected information; see Method), and `loglikelihood` to print minus two times the log-likelihood.
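A likelihood ratio comparison of two nested models could be sketched as below. All identifiers (`ncases`, `dose`, `temp`, `m2ll0`, `m2ll1`, `khat`, `sekhat`, `lrt`, `plrt`) and the data values are hypothetical, and the sketch assumes that the `CUCHISQUARE` function returns the upper-tail chi-square probability.

```
" Hypothetical data, for illustration only "
VARIATE [VALUES=0,1,3,0,2,7,1,12,4,0,5,9] ncases
VARIATE [VALUES=1,1,2,2,3,3,4,4,5,5,6,6] dose
VARIATE [VALUES=12,15,11,14,18,21,16,24,19,13,20,23] temp
MODEL [DISTRIBUTION=negativebinomial; LINK=logarithm] ncases
" Smaller model: save -2 x log-likelihood, suppress printed output "
RNEGBINOMIAL [PRINT=*; _2LOGLIKELIHOOD=m2ll0] dose
" Larger model: also save k and its standard error "
RNEGBINOMIAL [PRINT=aggregation,loglikelihood; AGGREGATION=khat;\
  SEAGGREGATION=sekhat; _2LOGLIKELIHOOD=m2ll1] dose + temp
" Likelihood ratio statistic on 1 d.f. (one extra parameter) "
CALCULATE lrt = m2ll0 - m2ll1
CALCULATE plrt = CUCHISQUARE(lrt; 1)
PRINT khat,sekhat,lrt,plrt
```

The statistic is formed from the saved values of minus twice the log-likelihood rather than from the deviances, because k is re-estimated for each model.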

The `CONSTANT`, `FACTORIAL`, `NOMESSAGE`, `FPROBABILITY`, `TPROBABILITY`, and `SELECTION` options operate in the usual way (as for example in the `FIT` directive). The final two options, `MAXCYCLE` and `TOLERANCE`, can supply variates of length 2 that can be used to control the iterative process if required. The first element of `MAXCYCLE` sets the maximum number of times that the model is fitted as a generalized linear model for fixed k, while the second element sets the maximum number of Newton-Raphson iterations used to maximise the likelihood with respect to k for fixed fitted values. The alternating cycle stops when successive values of the deviance are within a tolerance set by the first element of the `TOLERANCE` option and successive values of k are within a tolerance set by the second element.
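For instance, the iteration limits could be raised and the convergence criteria tightened as in this sketch. The data and the particular limits are hypothetical; the unnamed variates follow the `!(...)` form shown for the defaults.

```
" Hypothetical data, for illustration only "
VARIATE [VALUES=0,1,3,0,2,7,1,12,4,0,5,9] ncases
VARIATE [VALUES=1,1,2,2,3,3,4,4,5,5,6,6] dose
MODEL [DISTRIBUTION=negativebinomial; LINK=logarithm] ncases
" Up to 30 GLM fits and 20 Newton-Raphson steps; tighter tolerances "
" for the deviance (first element) and for k (second element) "
RNEGBINOMIAL [PRINT=estimates,aggregation,monitoring;\
  MAXCYCLE=!(30,20); TOLERANCE=!(1E-6,1E-6)] dose
```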

Options: `PRINT`, `AGGREGATION`, `_2LOGLIKELIHOOD`, `CONSTANT`, `FACTORIAL`, `NOMESSAGE`, `FPROBABILITY`, `TPROBABILITY`, `SELECTION`, `SEAGGREGATION`, `MAXCYCLE`, `TOLERANCE`.

Parameter: `TERMS`.

### Method

For fixed k, the negative binomial distribution is in the exponential family and the regression parameters determining the fitted values can be fitted as a generalized linear model using the `FIT` directive. For a fixed set of fitted values, k can be estimated by using the Newton-Raphson method to solve the score equation for k. Alternating between the two processes until convergence yields joint maximum likelihood estimates of k and the regression parameters. As the estimate of k is asymptotically independent of the other regression parameters (Lawless 1987), their standard errors can be obtained separately from the two processes. The standard error for k uses observed rather than expected information due to the use of Newton-Raphson rather than Fisher scoring.
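The Newton-Raphson step for k can be sketched from the standard negative binomial log-likelihood. The notation is introduced here for illustration and is not taken from the procedure's source, so the exact form used internally may differ: y_i are the observed counts, μ_i the current fitted values, ψ and ψ′ the digamma and trigamma functions.

$$
\begin{aligned}
\ell(k) &= \sum_i \Bigl[ \log\Gamma(y_i+k) - \log\Gamma(k) - \log y_i! + k\log k + y_i\log\mu_i - (y_i+k)\log(k+\mu_i) \Bigr],\\
U(k) = \frac{\partial \ell}{\partial k} &= \sum_i \Bigl[ \psi(y_i+k) - \psi(k) + \log k + 1 - \log(k+\mu_i) - \frac{y_i+k}{k+\mu_i} \Bigr],\\
U'(k) = \frac{\partial^2 \ell}{\partial k^2} &= \sum_i \Bigl[ \psi'(y_i+k) - \psi'(k) + \frac{1}{k} - \frac{1}{k+\mu_i} - \frac{\mu_i - y_i}{(k+\mu_i)^2} \Bigr],\\
k^{(t+1)} &= k^{(t)} - \frac{U\bigl(k^{(t)}\bigr)}{U'\bigl(k^{(t)}\bigr)}, \qquad
\operatorname{se}(\hat{k}) \approx \bigl\{-U'(\hat{k})\bigr\}^{-1/2}.
\end{aligned}
$$

The standard error uses the observed information −U′(k̂) evaluated at the estimate, which is the quantity that Newton-Raphson already computes at its final step.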

The starting value of k is taken from the `AGGREGATION` option of the `MODEL` statement, which defaults to 1. This default appears to be a satisfactory initial value in practice, but the user may wish to specify a different value if convergence problems are encountered, or if speed is an issue and an approximate value of k is known.
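A different starting value can be supplied through the `MODEL` statement, as in the following sketch. The data and the value 2.5 are hypothetical.

```
" Hypothetical data, for illustration only "
VARIATE [VALUES=0,1,3,0,2,7,1,12,4,0,5,9] ncases
VARIATE [VALUES=1,1,2,2,3,3,4,4,5,5,6,6] dose
" Start the iteration from k = 2.5 instead of the default of 1 "
MODEL [DISTRIBUTION=negativebinomial; LINK=logarithm; AGGREGATION=2.5] ncases
RNEGBINOMIAL dose
```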

### Action with `RESTRICT`

Any restriction applied to vectors used in the regression model applies also to the results from `RNEGBINOMIAL`.

### References

McCullagh, P. & Nelder, J.A. (1989). Generalized Linear Models (second edition). Chapman & Hall, London.

Hinde, J. & Demetrio, C.G.B. (1998). Overdispersion: models and estimation. Computational Statistics & Data Analysis, 27, 151-170.

Lawless, J.F. (1987). Negative binomial and mixed Poisson regression. Canadian Journal of Statistics, 15, 209-225.

### See also

Procedures: `HGANALYSE`, `R0INFLATED`.

Commands for: Regression analysis.

### Example

```
CAPTION      'RNEGBINOMIAL example',\
!t('Pump failure data from Gaver & O''Muircheartaigh (1987,',\
'Technometrics 29, 1-15). Analysis as in Hinde & Demetrio',\
'(1998, J. Comp. Stat. & Data Anal. 27, 151-170).');\
STYLE=meta,plain
FACTOR       [NVALUES=10; LEVELS=2; LABELS=!t('Continuous','Standby')] mode