RNEGBINOMIAL procedure

Fits a negative binomial generalized linear model estimating the aggregation parameter (R.M. Harbord & R.W. Payne).

Options

`PRINT` = string tokens	Printed output from the analysis (`model`, `deviance`, `summary`, `estimates`, `correlations`, `fittedvalues`, `accumulated`, `monitoring`, `confidence`, `aggregation`, `loglikelihood`); default `mode`, `summ`, `esti`, `aggr`
`AGGREGATION` = scalar	Saves the estimate of the aggregation parameter
`_2LOGLIKELIHOOD` = scalar	Saves the value of -2 × log-likelihood
`CONSTANT` = string token	How to treat the constant (`estimate`, `omit`); default `esti`
`FACTORIAL` = scalar	Limit on number of factors in a treatment term; default 3
`POOL` = string token	Whether to pool the deviance for the terms in the accumulated summary (`yes`, `no`); default `no`
`NOMESSAGE` = string tokens	Warnings to suppress from `FIT` (`dispersion`, `leverage`, `residual`, `aliasing`, `marginality`, `vertical`, `df`, `inflation`); default `*`
`FPROBABILITY` = string token	Printing of probabilities for variance ratios (`yes`, `no`); default `no`
`TPROBABILITY` = string token	Printing of probabilities for t-statistics (`yes`, `no`); default `no`
`SELECTION` = string tokens	Statistics to be displayed in the summary of analysis produced by `PRINT=summary` (`%variance`, `%ss`, `adjustedr2`, `r2`, `dispersion`, `%meandeviance`, `%deviance`, `aic`, `bic`, `sic`); default `disp`
`PROBABILITY` = scalar	Probability level for confidence intervals for parameter estimates; default 0.95
`SEAGGREGATION` = scalar	Saves the standard error of the estimated aggregation parameter
`MAXCYCLE` = variate	Maximum number of iterations for the main and Newton-Raphson estimations; default `!(15,15)`
`TOLERANCE` = variate	Convergence criteria for deviance and k; default `!(1E-4,1E-4)`

Parameter

`TERMS` = formula	List of explanatory variates and factors, or model formula (as for `FIT`)

Description

The negative binomial distribution can be fitted as a generalized linear model using FIT only for a given value of the aggregation parameter k. RNEGBINOMIAL extends the fitting to include estimation of k from the data.

The negative binomial distribution is a discrete distribution with the relationship between mean and variance given by

variance = mean + mean**2/k,

where k is a positive constant known as the aggregation parameter. It provides a possible model for count data that show apparent overdispersion when a Poisson model is fitted. (Another model is the simpler constant overdispersion model, obtained by setting option DISPERSION=* in a MODEL statement with option DISTRIBUTION=poisson; see McCullagh & Nelder 1989 and Hinde & Demetrio 1998.)
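To see how k controls the extra variation, the variance function above can be evaluated for a fixed mean over a range of values of k. This is a minimal sketch: the identifiers `mu`, `k` and `variance` are illustrative names chosen here, not structures used by the procedure.

```genstat
"Variance implied by the negative binomial for mean 4 and several values of k"
SCALAR       mu; VALUE=4
VARIATE      [VALUES=0.5,1,2,10,100] k
CALCULATE    variance = mu + mu**2/k
PRINT        k,variance
```

With a mean of 4 the variances are 36, 20, 12, 5.6 and 4.16. Small values of k give strong overdispersion, while large values approach the Poisson case in which the variance equals the mean.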

The call to RNEGBINOMIAL must be preceded by a MODEL statement with option DISTRIBUTION=negativebinomial (otherwise an error message is printed). It is also necessary to specify the link function (e.g. by setting option LINK=logarithm for a log-link), as the default is the canonical log-ratio link, which is unlikely to be useful in practice (for example it requires the linear predictor to be negative).

The AGGREGATION and SEAGGREGATION options allow the estimate of k and its standard error to be saved. The _2LOGLIKELIHOOD option allows minus twice the maximized log-likelihood to be saved. This may be useful for comparing a sequence of nested models fitted by RNEGBINOMIAL using likelihood ratio testing. (The deviance cannot be used to compare models unless the value of k is the same for all the models, as it is the difference between the log-likelihood of a given model and a saturated model with the same value of k.) Printed output is controlled by the PRINT option, which has the same settings as for the FIT directive but with the addition of aggregation to control the printing of the estimate of k and its standard error (based on observed rather than expected information; see Method), and loglikelihood to print minus twice the log-likelihood.
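The saving options might be used along the following lines. This sketch assumes that `events`, `logtime` and `mode` have already been set up as in the Example section; the identifiers `khat`, `sekhat` and `m2loglik` are hypothetical names chosen for illustration.

```genstat
"Assumes events, logtime and mode have been set up as in the Example"
MODEL        [DISTRIBUTION=negativebinomial; LINK=logarithm; OFFSET=logtime]\
             events
"Save the estimate of k, its standard error and -2 x log-likelihood"
RNEGBINOMIAL [PRINT=estimates,aggregation,loglikelihood; AGGREGATION=khat;\
             SEAGGREGATION=sekhat; _2LOGLIKELIHOOD=m2loglik] mode
PRINT        khat,sekhat,m2loglik
```

Repeating the call for a nested model, saving into a different scalar, and differencing the two saved values of minus twice the log-likelihood gives the likelihood ratio statistic for comparing the two fits.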

The CONSTANT, FACTORIAL, NOMESSAGE, FPROBABILITY, TPROBABILITY, SELECTION and PROBABILITY options operate in the usual way (as for example in the FIT directive). The final two options, MAXCYCLE and TOLERANCE, can supply variates of length 2 that can be used to control the iterative process if required. The first element of MAXCYCLE sets the maximum number of times that the model is fitted as a generalized linear model for fixed k, while the second element sets the maximum number of Newton-Raphson iterations used to maximise the likelihood with respect to k for fixed fitted values. The alternating cycle stops when successive values of the deviance are within a tolerance set by the first element of the TOLERANCE option and successive values of k are within a tolerance set by the second element.
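If the defaults prove insufficient, the iteration limits and tolerances could be changed as in the sketch below. It assumes that a suitable MODEL statement has already been given (as in the Example section); the values 30 and 1E-6 are arbitrary illustrations, not recommendations.

```genstat
"Allow more cycles and tighter tolerances than the defaults"
"!(15,15) and !(1E-4,1E-4); the first elements control the main cycle,"
"the second elements control the Newton-Raphson estimation of k"
RNEGBINOMIAL [PRINT=estimates,aggregation,monitoring; MAXCYCLE=!(30,30);\
             TOLERANCE=!(1E-6,1E-6)] mode
```

Including the `monitoring` setting of PRINT is a convenient way to check how the iterations progress when experimenting with these settings.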

Options: PRINT, AGGREGATION, _2LOGLIKELIHOOD, CONSTANT, FACTORIAL, POOL, NOMESSAGE, FPROBABILITY, TPROBABILITY, SELECTION, PROBABILITY, SEAGGREGATION, MAXCYCLE, TOLERANCE.

Parameter: TERMS.

Method

For fixed k, the negative binomial distribution is in the exponential family and the regression parameters determining the fitted values can be fitted as a generalized linear model using the FIT directive. For a fixed set of fitted values, k can be estimated by using the Newton-Raphson method to solve the score equation for k. Alternating between the two processes until convergence yields joint maximum likelihood estimates of k and the regression parameters. As the estimate of k is asymptotically independent of the estimates of the regression parameters (Lawless 1987), their standard errors can be obtained separately from the two processes. The standard error for k uses observed rather than expected information, because k is estimated by Newton-Raphson rather than Fisher scoring.
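The Newton-Raphson step for k can be written out as follows. This is a sketch based on the standard negative binomial log-likelihood in the mean and aggregation parameterization, which is consistent with variance = mean + mean**2/k. Here y_i are the observed counts, μ_i the current fitted values, and ψ and ψ' the digamma and trigamma functions. The notation is introduced for illustration, and the procedure's internal computations may be organized differently.

```latex
\begin{align*}
\ell(k) &= \sum_i \left[ \log\Gamma(y_i + k) - \log\Gamma(k) - \log y_i!
          + k \log\frac{k}{k+\mu_i} + y_i \log\frac{\mu_i}{k+\mu_i} \right], \\
U(k) &= \frac{\partial \ell}{\partial k}
      = \sum_i \left[ \psi(y_i + k) - \psi(k) + \log\frac{k}{k+\mu_i}
          + \frac{\mu_i - y_i}{k+\mu_i} \right], \\
U'(k) &= \sum_i \left[ \psi'(y_i + k) - \psi'(k) + \frac{1}{k} - \frac{1}{k+\mu_i}
          - \frac{\mu_i - y_i}{(k+\mu_i)^2} \right], \\
k^{(t+1)} &= k^{(t)} - \frac{U(k^{(t)})}{U'(k^{(t)})}, \qquad
\mathrm{s.e.}(\hat{k}) \approx \left\{ -U'(\hat{k}) \right\}^{-1/2}.
\end{align*}
```

The score equation is U(k) = 0. Because the step uses the actual second derivative U'(k), the quantity -U'(k) evaluated at the estimate is the observed information for k, which is why the standard error is based on observed rather than expected information.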

The starting value of k is taken from the AGGREGATION option of the MODEL statement, which defaults to 1. This default appears to be a satisfactory initial value in practice, but the user may wish to specify a different value if convergence problems are encountered, or if speed is an issue and an approximate value of k is known.
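A different starting value could be supplied as in the sketch below, which assumes the data structures from the Example section. The value 2 is purely illustrative.

```genstat
"Start the search for k from 2 rather than the default of 1"
MODEL        [DISTRIBUTION=negativebinomial; LINK=logarithm; AGGREGATION=2;\
             OFFSET=logtime] events
RNEGBINOMIAL mode
```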

Action with `RESTRICT`

Any restriction applied to vectors used in the regression model applies also to the results from RNEGBINOMIAL.
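For example, the analysis could be limited to a subset of the units as sketched below. This assumes the data structures from the Example section, and the condition on `time` is an arbitrary choice made only to illustrate the mechanism.

```genstat
"Fit using only the units with time greater than 2 (an arbitrary condition)"
RESTRICT     events; CONDITION=time>2
MODEL        [DISTRIBUTION=negativebinomial; LINK=logarithm; OFFSET=logtime]\
             events
RNEGBINOMIAL mode
"Remove the restriction afterwards"
RESTRICT     events
```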

References

McCullagh, P. & Nelder, J.A. (1989). Generalized Linear Models (second edition). Chapman & Hall, London.

Hinde, J. & Demetrio, C.G.B. (1998). Overdispersion: models and estimation. Computational Statistics & Data Analysis, 27, 151-170.

Lawless, J.F. (1987). Negative binomial and mixed Poisson regression. Canadian Journal of Statistics, 15, 209-225.

Example

CAPTION      'RNEGBINOMIAL example',\
             !t('Pump failure data from Gaver & O''Muircheartaigh (1987,',\
             'Technometrics 29, 1-15). Analysis as in Hinde & Demetrio',\
             '(1998, J. Comp. Stat. & Data Anal. 27, 151-170).');\
             STYLE=meta,plain
FACTOR       [NVALUES=10; LEVELS=2; LABELS=!t('Continuous','Standby')] mode
READ         [SETNVALUES=yes] mode,events,time
1   5  94.320   2   1  15.720   1   5  62.880   1  14 125.760   2   3   5.240
1  19  31.440   2   1   1.048   2   1   1.048   2   4   2.096   2  22  10.480:
CALCULATE    logtime=LOG(time)
MODEL        [DISTRIBUTION=negativebinomial; LINK=logarithm; OFFSET=logtime]\
             events
RNEGBINOMIAL mode

Updated on January 12, 2022
