Fits a negative binomial generalized linear model estimating the aggregation parameter (R.M. Harbord & R.W. Payne).
|Printed output from the analysis (
||Saves the estimate of the aggregation parameter|
||Saves the value of -2 × log-likelihood|
||How to treat the constant (
||Limit on number of factors in a treatment term; default 3|
||Whether to pool the deviance for the terms in the accumulated summary (
||Warnings to suppress from
||Printing of probabilities for variance ratios (
||Printing of probabilities for t-statistics (
||Statistics to be displayed in the summary of analysis produced by
||Probability level for confidence intervals for parameter estimates; default 0.95|
||Saves the standard error of the estimated aggregation parameter|
||Maximum number of iteration for main and Newton-Raphson estimations; default
||Convergence criteria for deviance and k; default
||List of explanatory variates and factors, or model formula (as for
The negative binomial distribution can be fitted as a generalized linear model using
FIT only for a given value of the aggregation parameter k.
RNEGBINOMIAL extends the fitting to include estimation of k from the data.
The negative binomial distribution is a discrete distribution with the relationship between mean and variance given by
variance = mean + mean**2/k,
where k is a positive constant known as the aggregation parameter. It provides a possible model for count data that show apparent overdispersion when a Poisson model is fitted. (Another model is the simpler constant overdispersion model, obtained by setting option
DISPERSION=* in a
MODEL statement with option
DISTRIBUTION=poisson; see McCullough & Nelder 1989 and Hinde & Demetrio 1998.)
The call to
RNEGBINOMIAL must be preceded by a
MODEL statement with option
DISTRIBUTION=negativebinomial (otherwise an error message is printed). It is also necessary to specify the link function (e.g. by setting option
LINK=logarithm for a log-link), as the default is the canonical log-ratio link, which is unlikely to be useful in practice (for example it requires the linear predictor to be negative).
SEAGGREGATION option allow the estimate of k and its standard error to be saved. The
_2LOGLIKELIHOOD option allows minus twice the maximized log-likelihood to be saved. This may be useful for comparing a sequence of nested models fitted by
RNEGBINOMIAL using likelihood ratio testing. (The deviance cannot be used to compare models unless the value of k is the same for all the models, as it is the difference between the log-likelihood of a given model and a saturated model with the same value of k.) Printed output is controlled by the
FIT directive but with the addition of
aggregation to control the printing of the estimate of k and its standard error (based on observed rather than expected information; see Method), and
loglikelihood to print minus two times the log-likelihood.
PROBABILITY options operate in the usual way (as for example in the
FIT directive). The final two options,
TOLERANCE, can supply variates of length 2 that can be used to control the iterative process if required. The first element of
MAXCYCLE sets the maximum number of times that the model is fitted as a generalized linear model for fixed k, while the second element sets the maximum number of Newton-Raphson iterations used to maximise the likelihood with respect to k for fixed fitted values. The alternating cycle stops when successive values of the deviance are within a tolerance set by the first element of the
TOLERANCE option and successive values of the deviance are within a tolerance set by the second element.
For fixed k, the negative binomial distribution is in the exponential family and the regression parameters determining the fitted values can be fitted as a generalized linear model using the
FIT directive. For a fixed set of fitted values, k can be estimated by using the Newton-Raphson method to solve the score equation for k. Alternating between the two processes until convergence yields joint maximum likelihood estimates of k and the regression parameters. As the estimate of k is asymptotically independent of the other regression parameters (Lawless 1987), their standard errors can be obtained separately from the two processes. The standard error for k uses observed rather than expected information due to the use of Newton-Raphson rather than Fisher scoring.
The starting value of k is taken from the
AGGREGATION option of the
MODEL statement, which defaults to 1. This default appears to be a satisfactory initial value in practice, but the user may wish to specify a different value if convergence problems are encountered, or if speed is an issue and an approximate value of k is known.
Any restriction applied to vectors used in the regression model applies also to the results from
McCullagh, P. & Nelder, J.A. (1989). Generalized Linear Models (second edition). Chapman & Hall, London.
Hinde, J. & Demetrio, C.G.B. (1998). Overdispersion: models and estimation. Computational Statistics & Data Analysis, 27, 151-170.
Lawless, J.F. (1987). Negative binomial and mixed Poisson regression. Canadian Journal of Statistics, 15, 209-225.
Commands for: Regression analysis.
CAPTION 'RNEGBINOMIAL example',\ !t('Pump failure data from Gaver & O''Muircheartaigh (1987,',\ 'Technometrics 29, 1-15). Analysis as in Hinde & Demetrio',\ '(1998, J. Comp. Stat. & Data Anal. 27, 151-170).');\ STYLE=meta,plain FACTOR [NVALUES=10; LEVELS=2; LABELS=!t('Continuous','Standby')] mode READ [SETNVALUES=yes] mode,events,time 1 5 94.320 2 1 15.720 1 5 62.880 1 14 125.760 2 3 5.240 1 19 31.440 2 1 1.048 2 1 1.048 2 4 2.096 2 22 10.480: CALCULATE logtime=LOG(time) MODEL [DISTRIBUTION=negativebinomial; LINK=logarithm; OFFSET=logtime]\ events RNEGBINOMIAL mode