Fits a negative binomial generalized linear model estimating the aggregation parameter (R.M. Harbord & R.W. Payne).
Options
PRINT = string tokens |
Printed output from the analysis (model , deviance , summary , estimates , correlations , fittedvalues , accumulated , monitoring , confidence , aggregation , loglikelihood ); default mode , summ , esti , aggr |
---|---|
AGGREGATION = scalar |
Saves the estimate of the aggregation parameter |
_2LOGLIKELIHOOD = scalar |
Saves the value of -2 × log-likelihood |
CONSTANT = string token |
How to treat the constant (estimate , omit ); default esti |
FACTORIAL = scalar |
Limit on number of factors in a treatment term; default 3 |
POOL = string token |
Whether to pool the deviance for the terms in the accumulated summary (yes , no ); default no |
NOMESSAGE = string tokens |
Warnings to suppress from FIT (dispersion , leverage , residual , aliasing , marginality , vertical , df , inflation ); default * |
FPROBABILITY = string token |
Printing of probabilities for variance ratios (yes , no ); default no |
TPROBABILITY = string token |
Printing of probabilities for t-statistics (yes , no ); default no |
SELECTION = string tokens |
Statistics to be displayed in the summary of analysis produced by PRINT=summary (%variance , %ss , adjustedr2 , r2 , dispersion , %meandeviance , %deviance , aic , bic , sic ); default disp |
PROBABILITY = scalar |
Probability level for confidence intervals for parameter estimates; default 0.95 |
SEAGGREGATION = scalar |
Saves the standard error of the estimated aggregation parameter |
MAXCYCLE = variate |
Maximum number of iteration for main and Newton-Raphson estimations; default !(15,15) |
TOLERANCE = variate |
Convergence criteria for deviance and k; default !(1E-4,1E-4) |
Parameter
TERMS = formula |
List of explanatory variates and factors, or model formula (as for FIT ) |
---|
Description
The negative binomial distribution can be fitted as a generalized linear model using FIT
only for a given value of the aggregation parameter k. RNEGBINOMIAL
extends the fitting to include estimation of k from the data.
The negative binomial distribution is a discrete distribution with the relationship between mean and variance given by
variance = mean + mean**2/k,
where k is a positive constant known as the aggregation parameter. It provides a possible model for count data that show apparent overdispersion when a Poisson model is fitted. (Another model is the simpler constant overdispersion model, obtained by setting option DISPERSION=*
in a MODEL
statement with option DISTRIBUTION=poisson
; see McCullough & Nelder 1989 and Hinde & Demetrio 1998.)
The call to RNEGBINOMIAL
must be preceded by a MODEL
statement with option DISTRIBUTION=negativebinomial
(otherwise an error message is printed). It is also necessary to specify the link function (e.g. by setting option LINK=logarithm
for a log-link), as the default is the canonical log-ratio link, which is unlikely to be useful in practice (for example it requires the linear predictor to be negative).
The AGGREGATION
and SEAGGREGATION
option allow the estimate of k and its standard error to be saved. The _2LOGLIKELIHOOD
option allows minus twice the maximized log-likelihood to be saved. This may be useful for comparing a sequence of nested models fitted by RNEGBINOMIAL
using likelihood ratio testing. (The deviance cannot be used to compare models unless the value of k is the same for all the models, as it is the difference between the log-likelihood of a given model and a saturated model with the same value of k.) Printed output is controlled by the PRINT
option, which has the same settings as for the FIT
directive but with the addition of aggregation
to control the printing of the estimate of k and its standard error (based on observed rather than expected information; see Method), and loglikelihood
to print minus two times the log-likelihood.
The CONSTANT
, FACTORIAL
, NOMESSAGE
, FPROBABILITY
, TPROBABILITY
, SELECTION
andnPROBABILITY
options operate in the usual way (as for example in the FIT
directive). The final two options, MAXCYCLE
and TOLERANCE
, can supply variates of length 2 that can be used to control the iterative process if required. The first element of MAXCYCLE
sets the maximum number of times that the model is fitted as a generalized linear model for fixed k, while the second element sets the maximum number of Newton-Raphson iterations used to maximise the likelihood with respect to k for fixed fitted values. The alternating cycle stops when successive values of the deviance are within a tolerance set by the first element of the TOLERANCE
option and successive values of the deviance are within a tolerance set by the second element.
Options: PRINT
, AGGREGATION
, _2LOGLIKELIHOOD
, CONSTANT
, FACTORIAL
, POOL
, NOMESSAGE
, FPROBABILITY
, TPROBABILITY
, SELECTION
, PROBABILITY
, SEAGGREGATION
, MAXCYCLE
, TOLERANCE
.
Parameter: TERMS
.
Method
For fixed k, the negative binomial distribution is in the exponential family and the regression parameters determining the fitted values can be fitted as a generalized linear model using the FIT
directive. For a fixed set of fitted values, k can be estimated by using the Newton-Raphson method to solve the score equation for k. Alternating between the two processes until convergence yields joint maximum likelihood estimates of k and the regression parameters. As the estimate of k is asymptotically independent of the other regression parameters (Lawless 1987), their standard errors can be obtained separately from the two processes. The standard error for k uses observed rather than expected information due to the use of Newton-Raphson rather than Fisher scoring.
The starting value of k is taken from the AGGREGATION
option of the MODEL
statement, which defaults to 1. This default appears to be a satisfactory initial value in practice, but the user may wish to specify a different value if convergence problems are encountered, or if speed is an issue and an approximate value of k is known.
Action with RESTRICT
Any restriction applied to vectors used in the regression model applies also to the results from RNEGBINOMIAL
.
References
McCullagh, P. & Nelder, J.A. (1989). Generalized Linear Models (second edition). Chapman & Hall, London.
Hinde, J. & Demetrio, C.G.B. (1998). Overdispersion: models and estimation. Computational Statistics & Data Analysis, 27, 151-170.
Lawless, J.F. (1987). Negative binomial and mixed Poisson regression. Canadian Journal of Statistics, 15, 209-225.
See also
Procedures: HGANALYSE
, R0INFLATED
.
Commands for: Regression analysis.
Example
CAPTION 'RNEGBINOMIAL example',\ !t('Pump failure data from Gaver & O''Muircheartaigh (1987,',\ 'Technometrics 29, 1-15). Analysis as in Hinde & Demetrio',\ '(1998, J. Comp. Stat. & Data Anal. 27, 151-170).');\ STYLE=meta,plain FACTOR [NVALUES=10; LEVELS=2; LABELS=!t('Continuous','Standby')] mode READ [SETNVALUES=yes] mode,events,time 1 5 94.320 2 1 15.720 1 5 62.880 1 14 125.760 2 3 5.240 1 19 31.440 2 1 1.048 2 1 1.048 2 4 2.096 2 22 10.480: CALCULATE logtime=LOG(time) MODEL [DISTRIBUTION=negativebinomial; LINK=logarithm; OFFSET=logtime]\ events RNEGBINOMIAL mode