Uses the Tobit method to fit models to censored negative binomial data (R.W. Payne).
Options
PRINT = string tokens |
What to print (model, deviance, summary, estimates, correlations, fittedvalues, accumulated, monitoring, confidence, censored, aggregation, loglikelihood); default mode, summ, esti |
TERMS = formula |
Defines the model to be fitted |
CONSTANT = string token |
How to treat the constant (estimate, omit); default esti |
FACTORIAL = scalar |
Limit for expansion of model terms; default 3 |
FIXAGGREGATION = string token |
Whether to fix the aggregation at the value specified by the AGGREGATION parameter (yes, no); default no |
POOL = string token |
Whether to pool ss in accumulated summary between all terms fitted in a linear model (yes, no); default no |
DENOMINATOR = string tokens |
Whether to base ratios in accumulated summary on rms from model with smallest residual ss or smallest residual ms (ss, ms); default ss |
NOMESSAGE = string token |
Which warning messages to suppress (dispersion, leverage, residual, aliasing, marginality, vertical, df, inflation); default * |
FPROBABILITY = string token |
Printing of probabilities for variance and deviance ratios (yes, no); default no |
TPROBABILITY = string tokens |
Printing of probabilities for t-statistics (yes, no); default no |
SELECTION = string tokens |
Statistics to be displayed in the summary of analysis produced by PRINT=summary (%variance, %ss, adjustedr2, r2, dispersion, %meandeviance, %deviance, aic, bic, sic); default disp |
DISPERSION = scalar |
Dispersion parameter; default 1 |
PROBABILITY = scalar |
Probability level for confidence intervals for parameter estimates; default 0.95 |
WEIGHTS = variate |
Variate of weights for weighted regression; default * |
GROUPS = factor |
Absorbing factor defining the groups for within-groups regression; default * |
MAXCYCLE = variate |
Sets a limit on the number of iterations in the full algorithm and in the Newton-Raphson algorithm used to estimate the aggregation; default !(100, 20) |
TOLERANCES = variate |
Sets convergence limits for the estimates of the censored observations, the deviance and the estimate of the aggregation; default !(0.001, 0.0001, 0.0001) |
DIRECTION = string tokens |
Whether the data are left or right censored (left, right); default right |
Parameters
Y = variates |
Response variate to be analysed; must be set |
BOUND = scalar, variate or pointers |
Censoring thresholds; must be set |
AGGREGATION = scalar |
Specifies a fixed aggregation or saves an estimated aggregation |
_2LOGLIKELIHOOD = scalar |
Saves the value of −2 × log-likelihood |
SEAGGREGATION = scalar |
Saves the standard error of an estimated aggregation |
INITIAL = scalar or variate |
Scalar or a variate providing starting values for the censored observations in the E-M algorithm; default BOUND+1 for right-censored data and BOUND−1 for left-censored data |
NEWY = variate |
Saves a copy of the response variate with the censored observations replaced by their estimates |
OFFSET = variate |
Offset variate |
EXIT = scalar |
Exit status (0 for success, 1 for failure to converge) |
SAVE = regression save structure |
Save structure from the analysis of the data with censored observations replaced by their estimates |
Description
The negative binomial distribution can be useful for counts that are more variable than would be expected under the more usual Poisson distribution. As with Poisson data, if the experiment generates a mixture of small and very large counts, it may be convenient to record exact counts only for the observations less than a specified boundary value, and to enter the boundary value for the larger observations. The data then come from a right-censored negative binomial distribution. In the similar (but less common) left-censored situation, the emphasis is on the larger observations. It may then not be worth recording the small observations in detail, only that they are no larger than the boundary value. Censored negative binomial data can be analysed by the Tobit method (Terza 1985), which is implemented in this procedure.
The model to be fitted is specified by the TERMS option, and can contain an offset specified by the OFFSET parameter. The CONSTANT option indicates whether the constant is to be estimated or omitted, and the FACTORIAL option sets a limit on the number of variates and/or factors in the model terms, in the usual way.
If the aggregation is known, you can set option FIXAGGREGATION=yes, and specify the value with the AGGREGATION parameter. Otherwise RNBTOBIT estimates the aggregation, and you can specify an initial value and save the estimate with the AGGREGATION parameter. You can save its standard error with the SEAGGREGATION parameter. The _2LOGLIKELIHOOD parameter allows minus two times the log-likelihood to be saved. This may be useful for comparing a sequence of nested models using likelihood ratios. (The deviance cannot be used to compare models unless the value of the aggregation k is the same for all the models, as it is the difference between the log-likelihood of a given model and a saturated model with the same value of k.)
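The following is a minimal sketch of these settings, assuming the Fabric data of the Example section have been loaded and logx has been calculated. The variate x2 is a hypothetical second explanatory variate, the fixed aggregation of 2 is an arbitrary illustrative value, and the chi-square reference distribution for the likelihood ratio statistic is the usual large-sample assumption rather than something stated by the procedure.

```genstat
" Fixed aggregation: the value 2 is purely illustrative "
SCALAR kfix; VALUE=2
RNBTOBIT [PRINT=model,summary,estimates; TERMS=logx; FIXAGGREGATION=yes]\
         y; BOUND=20; AGGREGATION=kfix

" Estimated aggregation: save the estimate, its s.e. and -2 x log-likelihood "
RNBTOBIT [PRINT=*; TERMS=logx] y; BOUND=20; AGGREGATION=k1;\
         SEAGGREGATION=sek1; _2LOGLIKELIHOOD=m2ll1

" Larger nested model; x2 is a hypothetical extra explanatory variate "
RNBTOBIT [PRINT=*; TERMS=logx+x2] y; BOUND=20; AGGREGATION=k2;\
         SEAGGREGATION=sek2; _2LOGLIKELIHOOD=m2ll2

" Likelihood ratio statistic for the one extra parameter "
CALCULATE lrstat = m2ll1 - m2ll2
CALCULATE plr = CUCHISQUARE(lrstat; 1)
PRINT k1,sek1,k2,sek2,lrstat,plr
```

Because the aggregation is re-estimated in each fit, the comparison uses the saved values of minus two times the log-likelihood rather than the deviances.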
The DIRECTION option specifies whether the data are left and/or right censored. The default is that they are right censored. The values at which the measurements are censored must be specified by the BOUND parameter. For censoring in a single direction, this can be a scalar if all observations are censored at the same point, or a variate if they are censored at different points. If there is both left and right censoring, BOUND supplies a pointer containing, first, a scalar or variate to define the left-hand bounds, and then a scalar or variate to define the right-hand bounds.
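As an illustration of the pointer form of BOUND, the sketch below assumes data censored in both directions. The bounds of 2 and 20, and the identifiers lower, upper and bounds, are hypothetical choices made for the example.

```genstat
" Left-hand bound first, then right-hand bound (values are illustrative) "
SCALAR lower,upper; VALUE=2,20
POINTER [VALUES=lower,upper] bounds
RNBTOBIT [PRINT=model,summary,estimates,censored; TERMS=logx;\
         DIRECTION=left,right] y; BOUND=bounds
```

Either element of the pointer could instead be a variate if the bounds differ between units.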
The Y parameter specifies the response variate, with censored observations containing values on or beyond the bounds. The NEWY parameter saves a variate in which the censored observations are replaced by their estimates. The SAVE parameter saves a regression save structure for the analysis that can be used to display further output, or save information from the analysis, in the usual way.
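A short sketch of how NEWY and SAVE might be used, again assuming the Fabric data of the Example section. The identifiers ynew, nbsave and fitted are hypothetical, and RDISPLAY and RKEEP are used here on the assumption that the save structure behaves like one from an ordinary regression.

```genstat
" Fit without printing, saving the completed response and the save structure "
RNBTOBIT [PRINT=*; TERMS=logx] y; BOUND=20; NEWY=ynew; SAVE=nbsave

" Compare the recorded response with the version containing the estimates "
PRINT y,ynew

" Further output and saved results from the final fit "
RDISPLAY [PRINT=estimates; SAVE=nbsave]
RKEEP [SAVE=nbsave] FITTEDVALUES=fitted
```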
In the Tobit model, the probabilities for the uncensored observations are standard negative binomial probabilities. The probabilities for right-censored observations are cumulative upper negative binomial probabilities for values greater than or equal to the boundary value. Probabilities for left-censored observations are cumulative lower negative binomial probabilities for values less than or equal to the boundary value. The Tobit method uses an E-M (expectation-maximization) algorithm to estimate values for the censored observations. It starts with initial estimates for the censored observations, which can be specified by the INITIAL parameter in either a variate or a scalar. For right-censored data the default is to use the boundary value plus one. For left-censored data the default is the boundary value minus one. In each iteration, the method first fits a generalized linear model with a negative binomial distribution and a log link, saving the resulting fitted values to provide estimated means for the negative binomial distributions of the censored observations. The new estimates for the censored observations are then given by the expected values for the upper or lower parts of those negative binomial distributions, according to whether the observations are right- or left-censored. If the aggregation is to be estimated, the method then estimates a new value using a Newton-Raphson method to solve the relevant equations.
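The quantities described above can be written out as follows. This is a sketch that assumes the common parameterization of the negative binomial distribution with mean \mu and aggregation k, so that the variance is \mu + \mu^2/k; the symbols f, b and j are introduced here only for illustration.

```latex
% Negative binomial probability with mean \mu and aggregation k (assumed parameterization)
f(y;\mu,k) = \frac{\Gamma(y+k)}{\Gamma(k)\,y!}
             \left(\frac{\mu}{\mu+k}\right)^{y}
             \left(\frac{k}{\mu+k}\right)^{k},
             \qquad y = 0, 1, 2, \ldots

% Probability for an observation right-censored or left-censored at bound b
P(Y \ge b) = \sum_{j=b}^{\infty} f(j;\mu,k),
\qquad
P(Y \le b) = \sum_{j=0}^{b} f(j;\mu,k)

% New estimate of a censored observation: expected value of the upper or lower part
E(Y \mid Y \ge b) = \frac{\sum_{j=b}^{\infty} j\,f(j;\mu,k)}{\sum_{j=b}^{\infty} f(j;\mu,k)},
\qquad
E(Y \mid Y \le b) = \frac{\sum_{j=0}^{b} j\,f(j;\mu,k)}{\sum_{j=0}^{b} f(j;\mu,k)}
```

In each iteration \mu is taken from the fitted values of the current generalized linear model, and k is either the fixed aggregation or its current estimate.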
The MAXCYCLE option specifies a variate with two values. Its first value specifies the maximum number of iterations of the full algorithm (default 100), and its second value defines the maximum number of iterations to use in the Newton-Raphson algorithm that estimates the aggregation (default 20). The TOLERANCES option specifies a variate with three values defining the criteria for convergence. Iterations continue until either all are satisfied, or the number of full iterations equals the number specified by the first value of the MAXCYCLE option. The first TOLERANCES value sets a limit on the changes of the estimates of the censored observations (default 0.001). The second value sets a limit on the changes in the deviance (default 0.0001). Finally, the third value sets a limit on the change in the estimate of the aggregation (default 0.0001). The EXIT parameter can be set to a scalar which will be set to zero for a successful fit, one for failure to converge, or a missing value for an earlier fault.
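For example, the iteration limits and convergence criteria might be changed as in the sketch below. The particular limits and tolerances are arbitrary illustrative values, and status is a hypothetical identifier for the exit code.

```genstat
" Allow more iterations and tighter convergence criteria (illustrative values) "
RNBTOBIT [PRINT=model,summary,estimates,monitoring; TERMS=logx;\
         MAXCYCLE=!(200,50); TOLERANCES=!(0.0001,0.00001,0.00001)]\
         y; BOUND=20; EXIT=status

" 0 = converged, 1 = failed to converge, missing = earlier fault "
PRINT status
```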
The PRINT option controls the printed output. The settings are mainly as in the FIT directive. However, the monitoring setting prints monitoring information for the E-M algorithm, and there are three additional settings: censored prints the estimates of the censored observations, aggregation prints the aggregation with its standard error, and loglikelihood prints minus two times the log-likelihood. The WEIGHTS and GROUPS options operate as in the MODEL directive. WEIGHTS can be used to specify duplicate observations (and the Tobit calculations are then still valid). For example, you could use a weight of two to supply a single unit in the data for two observations with an identical response and identical explanatory variates. Other options (POOL, DENOMINATOR, NOMESSAGE, FPROBABILITY, TPROBABILITY, SELECTION, DISPERSION and PROBABILITY) operate like those of FIT.
Options: PRINT, TERMS, CONSTANT, FACTORIAL, FIXAGGREGATION, POOL, DENOMINATOR, NOMESSAGE, FPROBABILITY, TPROBABILITY, SELECTION, DISPERSION, PROBABILITY, WEIGHTS, GROUPS, MAXCYCLE, TOLERANCES, DIRECTION
Parameters: Y, BOUND, AGGREGATION, _2LOGLIKELIHOOD, SEAGGREGATION, INITIAL, NEWY, OFFSET, EXIT, SAVE
Method
The expected values for the upper parts of the negative binomial distributions are calculated by the EUNEGBINOMIAL procedure, and those for the lower parts of the distributions are calculated by the ELNEGBINOMIAL procedure. The aggregation is estimated by the same method as in the RNEGBINOMIAL procedure (which is used by RNBTOBIT to obtain the initial aggregation estimate).
Action with RESTRICT
As in FIT, the y-variate or any of the model variates or factors can be restricted to analyse a subset of the data.
Reference
Terza, J.V. (1985). A Tobit-type estimator for the censored Poisson regression model. Economics Letters, 18, 361-365.
See also
Directives: FIT
Procedures: ATOBIT, AUTOBIT, ELNEGBINOMIAL, EUNEGBINOMIAL, GLTOBITPOISSON, HGTOBITPOISSON, RGTOBIT, RNEGBINOMIAL, RNBTOBIT, RTOBITPOISSON, TOBIT
GenStat Reference Manual 1 Summary section on: Regression analysis.
Example
CAPTION 'RNBTOBIT example',\
        !t('Fabric data (Lee, Nelder & Pawitan 2006, pages 197-198)',\
        'y is number of faults in rolls of fabric of length x.');\
        STYLE=meta,plain
SPLOAD [PRINT=*] '%Examples%/LeeNelderPawitan/Fabric.gsh'
CALCULATE logx = LOG(x)
RNBTOBIT [PRINT=model,summary,estimates,censored; TERMS=logx] y; BOUND=20