Uses the Tobit method for regression with censored data (R.W. Payne).
Options
PRINT = string tokens |
What to print (model, deviance, summary, estimates, correlations, fittedvalues, accumulated, monitoring, confidence, censored); default mode, summ, esti |
LINK = string token |
Link function (identity, logarithm, logit, reciprocal, squareroot, probit, complementaryloglog); default iden |
TERMS = formula |
Defines the model to be fitted |
CONSTANT = string token |
How to treat the constant (estimate, omit); default esti |
FACTORIAL = scalar |
Limit for expansion of model terms; default 3
POOL = string token |
Whether to pool ss in accumulated summary between all terms fitted in a linear model (yes, no); default no
DENOMINATOR = string token |
Whether to base ratios in accumulated summary on rms from model with smallest residual ss or smallest residual ms (ss, ms); default ss |
NOMESSAGE = string tokens |
Which warning messages to suppress (dispersion, leverage, residual, aliasing, marginality, vertical, df, inflation); default * |
FPROBABILITY = string token |
Printing of probabilities for variance and deviance ratios (yes, no); default no |
TPROBABILITY = string token |
Printing of probabilities for t-statistics (yes, no); default no |
SELECTION = string tokens |
Statistics to be displayed in the summary of analysis produced by PRINT=summary (%variance, %ss, adjustedr2, r2, dispersion, %meandeviance, %deviance, aic, bic, sic); default disp |
DISPERSION = scalar |
Dispersion parameter; default * |
PROBABILITY = scalar |
Probability level for confidence intervals for parameter estimates; default 0.95 |
WEIGHTS = variate |
Weights for each unit; default * (all weights one)
GROUPS = factor |
Absorbing factor defining the groups for within-groups regression; default *
MAXCYCLE = scalar |
Sets a limit on the number of iterations performed by the E-M algorithm; default 100 |
TOLERANCE = scalar |
Sets tolerance limits for convergence of the E-M algorithm on the estimates of the censored observations; default 0.001 |
DIRECTION = string tokens |
Whether the data are left or right censored (left, right); default right |
Parameters
Y = variates |
Response variate to be analysed; must be set |
BOUND = scalar, variates or pointers |
Censoring thresholds; must be set |
INITIAL = scalar or variate |
Scalar or a variate providing starting values for the censored observations in the E-M algorithm; default BOUND+1 for right-censored data and BOUND−1 for left-censored data |
NEWY = variate |
Saves a copy of the response variate with the censored observations replaced by their estimates |
OFFSET = variate |
Offset variate |
EXIT = scalar |
Exit status (0 for success, 1 for failure in the E-M algorithm, missing value for an earlier fault)
SAVE = regression save structure |
Save structure from the analysis of the data with censored observations replaced by their estimates |
Description
The RNTOBIT procedure performs a regression analysis with censored data. For example, with the default, right-censoring, some observations may be so large that it is impracticable to measure them exactly. Alternatively, with left-censoring (specified by setting option DIRECTION=left), some observations may be below the reliable detection limit of a measuring device. You can also set DIRECTION=left,right to have censoring in both directions.
The values at which the measurements are censored must be specified by the BOUND parameter. For censoring in a single direction, this can be a scalar if all observations are censored at the same point, or a variate if they are censored at different points. If there is both left and right censoring, BOUND supplies a pointer containing, first, a scalar or variate to define the left-hand bounds, and then a scalar or variate to define the right-hand bounds.
Censored observations in the data, supplied by the Y parameter, are represented as values at or outside the boundary. The NEWY parameter can save a copy of the y-variate with the censored observations replaced by their estimates. The SAVE parameter saves a regression save structure for the analysis that can be used to display further output, or save information from the analysis, in the usual way.
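The rule that censored observations are the values at or outside the boundary can be sketched as follows. This is illustrative Python, not GenStat code; the function name censored_mask is hypothetical. A scalar bound plays the role of a scalar BOUND, an array plays the role of a variate of bounds, and supplying both lower and upper corresponds to the pointer used for censoring in both directions.

```python
import numpy as np


def censored_mask(y, lower=None, upper=None):
    """Flag left- and right-censored units (hypothetical helper).

    A value at or below `lower` is treated as left-censored and a value at
    or above `upper` as right-censored. Each bound may be a scalar (one
    threshold for all units), an array (one threshold per unit), or None
    when there is no censoring in that direction.
    """
    y = np.asarray(y, dtype=float)
    left = np.zeros(y.shape, dtype=bool)
    right = np.zeros(y.shape, dtype=bool)
    if lower is not None:
        # Broadcasting lets a scalar bound apply to every unit
        left = y <= np.asarray(lower, dtype=float)
    if upper is not None:
        right = y >= np.asarray(upper, dtype=float)
    return left, right
```

For example, censored_mask([1.0, 3.0, 4.5, 10.0], lower=3, upper=[9, 9, 4, 12]) flags the first two units as left-censored (a value equal to the bound counts as censored) and the third as right-censored.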
The model to be fitted is specified by the TERMS option, and can contain an offset specified by the OFFSET parameter. The CONSTANT option indicates whether the constant is to be estimated or omitted, and the FACTORIAL option sets a limit on the number of variates and/or factors in the model terms, in the usual way.
The LINK option specifies the link function. This can be any of the standard links apart from power and log-ratio. The default is the identity link, i.e. an ordinary linear regression model.
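To show what each LINK setting means for the fitted values, here is a minimal Python sketch of the inverse link functions, which map the linear predictor eta back to the mean. These are the textbook forms; the logit, probit and complementary log-log versions assume a mean on the (0, 1) scale, and how GenStat scales these links for a Normal response is not stated in this description. The dictionary name INVERSE_LINKS is hypothetical.

```python
import math

# Inverse link functions: mean mu as a function of the linear predictor eta.
# Keys follow the LINK option settings; formulas are the standard textbook ones.
INVERSE_LINKS = {
    "identity": lambda eta: eta,
    "logarithm": lambda eta: math.exp(eta),
    "logit": lambda eta: 1.0 / (1.0 + math.exp(-eta)),
    "reciprocal": lambda eta: 1.0 / eta,
    "squareroot": lambda eta: eta * eta,
    # Standard Normal cumulative distribution function via erf
    "probit": lambda eta: 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0))),
    "complementaryloglog": lambda eta: 1.0 - math.exp(-math.exp(eta)),
}
```

With the default identity link the mean equals the linear predictor, which is why that setting gives an ordinary linear regression model.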
In the Tobit model, the probabilities for the uncensored observations are Normal probability densities. The probabilities for right-censored observations are upper-tail Normal probabilities of values greater than or equal to the boundary value, and those for left-censored observations are lower-tail Normal probabilities of values less than or equal to the boundary value.
The Tobit method uses an E-M (expectation-maximization) algorithm to estimate values for the censored observations. It starts with initial estimates for the censored observations, which can be supplied in a scalar or a variate by the INITIAL parameter. By default, these are the boundary value plus one for right-censored data, and the boundary value minus one for left-censored data. In each iteration, the method first fits a generalized linear model with a Normal distribution and the specified link, and uses the resulting fitted values as the means of the distributions of the censored observations. The new estimates for the censored observations are then the expected values of the upper or lower parts of those distributions, according to whether the observations are right- or left-censored.
The process continues until the changes in the estimates are less than or equal to the value specified by the TOLERANCE option (default 0.001), or until the number of iterations reaches the limit set by the MAXCYCLE option (default 100). The EXIT parameter can supply a scalar, which is set to zero for a successful fit, one for failure in the E-M algorithm, or a missing value for an earlier fault.
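The iteration described above can be sketched in Python for the identity link and a single censoring direction. This is an illustration under stated assumptions, not the GenStat implementation: the function tobit_em is hypothetical, and the residual standard deviation is assumed to be the root residual mean square of the completed data, since the text does not say how the dispersion is estimated. The expectation step uses the truncated-Normal means mu + sigma*phi(a)/(1 - Phi(a)) for right-censoring and mu - sigma*phi(a)/Phi(a) for left-censoring, with a = (bound - mu)/sigma. A full maximum-likelihood EM for the Tobit model would also add the conditional variance of the censored values to the sigma update; the sketch follows the description literally and omits that.

```python
import math
import numpy as np


def _phi(z):
    """Standard Normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _upper_tail(z):
    """P(Z >= z) for standard Normal Z, computed with erfc for tail accuracy."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def tobit_em(X, y, bound, direction="right", initial=None,
             maxcycle=100, tolerance=0.001):
    """Hypothetical Tobit E-M loop: identity link, one censoring direction.

    X: (n, p) design matrix; include a column of ones for the constant.
    y: response; censored units hold values at or beyond `bound`.
    bound, initial: scalar or per-unit array.
    direction: "right" or "left" (any other value is treated as "left").
    Returns (coefficients, completed response, status): status is 0 if the
    largest change fell to `tolerance` or below, 1 if `maxcycle` was reached.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    c = np.broadcast_to(np.asarray(bound, dtype=float), y.shape)
    right = direction == "right"
    cens = y >= c if right else y <= c

    # Starting values: bound+1 (right) or bound-1 (left) unless supplied
    if initial is None:
        start = c + 1.0 if right else c - 1.0
    else:
        start = np.broadcast_to(np.asarray(initial, dtype=float), y.shape)
    ynew = y.copy()
    ynew[cens] = start[cens]

    status = 1
    for _ in range(maxcycle):
        # Fit step: least squares, i.e. a Normal GLM with identity link
        beta = np.linalg.lstsq(X, ynew, rcond=None)[0]
        mu = X @ beta
        # Assumption: sigma from the residual mean square of the completed data
        sigma = math.sqrt(np.sum((ynew - mu) ** 2) / (n - p))
        # Expectation step: mean of the upper (right) or lower (left) part
        # of Normal(mu_i, sigma^2) beyond the censoring threshold c_i
        updated = ynew.copy()
        for i in np.flatnonzero(cens):
            a = (c[i] - mu[i]) / sigma
            if right:
                updated[i] = mu[i] + sigma * _phi(a) / _upper_tail(a)
            else:
                updated[i] = mu[i] - sigma * _phi(a) / _upper_tail(-a)
        change = float(np.max(np.abs(updated - ynew))) if cens.any() else 0.0
        ynew = updated
        if change <= tolerance:
            status = 0
            break

    # Refit so the coefficients correspond to the final completed response
    beta = np.linalg.lstsq(X, ynew, rcond=None)[0]
    return beta, ynew, status
```

Because each truncated-Normal mean lies strictly beyond its threshold, the completed values of right-censored units always exceed the bound (and fall below it for left-censoring). This is why a fit to the completed data typically recovers a steeper slope than a naive fit that treats the bound values as exact. The sketch does not guard against a zero residual variance.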
The PRINT option controls the printed output. The settings are as in the FIT directive, except that the monitoring setting prints monitoring information for the E-M algorithm, and there is an additional setting, censored, to print the estimates of the censored observations. The WEIGHTS and GROUPS options operate as in the MODEL directive. WEIGHTS can be used to specify duplicate observations (the Tobit calculations are then still valid). For example, you could use a weight of two to supply a single unit in the data for two observations with an identical response and identical explanatory variates. The other options (POOL, DENOMINATOR, NOMESSAGE, FPROBABILITY, TPROBABILITY, SELECTION, DISPERSION and PROBABILITY) operate like those of FIT.
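The statement that a weight of two stands in for two identical observations can be checked for the least-squares step with a short Python sketch. This is ordinary weighted least squares, not GenStat code, and the function weighted_ls is hypothetical. It shows that the coefficients from a weighted fit match those from a fit with the unit physically duplicated.

```python
import numpy as np


def weighted_ls(X, y, w):
    """Weighted least squares: solve (X'WX) b = X'W y (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    XtW = X.T * w  # multiplies column j of X' (unit j) by its weight w_j
    return np.linalg.solve(XtW @ X, XtW @ y)


# Usage: give the second unit weight 2, then compare with duplicating it
X = np.array([[1, 0], [1, 1], [1, 2], [1, 3]], dtype=float)
y = np.array([1.0, 2.5, 2.9, 4.2])
b_weighted = weighted_ls(X, y, [1, 2, 1, 1])
b_duplicated = np.linalg.lstsq(np.vstack([X, X[1:2]]), np.append(y, y[1]),
                               rcond=None)[0]
print(np.allclose(b_weighted, b_duplicated))  # True
```

Since a duplicated unit has the same fitted mean and the same bound as its copy, it would also receive the same E-M update. The weighted and duplicated analyses therefore follow the same iteration.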
Options: PRINT, LINK, TERMS, CONSTANT, FACTORIAL, POOL, DENOMINATOR, NOMESSAGE, FPROBABILITY, TPROBABILITY, SELECTION, DISPERSION, PROBABILITY, WEIGHTS, GROUPS, MAXCYCLE, TOLERANCE, DIRECTION
Parameters: Y, BOUND, INITIAL, NEWY, OFFSET, EXIT, SAVE
Action with RESTRICT
As in FIT, the y-variate or any of the model variates or factors can be restricted to analyse a subset of the data.
See also
Directives: FIT
Procedures: ATOBIT, AUTOBIT, GLTOBITPOISSON, RGTOBIT, RNBTOBIT, RTOBITPOISSON, TOBIT
GenStat Reference Manual 1 Summary section on: Regression analysis.
Example
CAPTION 'RNTOBIT example','Monthly usage of a production plant';\
STYLE=meta,plain
SPLOAD '%data%/Water.gsh'
" Suppose measurements below 3 were not recorded accurately
- treat these as censored "
RNTOBIT [PRINT=model,estimates,accumulated,censored; FPROBABILITY=yes;\
TERMS=Temp,Product,Opdays,Employ; DIRECTION=left] Water; BOUND=3