1. Home
  2. TOBIT procedure

TOBIT procedure

Performs a Tobit linear mixed model analysis on data with fixed-threshold censoring (M.C. Hannah & V.M. Cave).

Options

PRINT = string token Controls printed output (summary); default summ
VPRINT = string tokens Controls printed output from the REML analysis of the data with censored observations replaced by their estimates (model, components, effects, means, stratumvariances, monitoring, vcovariance, deviance, Waldtests, missingvalues); default mode, comp, Wald
PSE = string token Standard errors to be printed with tables of effects and means from the REML analysis (differences, estimates, alldifferences, allestimates, none); default diff
PLOT = string token To display a scatter plot of the data with censored observations replaced by their estimates against the observed data(scatterplot); default *
MAXCYCLE = scalar Sets a limit on the number of iterations performed by the E-M algorithm; default 30
TOLERANCE = variate Sets tolerance limits for convergence of the E-M algorithm on the treatment means and the variance components; default 0.1 and 0.05 for the treatment means and variance components, respectively
RMETHOD = string token Which random terms to use when calculating the residuals during the E-step of the E-M algorithm (final, all); default final
DIRECTION = string token The direction of the censoring (left, right); default left (i.e., the true values for the censored observations are less than or equal to the BOUND)

Parameters

Y = variate Response variate to be analysed; no default, must be set
BOUND = scalar Censoring threshold; no default, must be set
CENSORED = variate Indicator variable for censored observations, with values of one where the response values are censored and zero otherwise
INITIAL = scalar or variate Scalar or a variate providing starting values for the censored observations in the E-M algorithm
NEWY = variate Saves a copy of the response variate with the censored observations replaced by their estimates
YCENSORED = variate Saves a logical variate indicating which Y values are censored
SAVE = REML save structure REML save structure from the analysis of the data with censored observations replaced by their estimates

Description

The TOBIT procedure performs a linear mixed model analysis on data values that are subject to fixed threshold censoring. Such censoring occurs when a measurement cannot be taken above or below a bound. For example, chemical concentrations may be censored when they fall below a minimum level of quantification. The procedure uses an E-M algorithm to estimate values for the censored observations, and once converged, uses REML to analyse the response variate with the censored observations replaced by their estimates.

TOBIT must be preceded by a VCOMPONENTS command to define the fixed and random models. (Note, however, that TOBIT does not accommodate spline terms in VCOMPONENTS, nor linear mixed models with complex covariance structures defined by VSTRUCTURE.)

The response variate must be supplied using the Y parameter, and a scalar defining the fixed censoring threshold must be supplied using the BOUND parameter. By default, the data values are assumed to be left-censored (i.e., measurements less than or equal to the value specified by the BOUND parameter are censored). However, right-censoring (i.e., when measurements greater than or equal to the BOUND are censored) can be specified by setting the DIRECTION option to right. Censored observations in Y may be represented either by missing values or by values at or outside the BOUND (i.e., for left-censoring, y-values ≤ BOUND, or, for right-censoring, y-values ≥ BOUND). If missing values are used, an indicator variate, with values of one corresponding to censored observations and values of zero to the non-censored observations, must be supplied using the CENSORED parameter.

The MAXCYCLE, TOLERANCE and RMETHOD options, the INITIAL parameter and the VAOPTIONS procedure can be used to control various aspects of the E-M algorithm performed by TOBIT. The INITIAL parameter provides starting values for the estimates of the censored observations. If available, these may speed up convergence of the E-M algorithm. The values should be below the value specified by the BOUND parameter when DIRECTION=left, or above that value when DIRECTION=right. INITIAL can supply a scalar if a common starting value is to be used for all the censored observations. Alternatively, if different values are required, INITIAL should supply a variate of the same length as Y. Only the values corresponding to censored observations are used, the others are ignored. If INITIAL if not specified, the default is to use the value specified by the BOUND parameter.

The MAXCYCLE option specifies the maximum number of iterations performed by the E-M algorithm (default 30). By default, the E-M algorithm is deemed to have converged if the percentage change in each estimated treatment mean is less than 0.1%, and the percentage change in each estimated variance component is less than 0.05%. However, you can change these tolerance limits by setting the TOLERANCE option to a variate of length two. Its first value specifies the maximum acceptable percentage change for the treatment means, and its second value specifies the maximum acceptable percentage change for the variance components.

The RMETHOD specifies which random terms are used when estimating values for the censored observations during the E-step of the E-M algorithm. With RMETHOD=all, the censored observations are estimated from the fixed effects only, whereas when RMETHOD=final, the censored observations are estimated from the fixed and random effects; default final. Finally, the VAOPTIONS procedure can be used to specify the MAXCYCLE and WORKSPACE options of the REML commands used during the M-step of the E-M algorithm.

Printed output is controlled by the PRINT, VPRINT, and PSE options. The PRINT option has one setting, summary, which prints information on the number of E-M algorithm iterations performed, the percentage of observations censored and the censoring threshold. This is the default, but you can suppress this output by setting option PRINT=*. The VPRINT and PSE options control the printed output from the REML analysis when the censored observations have been replaced by their estimates. The VPRINT option has the same settings as the PRINT option of the REML directive, other than that covariancemodels is excluded; the default is PRINT=model,comp,Wald. Similarly, the setting of PSE are the same as those of the PSE option of the REML directive; the default is PSE=diff.

You can set option PLOT=scatterplot to display a scatter plot of the data, plotting the new y-variate, with censored observations replaced by their estimates, against the observed response variate. When censored observations in Y are entered as missing values, they are plotted at the value specified by the BOUND parameter; otherwise, they are plotted at the values given in Y. Superimposed onto this plot are a 1-1 line and a horizontal reference line at the censoring threshold defined by the BOUND parameter. By default, no plot is produced.

The NEWY parameter allows you to save a copy of the response variate with the censored observations replaced by their estimates. An indicator variable with values of one corresponding to censored observations in Y and values of zero to non-censored observations can be saved using the YCENSORED parameter. Note, this will be equivalent to any variate supplied by CENSORED. The SAVE parameter can be used to save the save structure from the REML analysis of the data with censored observations replaced by their estimates, for later use by other REML directives and procedures, such as VDISPLAY and VGRAPH.

Options: PRINT, VPRINT, PSE, PLOT, MAXCYCLE, TOLERANCE, RMETHOD, DIRECTION
Parameters: Y, BOUND, CENSORED, INITIAL, NEWY, YCENSORED, SAVE

Method

The E-M (expectation-maximization) algorithm is an iterative two step method to optimize a model. The initial expectation step uses the initial values (either INITIAL, if given, or BOUND) for the censored observations. In the maximization step, the current estimates of the censored values are used in the y-variate in a standard REML analysis to estimate the fitted values and their variances. In subsequent expectation steps, the censored values are estimated as the expected value of the tail of Normal distribution with means and variances for these observations from previous M-step model. The expected deviate in the lower tail of a Normal distribution (x < BOUND) is
m - SQRT(v)*PRNORMAL(BOUND;m;v)/CLNORMAL(BOUND;m;v).

Action with RESTRICT

Restrictions are not allowed.

References

Amemiya, T. (1984). Tobit models: A survey. Journal of Econometrics, 24, 3-61.
Dempster, A.P., Laird, N.M. & Rubin, D.B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39, 1-38.
Taylor, J. (1973). The analysis of designed experiments with censored observations. Biometrics, 29, 35-43.
Tobin, J. (1958). Estimation of relationships for limited dependent variables. Econometrica, 26, 24-36.

See also

Directives: REML, VCOMPONENTS, VDISPLAY, VKEEP.
Procedures: CENSOR, GLTOBITPOISSON, HGTOBITPOISSON, RTOBITPOISSON, VAOPTIONS.
Commands for: REML analysis of linear mixed models.

Example

CAPTION 'TOBIT example',\
        !T('Oats.gsh contains yield data from a split-plot experiment.',\
           'For details, see section 5.1 in',\
           'A Guide to Anova and Design in Genstat');\
        STYLE=meta,plain

SPLOAD [PRINT=summary] '%Data%/Oats.gsh'

CAPTION 'Example 1: Yield left-censored data at 70.'; STYLE=meta
CALCULATE yield_lc = yield*(yield .GT. 70) + 70*(yield .LE. 70)
VCOMPONENTS [FIXED=nitrogen*variety] RANDOM=blocks/wplots/subplots
TOBIT [PLOT=scatterplot] Y=yield_lc; BOUND=70

CAPTION 'Example 2: Yield right-censored data at 130.'; STYLE=meta
CALCULATE yield_rc = yield*(yield .LT. 130) + 130*(yield .GE. 130)
VCOMPONENTS [FIXED=nitrogen*variety] RANDOM=blocks/wplots/subplots
TOBIT [DIRECTION=right; PLOT=scatterplot] Y=yield_rc; BOUND=130

CAPTION !T('Example 3: Yield right-censored data at 130,', \
           'recorded as missing values.'); STYLE=meta
CALCULATE yield_rc0 = REPLACE(yield_rc; 130; !s(*))
CALCULATE censored = yield_rc0 == !s(*)
TOBIT [DIRECTION=right; PLOT=scatterplot] Y=yield_rc0; BOUND=130; \
      CENSORED=censored

CAPTION 'Example 4: Saving structures to produce extra output.'; STYLE=meta
TOBIT [VPRINT=*] Y=yield_lc; BOUND=70; NEWY=newy; YCENSORED=ycen; SAVE=vsave
VPLOT [SAVE=vsave; RMETHOD=final] METHOD=fitted; PEN=ycen+1
VPLOT [SAVE=vsave; RMETHOD=all] METHOD=fitted; PEN=ycen+1
FACTOR [LEVELS=!(0,1); LABELS=!T(uncensored,censored)] ycenF; VALUES=ycen
DOTHISTOGRAM [KEYDESCRIPTION=''] newy; PENS=ycenF
VKEEP [FITTED=fit; SAVE=vsave] 
DOTHISTOGRAM [KEYDESCRIPTION=''] fit; PENS=ycenF
Updated on February 7, 2023

Was this article helpful?