
# RCHECK procedure

Checks the fit of a linear, generalized linear or nonlinear regression (P.W. Lane, R. Cunningham & C. Donnelly).

### Options

| Option | Description |
|---|---|
| `PRINT` = string tokens | What to print (`index`, `y`, `residuals`, `leverages`, `Cook`); default `*` |
| `RMETHOD` | Type of residual to use (`deviance`, `Pearson`, `simple`, `deletion`); default `*` i.e. as set in `MODEL` |
| `INDEX` | Which variable to use as index; default `!(1...n)` |
| `ENVELOPE` | Type of envelope with Normal and half-Normal plots (`none`, `rough`, `smooth`, `asymptotic`); default `none` |
| `PROBABILITY` | Approximate probability level for envelope; default 0.95 |
| `NSIMULATIONS` | How many simulations to generate for rough or smooth envelopes; default (1+`PROB`)/(1-`PROB`) |
| `SHADE` | Whether to show shaded envelope rather than boundaries (`no`, `yes`); default `no` |
| `RESIDUALS` | To store chosen type of residuals; default `*` |
| `LEVERAGES` | To store leverages; default `*` |
| `COOK` | To store modified Cook’s statistics; default `*` |
| `GRAPHICS` | Type of graphics to use (`lineprinter`, `highresolution`); default `high` |
| `TITLE` | Title for graph; default identifier of response |
| `WINDOW` | Window or series of windows in which to display graphs; default 4, or 5…8 for composite |
| `SCREEN` | Treatment of previous graphics screen (`clear`, `keep`); default `clea` |
| `SAVE` | Specifies which model to check; default `*` |

### Parameters

| Parameter | Description |
|---|---|
| `YSTATISTIC` = string tokens | What to display in the graph (`residuals`, `Cook`, `leverages`, `absresiduals`); default `resi` |
| `XMETHOD` | What type of graph (`fittedvalues`, `index`, `normal`, `halfnormal`, `histogram`, `composite`); default `comp` |

### Description

Procedure `RCHECK` provides “diagnostic” information for checking the fit of regression models, supplementing what the regression directives themselves supply. Those directives (such as `FIT`) make some checks, for example for large residuals and influential points, and give access to simple and standardized residuals and leverages through directive `RKEEP`. The `RCHECK` procedure automatically accesses these quantities via `RKEEP` and can in addition calculate deletion residuals and modified Cook’s statistics. A range of graphs can then be drawn to help check the fit of the regression model. The defaults are intended to provide a sensible display from the simple command

`RCHECK`

following the fit of a regression model.

The procedure is controlled by the `YSTATISTIC` and `XMETHOD` parameters. These can be set to display various types of residuals, as specified by the `RMETHOD` option; the default is the setting of this option in the `MODEL` command in force when the model was fitted. In addition, the absolute residuals, the leverages, or the modified Cook’s statistics can be displayed. Each of these sets of statistics can be plotted against the fitted values or against an index variable; by default, the index just orders the values in the order of the units. The statistics can also be shown as Normal or half-Normal plots, or as a histogram (the Normal plot for absolute residuals being the same as the half-Normal plot). A set of four such plots is displayed as a composite picture: histogram, plot against fitted values, Normal plot and half-Normal plot (with an index plot replacing the Normal plot for absolute residuals). Graphs can be displayed in line-printer style by setting the `GRAPHICS` option, though some features are not then available.
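As a brief illustration of how the two parameters pair up, the calls below assume that a regression model has already been fitted (for instance with `MODEL` and `FIT`). The particular combination of settings is only an example; lists given for `YSTATISTIC` and `XMETHOD` are taken in parallel, as in the Example section.

```
"Index plot of leverages, then Normal plot of absolute residuals."
RCHECK YSTATISTIC=leverages,absresiduals; XMETHOD=index,normal
"Pearson residuals against fitted values, in line-printer style."
RCHECK [RMETHOD=Pearson; GRAPHICS=lineprinter] YSTATISTIC=residuals; XMETHOD=fittedvalues
```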

The chosen type of residuals, the leverages and the modified Cook’s statistics can be printed by setting the `PRINT` option, or stored in variates using the `RESIDUALS`, `LEVERAGES` and `COOK` options.
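A minimal sketch of printing and saving the diagnostics in one call, assuming a model has just been fitted. The identifiers `Res`, `Lev` and `CookStat` are illustrative names, not part of the procedure.

```
"Print the diagnostics and keep them in variates for later use."
RCHECK [PRINT=residuals,leverages,Cook; RESIDUALS=Res; LEVERAGES=Lev; COOK=CookStat]
PRINT Res,Lev,CookStat
```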

Plots of the residuals against fitted values or an index variable are displayed with a smoothed line fitted through the points, to indicate any potential trend.

Normal and half-Normal plots can be enhanced with an “envelope” by setting the `ENVELOPE` option. The `rough` setting produces an upper and a lower bound for the values, and a median line, all obtained by simulation. The bounds correspond approximately to individual confidence intervals for each value, with probability as set by the `PROBABILITY` option (default 0.95). By default, the number of simulations is the minimum that allows the required limits to be estimated: this is (1+`PROBABILITY`) / (1-`PROBABILITY`). A larger number of simulations can be requested with the `NSIMULATIONS` option, to give better estimates at the expense of more computing time. The `smooth` setting requests that the bounds are smoothed, using a cubic smoothing spline with 4 d.f. The `asymptotic` setting produces bounds calculated from the asymptotic distribution of Normal order statistics. The envelope for all these settings can be displayed as a shaded region rather than as a set of three lines by setting the `SHADE` option to `yes`.
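The envelope options described above might be combined as in the following sketch, which assumes a Normal linear model has already been fitted. The values 0.9 and 99 are arbitrary choices for illustration; with `PROBABILITY=0.9` the default number of simulations would be only (1+0.9)/(1-0.9) = 19.

```
"Half-Normal plot with a smoothed, shaded 90% envelope from 99 simulations."
RCHECK [ENVELOPE=smooth; PROBABILITY=0.9; NSIMULATIONS=99; SHADE=yes]\
       YSTATISTIC=residuals; XMETHOD=halfnormal
```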

Envelopes cannot be calculated for nonlinear models or curves, nor for generalized linear models with inverse Normal, negative binomial, geometric, multinomial or calculated distributions. Nor can they be produced for deletion residuals or Cook’s statistics; they are not appropriate for leverages, which have no associated distributional assumption.

The graphical displays can be controlled as usual using the `TITLE` and `SCREEN` options. The `WINDOW` option can be used to select a defined window, or a series of windows, for high-resolution plots. Otherwise window 4 is used for a single plot, or windows 5 to 8 for composite plots. These are redefined if necessary to fill the frame.

The colours and symbols used in the displays can be controlled by setting the attributes of the following pens with the `PEN` directive before calling the procedure:

| Pen | Used for |
|---|---|
| pen 2 | zero lines in fitted-value, Normal and index plots |
| pen 3 | points and histogram bars |
| pen 4 | shading of envelopes |
| pen 5 | smooth line in fitted-value and index plots of residuals, and envelope bounds if unshaded |

The procedure exits if there are fewer than four observations, or fewer than two non-missing standardized residuals.

Options: `PRINT`, `RMETHOD`, `INDEX`, `ENVELOPE`, `NSIMULATIONS`, `PROBABILITY`, `SHADE`, `RESIDUALS`, `LEVERAGES`, `COOK`, `GRAPHICS`, `TITLE`, `WINDOW`, `SCREEN`, `SAVE`.

Parameters: `YSTATISTIC`, `XMETHOD`.

### Method

Standardized residuals and leverages are accessed using `RKEEP` from the latest fitted regression model, or from that specified by the `SAVE` option. Deletion residuals $d_i$ are calculated for linear models as follows:

$$d_i = \frac{r_i}{\sqrt{(n - p - r_i^2)\,/\,(n - p - 1)}}$$

where $r_i$ are the standardized residuals, $n$ is the number of observations, and $p$ is the number of parameters in the model. For generalized linear models,

$$d_i = \operatorname{SIGN}(r_{D,i}) \times \sqrt{(1 - l_i)\, r_{D,i}^2 + l_i\, r_{P,i}^2}$$

where $r_{D,i}$ and $r_{P,i}$ are the standardized deviance and Pearson residuals respectively, and $l_i$ are the leverages.

Modified Cook’s statistics $c_i$ are calculated as follows:

$$c_i = \operatorname{ABS}(d_i) \times \sqrt{\frac{(n - p)\, l_i}{p\,(1 - l_i)}}$$
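The linear-model formulas can be reproduced by hand as a check. The sketch below assumes the Normal linear model from the Example section has just been fitted, so that there are 17 observations and 2 parameters (constant and slope). The identifiers `n`, `p`, `r`, `l`, `d` and `c` are illustrative, and `RKEEP` is taken to supply standardized residuals, as described above.

```
"Assumes a linear model with n observations and p parameters has been fitted."
SCALAR n,p; VALUE=17,2
RKEEP RESIDUALS=r; LEVERAGES=l
"Deletion residuals from the standardized residuals."
CALCULATE d = r / SQRT((n - p - r**2) / (n - p - 1))
"Modified Cook statistics from the deletion residuals and leverages."
CALCULATE c = ABS(d) * SQRT((n - p) * l / (p * (1 - l)))
PRINT r,l,d,c
```

The results should agree with the variates saved by the `RESIDUALS` option (with `RMETHOD=deletion`) and the `COOK` option of `RCHECK`.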

where $l_i$ are the leverages. In Normal plots, the Normal quantiles are calculated as follows:

$$q_i = \operatorname{NED}\left(\frac{i - 0.375}{n + 0.25}\right)$$

while for a half-Normal plot they are given by

$$q_i = \operatorname{NED}\left(0.5 + 0.5 \times \frac{i - 0.375}{n + 0.25}\right)$$
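The plotting positions can be generated directly with the `NED` function. This sketch uses 17 ordered values, matching the size of the data set in the Example section; the identifiers `n`, `i`, `qnormal` and `qhalf` are illustrative.

```
"Normal and half-Normal quantiles for n = 17 ordered values."
SCALAR n; VALUE=17
VARIATE [VALUES=1...17] i
CALCULATE qnormal = NED((i - 0.375) / (n + 0.25))
CALCULATE qhalf = NED(0.5 + 0.5 * (i - 0.375) / (n + 0.25))
PRINT i,qnormal,qhalf
```

Because the argument of `NED` in the half-Normal case always exceeds 0.5, every value of `qhalf` is positive.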

For generalized linear models, fitted values are transformed by an approximate variance-stabilizing transformation before use in graphs:

| Distribution | Transformation |
|---|---|
| Poisson, multinomial, negative binomial and geometric | 2 × SQRT(fitted) |
| binomial, Bernoulli | 2 × ANG(100 × fitted / nbinomial) |
| | LOG(fitted) |
| | 1 / fitted |

The smoothed line displayed for fitted-value or index plots depends on the number n of distinct explanatory values. It is calculated as a straight line if n>3, and as a cubic smoothing spline once n>9: with 2 d.f. for n>9, 3 d.f. for n>34 and 4 d.f. for n>59.
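One way to read the thresholds is as a step function of n. The sketch below is only an interpretation of the rule quoted above, not the procedure's own code: it encodes the straight line as 1 d.f., and it does not cover the case n ≤ 3, for which the text gives no rule. The identifiers `n` and `df` are illustrative.

```
"d.f. of the smooth for n distinct values; 1 denotes the straight line."
SCALAR n; VALUE=40
CALCULATE df = 1 + (n .GT. 9) + (n .GT. 34) + (n .GT. 59)
PRINT df
```

Each comparison contributes 1 when true and 0 when false, so n = 40 gives 3 d.f., matching the rule for n>34.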

For Normal linear models, envelopes are calculated by default from ns sets of Normal random numbers, where

ns = (1 + `PROBABILITY`) / (1 – `PROBABILITY`).

If the number of observations is less than 100, the values are transformed using the projection matrix to induce the observed correlation pattern of the data; for larger datasets, no transformation is done. The values are then ordered and the minimum and maximum values determine the envelope boundaries. If ns is set by the `NSIMULATIONS` option, the boundaries are calculated with the `QUANTILES` function from the ns values generated for each ordered residual. For generalized linear models, ns sets of values of the response variate are generated from the distribution, with parameters estimated from the current fit. The model is refitted to each set, and the residuals extracted and dealt with as for the transformed Normal values above.
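The default number of simulations follows directly from the probability level. A small sketch, with `prob` and `ns` as illustrative identifiers:

```
"Default number of simulated sets for a given envelope probability."
SCALAR prob; VALUE=0.95
CALCULATE ns = (1 + prob) / (1 - prob)
PRINT ns
```

With the default probability of 0.95 this works out to 1.95/0.05 = 39 sets; a level of 0.99 would need 1.99/0.01 = 199.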

### Action with `RESTRICT`

Restrictions applied to vectors used in the regression apply also to the `RCHECK` procedure. Values of diagnostic quantities are set to missing for all excluded units.
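A sketch of this behaviour, assuming the variates `Boil` and `LogPressure` from the Example section are available. The cut-off of 205 and the identifier `Res` are arbitrary choices for illustration.

```
"Fit to the units with boiling point below 205 only."
RESTRICT LogPressure; CONDITION=Boil .LT. 205
MODEL LogPressure
FIT Boil
"Residuals for the excluded units are stored as missing values."
RCHECK [RESIDUALS=Res]
"Remove the restriction again."
RESTRICT LogPressure
```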

### See also

Procedures: `RDESTIMATES`, `RGRAPH`, `APLOT`, `DRESIDUALS`, `VPLOT`.

Commands for: Regression analysis.

### Example

```
CAPTION 'RCHECK example',\
!t('Model atmospheric pressure on boiling point',\
'(data from Atkinson, 1985, Plots, Transformations & Regression).');\
STYLE=meta,plain
VARIATE [NVALUES=17] Boil,Pressure
194.5 20.79  194.3 20.79  197.9 22.40  198.4 22.67  199.4 23.15
199.9 23.35  200.9 23.89  201.1 23.99  201.4 24.02  201.3 24.01
203.6 25.14  204.6 26.57  209.5 28.49  208.6 27.76  210.7 29.04
211.9 29.88  212.2 30.06 :
CALCULATE LogPressure = 100*LOG10(Pressure)
MODEL   LogPressure
FIT     Boil
CAPTION '1. Plot composite of four displays of the standardized residuals.'
RCHECK
CAPTION !t('2. Plot simple residuals against boiling point,',\
'and display a Normal plot of simple residuals.')
RCHECK  [RMETHOD=simple; INDEX=Boil] Y=2(residual); XMETHOD=index,Normal
CAPTION !t('3. Display a half-Normal plot with a generated envelope,',\
'that has been smoothed, and display as a shaded area;',\
'change colours to give dark blue points on cyan background.')
PEN     3,4; COLOUR='blue','aqua'