
RCHECK procedure

Checks the fit of a linear, generalized linear or nonlinear regression (P.W. Lane, R. Cunningham & C. Donnelly).

Options

PRINT = string tokens What to print (index, y, residuals, leverages, Cook); default *
RMETHOD = string token Type of residual to use (deviance, Pearson, simple, deletion); default * i.e. as set in MODEL
INDEX = variate or factor Which variable to use as index; default !(1...n)
ENVELOPE = string token Type of envelope with Normal and half-Normal plots (none, rough, smooth, asymptotic); default none
PROBABILITY = scalar Approximate probability level for envelope; default 0.95
NSIMULATIONS = scalar How many simulations to generate for rough or smooth envelopes; default (1+PROB)/(1-PROB)
SHADE = string token Whether to show shaded envelope rather than boundaries (no, yes); default no
RESIDUALS = variate To store chosen type of residuals; default *
LEVERAGES = variate To store leverages; default *
COOK = variate To store modified Cook’s statistics; default *
GRAPHICS = string token Type of graphics to use (lineprinter, highresolution); default high
TITLE = text Title for graph; default identifier of response
WINDOW = numbers Window or series of windows in which to display graphs; default 4, or 5…8 for composite
SCREEN = string token Treatment of previous graphics screen (clear, keep); default clea
SAVE = regression save structure Specifies which model to check; default *

Parameters

YSTATISTIC = string tokens What to display in the graph (residuals, Cook, leverages, absresiduals); default resi
XMETHOD = string tokens What type of graph (fittedvalues, index, normal, halfnormal, histogram, composite); default comp

Description

Procedure RCHECK provides “diagnostic” information for checking the fit of regression models. The regression directives themselves make some checks, such as for large residuals and influential points, and give access to simple and standardized residuals and leverages through directive RKEEP. The RCHECK procedure automatically accesses these quantities via RKEEP and in addition can calculate deletion residuals and modified Cook’s statistics. A range of graphs can then be drawn to help check the fit of the regression model. The defaults are intended to provide a sensible display from the simple command

RCHECK

following the fit of a regression model.

The procedure is controlled by the YSTATISTIC and XMETHOD parameters. These can be set to display various types of residuals, as specified by the RMETHOD option; the default is the setting of this option in the MODEL command in force when the model was fitted. In addition, the absolute residuals, the leverages, or the modified Cook’s statistics can be displayed. Each of these sets of statistics can be plotted against the fitted values or against an index variable; by default, the index just orders the values in the order of the units. The statistics can also be shown as Normal or half-Normal plots, or as a histogram (the Normal plot for absolute residuals being the same as the half-Normal plot). A set of four such plots is displayed as a composite picture: histogram, plot against fitted values, Normal plot and half-Normal plot (with an index plot replacing the Normal plot for absolute residuals). Graphs can be displayed in line-printer style by setting the GRAPHICS option, though some features are not then available.

The chosen type of residuals, the leverages and the modified Cook’s statistics can be printed by setting the PRINT option, or stored in variates using the RESIDUALS, LEVERAGES and COOK options.

Plots of the residuals against fitted values or an index variable are displayed with a smoothed line fitted through the points, to indicate any potential trend.

Normal and half-Normal plots can be enhanced with an “envelope” by setting the ENVELOPE option. The rough setting produces an upper and lower bound for the values, and a median line, produced by simulation. The bounds correspond approximately to individual confidence intervals for each value, with probability as set by the PROBABILITY option (default 0.95). By default, the number of simulations is the minimum that allows the required limits to be estimated: this is (1+PROBABILITY) / (1-PROBABILITY). A larger number of simulations can be requested with the NSIMULATIONS option, to give better estimates at the expense of more computing time. The smooth setting requests that the bounds are smoothed, using a cubic smoothing spline with 4 d.f. The asymptotic setting produces bounds calculated from the asymptotic distribution of Normal order statistics. The envelope for all these settings can be displayed as a shaded region rather than as a set of three lines by setting the SHADE option to yes.
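The default number of simulations can be motivated by a standard order-statistic argument: a new value falls between the minimum and maximum of ns independent simulated values with probability (ns-1)/(ns+1), and equating this to PROBABILITY gives ns = (1+PROBABILITY)/(1-PROBABILITY). The following is a minimal Python sketch of that arithmetic, not Genstat code; the function names are hypothetical, and the rounding of non-integer results is an assumption, since the source does not say how such values are treated.

```python
def default_nsimulations(probability=0.95):
    """Number of simulations given by (1 + P) / (1 - P)."""
    if not 0.0 < probability < 1.0:
        raise ValueError("probability must lie strictly between 0 and 1")
    # Assumption: round to the nearest integer; this also absorbs
    # floating-point error in cases such as P = 0.95.
    return round((1.0 + probability) / (1.0 - probability))


def minmax_coverage(nsim):
    """Probability that a new draw lies between the min and max of nsim draws."""
    return (nsim - 1) / (nsim + 1)
```

For example, default_nsimulations(0.95) gives 39, and minmax_coverage(39) is 38/40 = 0.95, so 39 sets are just enough for the extremes to act as 95% limits.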

Envelopes cannot be calculated for nonlinear models or curves, nor for generalized linear models with inverse Normal, negative binomial, geometric, multinomial or calculated distributions. Nor can they be produced for deletion residuals or Cook’s statistics; they are not appropriate for leverages, which have no associated distributional assumption.

The graphical displays can be controlled as usual using the TITLE and SCREEN options. The WINDOW option can be used to select one or more defined windows for high-resolution plots. Otherwise window 4 is used for a single plot, or windows 5-8 for composite plots. These are redefined if necessary to fill the frame.

The colours and symbols used in the displays can be controlled by setting the attributes of the following pens with the PEN directive before calling the procedure:

    pen 2 zero lines in fitted-value, Normal and index plots;
    pen 3 points and histogram bars;
    pen 4 shading of envelopes;
    pen 5 smooth line in fitted-value and index plots of residuals, and envelope bounds if unshaded.

The procedure exits if there are fewer than four observations, or fewer than two non-missing standardized residuals.

Options: PRINT, RMETHOD, INDEX, ENVELOPE, NSIMULATIONS, PROBABILITY, SHADE, RESIDUALS, LEVERAGES, COOK, GRAPHICS, TITLE, WINDOW, SCREEN, SAVE.

Parameters: YSTATISTIC, XMETHOD.

Method

Standardized residuals and leverages are accessed using RKEEP from the latest fitted regression model, or from that specified by the SAVE option. Deletion residuals di are calculated for linear models as follows:

di = ri / √( (n-p-ri²) / (n-p-1) )

where ri are the standardized residuals, n is the number of observations, and p is the number of parameters in the model. For generalized linear models,

di = SIGN(rdi) × √( (1-li) × rdi² + li × rpi² )

where rdi and rpi are the standardized deviance and Pearson residuals respectively.
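As an illustration, the two deletion-residual formulas can be sketched in Python (not Genstat). The sketch reads the linear-model formula as r / √((n-p-r²)/(n-p-1)) and the generalized linear model formula as SIGN(rd) × √((1-l) × rd² + l × rp²). The function names are hypothetical, and the handling of a deviance residual of exactly zero is an assumption.

```python
import math


def deletion_residual_linear(r, n, p):
    """Deletion residual from a standardized residual r, for a linear
    model with n observations and p parameters."""
    return r / math.sqrt((n - p - r * r) / (n - p - 1))


def deletion_residual_glm(rd, rp, leverage):
    """Deletion residual for a generalized linear model, combining the
    standardized deviance (rd) and Pearson (rp) residuals."""
    magnitude = math.sqrt((1.0 - leverage) * rd * rd + leverage * rp * rp)
    # The sign is taken from the deviance residual. Assumption: rd == 0
    # is treated as positive here; Genstat's SIGN may behave differently.
    return math.copysign(magnitude, rd)
```

A standardized residual of exactly 1 is left unchanged by the linear formula, while larger residuals are inflated: with n=17 and p=2, r=2 becomes 2 × √(14/11), roughly 2.26.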

Modified Cook’s statistics ci are calculated as follows:

ci = ABS(di) × √{ (n-p) × li / (p × (1-li)) }

where li are the leverages. In Normal plots, the Normal quantiles are calculated as follows:

qi = NED( (i-0.375) / (n+0.25) )

while for a half-Normal plot they are given by

qi = NED( 0.5 + 0.5 × (i-0.375) / (n+0.25) )
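The modified Cook’s statistics and the plotting quantiles can be sketched in Python (not Genstat) as follows. The sketch reads the Cook’s formula as ABS(d) × √((n-p) × l / (p × (1-l))) and assumes NED is the inverse of the standard Normal cumulative distribution function, with i running from 1 to n. The function names are hypothetical.

```python
import math
from statistics import NormalDist

# Inverse standard Normal CDF, standing in for Genstat's NED function
_ned = NormalDist().inv_cdf


def modified_cook(d, leverage, n, p):
    """Modified Cook's statistic from a deletion residual d and a leverage."""
    return abs(d) * math.sqrt((n - p) * leverage / (p * (1.0 - leverage)))


def normal_quantiles(n):
    """Plotting positions for a Normal plot of n ordered values."""
    return [_ned((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


def half_normal_quantiles(n):
    """Plotting positions for a half-Normal plot of n ordered values."""
    return [_ned(0.5 + 0.5 * (i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
```

The Normal quantiles are symmetric about zero, whereas the half-Normal quantiles are all positive because their probabilities lie between 0.5 and 1.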

For generalized linear models, fitted values are transformed by an approximate variance-stabilizing transformation before use in graphs:

    Poisson, multinomial, negative binomial and geometric 2 × SQRT(fitted)
    binomial, Bernoulli 2 × ANG(100 × fitted / nbinomial)
    gamma, exponential LOG(fitted)
    inverse Normal 1 / fitted
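The list of transformations above can be sketched as a small Python lookup (not Genstat). Two assumptions are made: ANG is taken to be the angular transformation of a percentage, arcsin(√(x/100)), expressed in degrees, and LOG is taken to be the natural logarithm. The function name and the distribution labels are hypothetical.

```python
import math


def stabilize_fitted(fitted, distribution, nbinomial=1):
    """Approximate variance-stabilizing transformation of a fitted value."""
    name = distribution.lower()
    if name in ("poisson", "multinomial", "negative binomial", "geometric"):
        return 2.0 * math.sqrt(fitted)
    if name in ("binomial", "bernoulli"):
        # Assumption: ANG(x) = arcsin(sqrt(x / 100)) in degrees
        percent = 100.0 * fitted / nbinomial
        return 2.0 * math.degrees(math.asin(math.sqrt(percent / 100.0)))
    if name in ("gamma", "exponential"):
        # Assumption: LOG is the natural logarithm
        return math.log(fitted)
    if name == "inverse normal":
        return 1.0 / fitted
    # The source lists no transformation for other distributions
    raise ValueError("no transformation listed for " + distribution)
```

For instance, a Poisson fitted value of 4 is plotted at 2 × √4 = 4, and an inverse Normal fitted value of 4 at 0.25.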

The smoothed line displayed for fitted-value or index plots is a straight line if the number n of distinct explanatory values is greater than 3 but no more than 9. For larger n it is a cubic smoothing spline, with 2 d.f. for n>9, 3 for n>34 or 4 for n>59.

For Normal linear models, envelopes are calculated by default from ns sets of Normal random numbers, where

ns = (1 + PROBABILITY) / (1 - PROBABILITY).

If the number of observations is less than 100, the values are transformed using the projection matrix to induce the observed correlation pattern of the data; for larger datasets, no transformation is done. The values are then ordered and the minimum and maximum values determine the envelope boundaries. If ns is set by the NSIMULATIONS option, the boundaries are calculated with the QUANTILES function from the ns values generated for each ordered residual. For generalized linear models, ns sets of values of the response variate are generated from the distribution, with parameters estimated from the current fit. The model is refitted to each set, and the residuals extracted and dealt with as for the transformed Normal values above.
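The min/max construction for a Normal linear model can be sketched in Python (not Genstat). This sketch covers only the large-dataset case, in which no projection-matrix transformation is applied. It also omits the QUANTILES step used when NSIMULATIONS is set and the refitting used for generalized linear models. The function name and the seed argument are hypothetical.

```python
import random
import statistics


def rough_normal_envelope(n, nsim=39, seed=None):
    """Simulate nsim sets of n standard Normal values, order each set, and
    take the minimum, median and maximum at each ordered position."""
    rng = random.Random(seed)
    # Each simulated set is sorted so that position i holds its i-th order statistic
    sims = [sorted(rng.gauss(0.0, 1.0) for _ in range(n)) for _ in range(nsim)]
    lower = [min(s[i] for s in sims) for i in range(n)]
    median = [statistics.median([s[i] for s in sims]) for i in range(n)]
    upper = [max(s[i] for s in sims) for i in range(n)]
    return lower, median, upper
```

The three returned lists would be plotted against the Normal quantiles, with the ordered residuals expected to fall mostly between the lower and upper bounds if the model is adequate.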

Action with RESTRICT

Restrictions applied to vectors used in the regression apply also to the RCHECK procedure. Values of diagnostic quantities are set to missing for all excluded units.

See also

Procedures: RDESTIMATES, RGRAPH, APLOT, DRESIDUALS, VPLOT.

Commands for: Regression analysis.

Example

CAPTION 'RCHECK example',\
        !t('Model atmospheric pressure on boiling point',\ 
        '(data from Atkinson, 1985, Plots, Transformations & Regression).');\
        STYLE=meta,plain
VARIATE [NVALUES=17] Boil,Pressure
READ    Boil,Pressure
        194.5 20.79  194.3 20.79  197.9 22.40  198.4 22.67  199.4 23.15
        199.9 23.35  200.9 23.89  201.1 23.99  201.4 24.02  201.3 24.01
        203.6 25.14  204.6 26.57  209.5 28.49  208.6 27.76  210.7 29.04
        211.9 29.88  212.2 30.06 :
CALCULATE LogPressure = 100*LOG10(Pressure)
MODEL   LogPressure
FIT     Boil
CAPTION '1. Plot composite of four displays of the standardized residuals.'
RCHECK
CAPTION !t('2. Plot simple residuals against boiling point,',\ 
        'and display a Normal plot of simple residuals.')
RCHECK  [RMETHOD=simple; INDEX=Boil] Y=2(residual); XMETHOD=index,Normal
CAPTION !t('3. Display a half-Normal plot with a generated envelope,',\ 
        'that has been smoothed, and display as a shaded area;',\ 
        'change colours to give dark blue points on cyan background.')
PEN     3,4; COLOUR='blue','aqua'
RCHECK  [ENVELOPE=smooth; SHADE=yes] Y=residual; XMETHOD=halfnormal
CAPTION '4. Print deletion residual, Cook''s statistic and leverage.'
VARIATE [VALUES=1...17] observe; DECIMALS=0
RCHECK  [PRINT=index,residual,leverage,cook; RMETHOD=deletion;\ 
        INDEX=observe; GRAPHICS=*]
Updated on March 6, 2019
