
RKEEP directive

Stores results from a linear, generalized linear, generalized additive or nonlinear model.

Options

EXPAND = string token Whether to put estimates in the order defined by the maximal model for linear or generalized linear models (yes, no); default no
DISPERSION = scalar Dispersion parameter to be used as estimate for variability in s.e.s; default as set in the MODEL directive
RMETHOD = string token Type of residuals to form if parameter RESIDUALS is set (deviance, Pearson, simple); default as set in MODEL
DMETHOD = string token Basis of estimate of dispersion, if not fixed by DISPERSION option (deviance, Pearson); default as set in MODEL
PROBABILITY = scalar Probability level for confidence limits; default 0.95
OMODEL = pointer Pointer to settings of options of the current MODEL statement, given unit labels corresponding to the option names of MODEL (starting with 'distribution')
PMODEL = pointer Pointer to settings of parameters of the current MODEL statement, given unit labels corresponding to the parameter names of MODEL (starting with 'y'); only refers to the first setting of Y, FITTEDVALUES and RESIDUALS
STATISTICS = variates Saves all the statistics that could be displayed for the first Y variate by the 'summary' setting of the PRINT option of the fitting directives FIT, ADD etc
CIMETHOD = string token Method to use to calculate confidence intervals for nonlinear models (exact, quadratic); default quadratic
IGNOREFAILURE = string token Whether to ignore failure to fit a generalized linear model (yes, no); default no
MAXIMALMODEL = formula structure Saves the maximal model (as defined by TERMS)
FITMODEL = formula structure Saves the currently-fitted model (including any contrast functions)
FITCONSTANT = scalar Saves a scalar containing the value one if the constant is included in the fitted model, or zero otherwise
FITTYPE = scalar Saves a scalar to indicate the type of model that has been fitted
SAVE = identifier Specifies save structure of model; default * i.e. that from latest model fitted

Parameters

Y = variates Response variates for which results are to be saved; default is the list of response variates in the most recent MODEL statement
RESIDUALS = variates Residuals for each Y variate, as specified by the RMETHOD option
FITTEDVALUES = variates Fitted values for each Y variate
LEVERAGES = variate Leverages of the units for each Y variate
ESTIMATES = variates Estimates of parameters for each Y variate
SE = variates Standard errors of the estimates
INVERSE = symmetric matrix Inverse matrix from a linear or generalized linear model, inverse of second derivative matrix from a nonlinear model
VCOVARIANCE = symmetric matrix Variance-covariance matrix of the estimates
DEVIANCE = scalars Residual ss or deviance
DF = scalar Residual degrees of freedom
TERMS = pointer or formula structure Fitted terms (excluding constant)
ITERATIVEWEIGHTS = variate Iterative weights from a generalized linear model
LINEARPREDICTOR = variate Linear predictor from a generalized linear model
YADJUSTED = variate Adjusted response of a generalized linear model
EXIT = scalar Exit status from a generalized linear or nonlinear model
GRADIENTS = pointer Derivatives of fitted values with respect to parameters in a nonlinear model
GRID = variate Grid of function or deviance values from a nonlinear model
DESIGNMATRIX = matrix Design matrix whose columns are explanatory variates and dummy variates
PEARSONCHISQUARE = scalar Pearson chi-square statistic from a generalized linear model
STERMS = pointer Saves the identifiers of the variates that have been smoothed in the current model
SCOMPONENTS = pointer Saves a pointer to variates holding the nonlinear components of the variates that have been smoothed
NOBSERVATIONS = scalar Number of units used in regression, excluding missing data and zero weights and taking account of restrictions
SEFITTEDVALUES = variate Saves standard errors of the fitted values
SELINEARPREDICTOR = variate Saves standard errors of the linear predictor
INFLATION = variate Saves the variance inflation factors of the parameter estimates
UPPER = variates Saves upper confidence limits for the parameter estimates
LOWER = variates Saves lower confidence limits for the parameter estimates
MEANDEVIANCE = scalars Saves the residual mean deviance (or mean square)
TDEVIANCE = scalars Saves the total deviance (or sum of squares)
TDF = scalars Saves the total degrees of freedom (corrected for the mean or uncorrected as displayed by the fitting directives)
TMEANDEVIANCE = scalars Saves the total mean deviance (or mean square)
SUMMARY = pointer Saves the summary analysis-of-variance (or deviance) table as a pointer with a variate or text for each column (source, d.f. etc)
ACCUMULATED = pointer Saves the accumulated analysis-of-variance (or deviance) table as a pointer with a variate or text for each column (source, d.f. etc)
STATISTICS = variates Saves all the statistics that could be displayed for the Y variate by the 'summary' setting of the PRINT option of the fitting directives FIT, ADD etc

Description

RKEEP allows you to copy information from a regression analysis (performed, for example, by a FIT, FITCURVE or FITNONLINEAR statement) into Genstat data structures. You do not need to declare the structures in advance; Genstat will declare them automatically to be of the correct type and length.

The Y parameter specifies the response variates for which the results are to be saved. Unusually for the first parameter of a directive, this has a default: if you leave it out, Genstat assumes that results are to be saved for all the response variates, as given in the previous MODEL statement.

The RESIDUALS, FITTEDVALUES, LEVERAGES, SEFITTEDVALUES and SELINEARPREDICTOR parameters allow you to save the standardized residuals, the fitted values, the leverages, the standard errors of the fitted values and the standard errors of the linear predictor. For example, RESIDUALS=R puts the residuals in a variate R. The RMETHOD option controls the type of residuals that are formed. You cannot save these values if you have set RMETHOD=* in the MODEL statement. The standard errors of fitted values are defined by:

s.e. = √(leverage × variance function × dispersion / weight)

where the variance function is calculated from the fitted value according to the setting of the DISTRIBUTION option of the current MODEL statement, and the dispersion is the fixed or estimated value of dispersion, as controlled by the DISPERSION and DMETHOD options of the MODEL and RKEEP directives.
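As a numerical illustration of this formula, here is a short Python sketch (not Genstat code). The variance functions listed are the standard ones for the Normal, Poisson and gamma distributions; the function name `se_fitted` and the numbers are hypothetical.

```python
import math

# Standard variance functions V(mu); the Normal case does not depend on mu.
VARIANCE_FUNCTIONS = {
    "normal": lambda mu: 1.0,
    "poisson": lambda mu: mu,
    "gamma": lambda mu: mu * mu,
}


def se_fitted(leverage, mu, dispersion, distribution="normal", weight=1.0):
    """s.e. = sqrt(leverage * V(mu) * dispersion / weight) for one unit."""
    variance = VARIANCE_FUNCTIONS[distribution](mu)
    return math.sqrt(leverage * variance * dispersion / weight)


# Normal model: leverage 0.25 and residual mean square 4.0 give s.e. 1.0.
se_normal = se_fitted(0.25, mu=10.0, dispersion=4.0)
# Poisson model with dispersion fixed at 1: leverage 0.1, fitted value 9.0.
se_poisson = se_fitted(0.1, mu=9.0, dispersion=1.0, distribution="poisson")
print(se_normal, se_poisson)
```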

The ESTIMATES and SE parameters save the parameter estimates and their standard errors; RKEEP puts them in variates, using the same order as in the display produced by the PRINT option of the directive used to fit the model. Alternatively, if you have used TERMS to define a maximal model, you can set option EXPAND=yes to reorder the estimates to their order in the maximal model (including missing values for the parameters not currently in the model). The variates saving these values are set up with labels; thus, you can refer to individual values in expressions using the labels as displayed when the estimates are printed. For example, to get the estimate of the constant into a scalar, you could put:

RKEEP ESTIMATES=Esti
SCALAR Const
CALCULATE Const = Esti$['Constant']
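The effect of EXPAND=yes and of label-based indexing can be mimicked in Python (an analogy, not Genstat code): the estimates are held in a dictionary keyed by label, and expanding to the maximal model fills the absent parameters with a missing value, represented here by NaN. The labels and values below are hypothetical; Genstat's own labels may be formatted differently.

```python
import math


def expand_estimates(fitted, maximal_labels):
    """Reorder labelled estimates into maximal-model order.

    Parameters absent from the fitted model get NaN, standing in for
    Genstat's missing value.
    """
    return {label: fitted.get(label, math.nan) for label in maximal_labels}


# Hypothetical estimates from a model containing only the constant and a slope.
fitted = {"AccTemp": 0.8, "Constant": 12.5}
maximal = ["Constant", "AccTemp", "Year 1958", "AccTemp.Year 1958"]

expanded = expand_estimates(fitted, maximal)
const = expanded["Constant"]   # analogue of Esti$['Constant']
print(list(expanded.items()))
```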

The UPPER and LOWER parameters allow you to save upper and lower confidence limits for the parameter estimates. The probability for the confidence interval is specified by the PROBABILITY option, with default 0.95. The CIMETHOD option controls the method used with nonlinear models. The default setting, quadratic, uses the same method as for other types of regression, basing the limits on a quadratic surface fitted to the likelihood surface around the optimum. These limits may be poor approximations if the likelihood surface is very asymmetric. The alternative setting, exact, calculates the limits directly from the likelihood surface.
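With the quadratic method the limits are symmetric about the estimate. The Python sketch below (not Genstat code) assumes the multiplier is a two-sided Normal quantile for the requested PROBABILITY; when the dispersion is estimated, Genstat may instead use a t quantile with the residual degrees of freedom, and the exact method for nonlinear models is not shown. The function name is hypothetical.

```python
from statistics import NormalDist


def quadratic_limits(estimate, se, probability=0.95):
    """Return (lower, upper) = estimate -/+ q * se, q a two-sided Normal quantile."""
    q = NormalDist().inv_cdf(0.5 + probability / 2.0)
    return estimate - q * se, estimate + q * se


lower, upper = quadratic_limits(2.0, 0.5)            # PROBABILITY=0.95
lower99, upper99 = quadratic_limits(2.0, 0.5, 0.99)  # wider interval
print(lower, upper)
```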

The INFLATION parameter allows the variance inflation factors of the parameters to be saved.
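The usual definition of the variance inflation factor for parameter j is 1/(1 - R_j^2), where R_j^2 comes from regressing explanatory variate j on the other explanatory variates. Assuming INFLATION follows this definition, the Python sketch below (not Genstat code, with made-up data) covers the simplest case of two explanatory variates, where both factors equal 1/(1 - r^2) with r their correlation.

```python
import math


def correlation(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_variates(x1, x2):
    """With exactly two explanatory variates, each has VIF = 1 / (1 - r^2)."""
    r = correlation(x1, x2)
    return 1.0 / (1.0 - r * r)


x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 5.0]        # correlation 0.8 with x1
vif = vif_two_variates(x1, x2)
print(vif)
```

Uncorrelated explanatory variates give a factor of exactly 1, so values well above 1 flag estimates whose variances are inflated by collinearity.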

The INVERSE parameter allows you to save the inverse matrix as a symmetric matrix: that is, (X′X)⁻¹, where X is the design matrix and X′ its transpose. This matrix is the same for all response variates.

The VCOVARIANCE parameter saves the variance-covariance matrix of the estimates for each response variate: these are formed by multiplying the inverse matrix by the relevant variance estimate based on the estimated dispersion, or on the dispersion that you have supplied.
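For simple linear regression the design matrix has columns [1, x], so X′X is a 2×2 matrix that can be inverted directly. The Python sketch below (not Genstat code, with made-up data and a hypothetical function name) forms the inverse matrix, the estimates (X′X)⁻¹X′y, and the variance-covariance matrix as the inverse multiplied by the estimated dispersion, here the residual mean square.

```python
import math


def regression_inverse_vcov(x, y):
    """Simple linear regression y = a + b*x.

    Returns (inverse, estimates, vcov), where inverse = (X'X)^-1 and
    vcov = inverse * residual mean square.
    """
    n = len(x)
    sx = sum(x)
    sxx = sum(v * v for v in x)
    sy = sum(y)
    sxy = sum(u * v for u, v in zip(x, y))
    # X'X = [[n, sx], [sx, sxx]]; invert the 2x2 matrix directly.
    det = n * sxx - sx * sx
    inverse = [[sxx / det, -sx / det], [-sx / det, n / det]]
    # Estimates = (X'X)^-1 X'y, with X'y = [sy, sxy].
    a = inverse[0][0] * sy + inverse[0][1] * sxy
    b = inverse[1][0] * sy + inverse[1][1] * sxy
    rss = sum((v - a - b * u) ** 2 for u, v in zip(x, y))
    dispersion = rss / (n - 2)          # estimated dispersion (residual mean square)
    vcov = [[dispersion * e for e in row] for row in inverse]
    return inverse, (a, b), vcov


inverse, (a, b), vcov = regression_inverse_vcov([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 5.0, 8.0])
se_b = math.sqrt(vcov[1][1])            # standard error of the slope
print(a, b, se_b)
```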

The DEVIANCE parameter allows you to save the residual sum of squares, or the deviance for distributions other than Normal. The DF parameter saves the residual degrees of freedom, and the MEANDEVIANCE parameter saves the residual mean deviance. The TDEVIANCE parameter saves the total deviance, the TDF parameter saves the total degrees of freedom (corrected for the mean or uncorrected as displayed by the fitting directives), and the TMEANDEVIANCE parameter saves the total mean deviance.
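For an ordinary linear regression with a constant, these six quantities can be computed as in the Python sketch below (not Genstat code; the data and function name are made up). It assumes a model with a constant, so that the total is corrected for the mean.

```python
def deviance_summary(x, y):
    """Residual and total deviance quantities for y = a + b*x (Normal errors)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((u - mx) ** 2 for u in x)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    deviance = sum((v - a - b * u) ** 2 for u, v in zip(x, y))  # residual s.s.
    df = n - 2                                                  # two parameters fitted
    tdeviance = sum((v - my) ** 2 for v in y)                   # total s.s. about the mean
    tdf = n - 1
    return {
        "DEVIANCE": deviance, "DF": df, "MEANDEVIANCE": deviance / df,
        "TDEVIANCE": tdeviance, "TDF": tdf, "TMEANDEVIANCE": tdeviance / tdf,
    }


summary = deviance_summary([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 5.0, 8.0])
print(summary)
```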

The LINEARPREDICTOR parameter allows you to save the linear predictor of a generalized linear model; the values of the linear predictor are the same as the fitted values if the link function is the identity function.
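The linear predictor is η = Xβ and the fitted value is μ = g⁻¹(η), where g is the link function. A small Python sketch (not Genstat code) with hypothetical estimates and design-matrix rows shows two links: with the log link the fitted values are exp(η), and with the identity link they equal η.

```python
import math


def linear_predictor(estimates, row):
    """eta for one unit: sum of parameter estimates times design-matrix entries."""
    return sum(b * x for b, x in zip(estimates, row))


estimates = [0.5, 0.2]                 # hypothetical constant and slope
rows = [[1.0, 0.0], [1.0, 5.0]]        # design-matrix rows [1, x]

eta = [linear_predictor(estimates, r) for r in rows]
fitted_log = [math.exp(e) for e in eta]    # log link: mu = exp(eta)
fitted_identity = list(eta)                # identity link: mu = eta
print(eta, fitted_log, fitted_identity)
```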

The ITERATIVEWEIGHTS parameter saves a variate containing the iterative weights used in the last cycle of the iteration for fitting a generalized linear model. The iterative weights do not contain any contribution from the weights that can be specified, whether or not the model is iterative, by the WEIGHTS option of the MODEL directive, and they are 1.0 for ordinary linear regression.

The YADJUSTED parameter saves the adjusted response variate used in the last cycle of the iteration for fitting a generalized linear model; with the identity link function this is the same as the response variate.
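Both quantities come from the iteratively reweighted least-squares algorithm. Using the standard formulas (see McCullagh & Nelder, 1989), the iterative weight is 1/(V(μ)(dη/dμ)²) and the adjusted response is η + (y - μ)dη/dμ. The Python sketch below (not Genstat code; the function name is hypothetical) implements these for the Poisson distribution with log link and the Normal distribution with identity link, and reproduces the two special cases described above: weights of 1.0 and an adjusted response equal to y. Prior weights from MODEL are deliberately left out, as in ITERATIVEWEIGHTS.

```python
import math


def irls_quantities(y, eta, link="log", distribution="poisson"):
    """Return (iterative weight, adjusted response) for one unit.

    Prior weights are not included.
    """
    if link == "log":
        mu = math.exp(eta)
        deta_dmu = 1.0 / mu
    elif link == "identity":
        mu = eta
        deta_dmu = 1.0
    else:
        raise ValueError("only the log and identity links are sketched here")
    variance = mu if distribution == "poisson" else 1.0   # V(mu): Poisson or Normal
    weight = 1.0 / (variance * deta_dmu ** 2)
    adjusted = eta + (y - mu) * deta_dmu
    return weight, adjusted


# Poisson, log link: weight equals mu, adjusted response is eta + (y - mu) / mu.
w_pois, z_pois = irls_quantities(6.0, math.log(4.0))
# Normal, identity link: weight 1.0 and adjusted response equal to y.
w_norm, z_norm = irls_quantities(3.0, 2.5, link="identity", distribution="normal")
print(w_pois, z_pois, w_norm, z_norm)
```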

The Pearson chi-square statistic can be saved using the PEARSONCHISQUARE parameter of RKEEP. It is calculated as the sum of the squared Pearson residuals, and can be used as an alternative to the deviance for testing goodness of fit; see McCullagh & Nelder (1989).
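Written out, the statistic is the sum over units of w(y - μ)²/V(μ), where w is the prior weight and V is the variance function. A minimal Python sketch (not Genstat code; data and function name made up) computes it for the Poisson case, V(μ) = μ, and checks that with the Normal variance function V(μ) = 1 it reduces to the residual sum of squares.

```python
def pearson_chisquare(y, mu, variance=lambda m: m, weights=None):
    """Sum of squared Pearson residuals: w * (y - mu)^2 / V(mu).

    The default variance function V(mu) = mu is the Poisson one.
    """
    if weights is None:
        weights = [1.0] * len(y)
    return sum(w * (yi - mi) ** 2 / variance(mi)
               for yi, mi, w in zip(y, mu, weights))


y = [2.0, 5.0, 9.0]
mu = [3.0, 5.0, 8.0]
x2_poisson = pearson_chisquare(y, mu)                          # 1/3 + 0 + 1/8
x2_normal = pearson_chisquare(y, mu, variance=lambda m: 1.0)   # residual s.s.
print(x2_poisson, x2_normal)
```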

The EXIT parameter of RKEEP provides a code that indicates the success or type of failure of an iterative fit. Codes 0-7 are relevant to standard curves and general nonlinear models, and codes 0 and 8-13 are for generalized linear models:

0    Successful fitting
1    Limit on number of cycles has been reached without convergence
2    Parameter out of bounds
3    Likelihood appears constant
4    Failure to progress towards solution
5    Some standard errors are not available because the information matrix is nearly singular
6    Calculated likelihood may be incorrect because of missing fitted values
7    Curve is close to a limiting form
8    Data incompatible with model
9    Predicted mean or linear predictor out of range
10   Invalid calculation for calculated link or distribution
11   All units have been excluded from the analysis
12   Iterative process has diverged
13   Failure due to lack of space or data access
14   Function returned a missing value

With a generalized linear model, unless you set option IGNOREFAILURE=yes, the EXIT code is the only information that you can save if the fit has been unsuccessful. Alternatively, with a nonlinear model or when IGNOREFAILURE=yes, RKEEP will save any information that may be available. (You may thus, for example, be able to discover more about the cause of the failure.)

The derivatives of the fitted values with respect to each parameter in a standard curve or general nonlinear model can be stored in variates using the GRADIENTS parameter. You can use these quantities to assess the relative influence of each observation on a parameter; you can also construct a measure of leverage by summing the gradients for all the parameters.
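Genstat computes these derivatives itself; purely to illustrate what they are, the Python sketch below approximates them by central differences for a hypothetical exponential curve f(x) = a·exp(bx), whose exact derivatives are exp(bx) and a·x·exp(bx). The curve, function names and values are assumptions for illustration.

```python
import math


def curve(x, a, b):
    """Hypothetical nonlinear model: f(x) = a * exp(b * x)."""
    return a * math.exp(b * x)


def gradients(xs, params, h=1e-6):
    """Central-difference derivatives of the fitted values w.r.t. each parameter.

    Returns one list per parameter, holding one derivative per unit.
    """
    grads = []
    for j in range(len(params)):
        up = list(params)
        down = list(params)
        up[j] += h
        down[j] -= h
        grads.append([(curve(x, *up) - curve(x, *down)) / (2 * h) for x in xs])
    return grads


xs = [0.0, 1.0, 2.0]
grad_a, grad_b = gradients(xs, [2.0, 0.5])
print(grad_a, grad_b)
```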

The GRID parameter can be used to store a grid of values of the deviance (or any general function) following FITNONLINEAR.
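To show what such a grid contains, here is a Python sketch (not Genstat code) that evaluates the residual sum of squares of a hypothetical one-parameter model y = bx over a grid of values of b and picks the smallest. The model, data and grid spacing are assumptions; the sketch does not reproduce how FITNONLINEAR specifies its grid.

```python
def rss(b, xs, ys):
    """Residual sum of squares for the hypothetical model y = b * x."""
    return sum((y - b * x) ** 2 for x, y in zip(xs, ys))


xs = [1.0, 2.0, 3.0]
ys = [2.1, 3.9, 6.0]
grid = [0.5 * k for k in range(2, 7)]            # b = 1.0, 1.5, ..., 3.0
deviances = [rss(b, xs, ys) for b in grid]
best = grid[deviances.index(min(deviances))]     # grid point with smallest deviance
print(list(zip(grid, deviances)), best)
```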

The DESIGNMATRIX parameter allows you to save the matrix X. The columns correspond to the parameters of the model, ordered as for the ESTIMATES parameter. For simple linear regression with a constant this has only two columns, the first containing ones and the second containing the values of the explanatory variate. Columns corresponding to aliased parameters are omitted, but you can use the corresponding option of TERMS to construct the full design matrix.
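The sketch below builds such a design matrix in Python (not Genstat code) for a model with a constant, one explanatory variate and a two-level factor like Year in the example at the end of this page. It assumes the factor is represented by a dummy variate for each level except the first, which acts as the reference level; the function name and data are hypothetical.

```python
def design_matrix(x, factor, levels):
    """Rows of [constant, x, dummy for each non-reference level]."""
    rows = []
    for xi, fi in zip(x, factor):
        dummies = [1.0 if fi == level else 0.0 for level in levels[1:]]
        rows.append([1.0, xi] + dummies)
    return rows


X = design_matrix([10.0, 12.0, 11.0], [1957, 1957, 1958], levels=[1957, 1958])
for row in X:
    print(row)
```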

The PEARSONCHISQUARE parameter provides the Pearson chi-square statistic for dispersion, which is the same as the residual sum of squares for the Normal distribution but differs from the deviance for other distributions.

The STERMS and SCOMPONENTS parameters are relevant to generalized additive models. The STERMS parameter can be used to store a pointer to those variates whose effects in the model are smoothed. The SCOMPONENTS parameter stores a pointer to variates, one for each smoothed variate in the same order as in STERMS, containing the fitted nonlinear component of each smoothed variate; this does not include the linear component or the constant term.

The NOBSERVATIONS parameter allows you to save the number of units used in the analysis, omitting units with missing values or zero weights and those excluded by restrictions. This will be the same as the total number of degrees of freedom plus one, except in a regression with no constant term and no explanatory factors, when it will equal the total number of degrees of freedom.
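The counting rule can be sketched in Python (not Genstat code) as follows, assuming missing values are represented by NaN and restrictions by a list of booleans; these representations and the function name are hypothetical.

```python
import math


def n_observations(y, weights=None, excluded=None):
    """Count units with a non-missing response, non-zero weight and no restriction."""
    count = 0
    for i, yi in enumerate(y):
        weight = 1.0 if weights is None else weights[i]
        dropped = excluded is not None and excluded[i]
        if not math.isnan(yi) and weight != 0.0 and not dropped:
            count += 1
    return count


y = [3.2, math.nan, 4.1, 5.0, 2.2]
w = [1.0, 1.0, 0.0, 1.0, 1.0]
nobs = n_observations(y, weights=w)      # units 1, 4 and 5 are used
tdf_with_constant = nobs - 1             # total d.f. corrected for the mean
tdf_no_constant = nobs                   # no constant and no factors
print(nobs, tdf_with_constant, tdf_no_constant)
```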

The SUMMARY parameter can be used to save the summary analysis-of-variance (or deviance) table for each response variate. The summary table is saved as a pointer with a variate or text for each of its columns (source, d.f. etc). Similarly, the ACCUMULATED parameter can save the accumulated analysis-of-variance (or deviance) tables.

The STATISTICS parameter saves all the statistics that could be displayed for each response variate by the 'summary' setting of the PRINT option of the fitting directives FIT, ADD etc. Alternatively, the STATISTICS option can be used to save the statistics for the first response variate specified by the MODEL statement.

The DISPERSION option allows you to define the value to be used for the dispersion parameter when calculating the standard errors. The DMETHOD option indicates how this should be calculated if DISPERSION is not set. By default the deviance is used but you can set DMETHOD=Pearson to request the Pearson chi-square statistic to be used instead.
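The difference between the two DMETHOD settings shows up with Poisson data, where the deviance and the Pearson chi-square statistic usually differ. In this Python sketch (not Genstat code) each estimate of dispersion is assumed to be the statistic divided by the residual degrees of freedom; the data, function names and residual d.f. are hypothetical.

```python
import math


def poisson_deviance(y, mu):
    """2 * sum[y*log(y/mu) - (y - mu)], taking y*log(y/mu) as 0 when y = 0."""
    total = 0.0
    for yi, mi in zip(y, mu):
        term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += term - (yi - mi)
    return 2.0 * total


def poisson_pearson(y, mu):
    """Pearson chi-square with the Poisson variance function V(mu) = mu."""
    return sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))


y = [2.0, 5.0, 9.0, 0.0]
mu = [3.0, 5.0, 8.0, 1.0]
residual_df = 2                                            # hypothetical
disp_deviance = poisson_deviance(y, mu) / residual_df      # DMETHOD=deviance
disp_pearson = poisson_pearson(y, mu) / residual_df        # DMETHOD=Pearson
print(disp_deviance, disp_pearson)
```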

Options OMODEL and PMODEL allow you to save pointers containing information about the current model. The labels of the pointers can be specified in either lower or upper case, or any mixture. OMODEL can be set to a pointer to store information about each of the options set in the previous MODEL statement. For example, the statement

RKEEP [OMODEL=Om]

will allow you to refer to the current variate of weights (if one was set in the WEIGHTS option of MODEL) as Om['weights']. Whether or not a variate was set, the statement

MODEL [WEIGHTS=Om['weights']] Newobs

will allow a new analysis with the same weighting as the old.

The pointer Om has 16 values, with suffixes corresponding to the options of MODEL in the defined order. Similarly, the statement

RKEEP [PMODEL=Pm]

will set up a pointer storing the (eight) current parameter settings of the previous MODEL statement. However, if there was more than one response variate, the first value of the pointer will be the identifier of the first response variate only: the others are not stored. Similarly, only the fitted-values and residuals variates for the first response will be pointed at. For example, the identifier Pm[1] or Pm['y'] can be used to refer to the current response variate after the RKEEP statement above.

The MAXIMALMODEL option saves the maximal model (as defined by TERMS). The FITMODEL option saves the model that has currently been fitted, including any contrast functions (i.e. POL, REG, COMPARISON, SSPLINE or LOESS). The FITCONSTANT option saves a scalar containing the value one if the constant is included in the fitted model, or zero otherwise. The FITTYPE option saves a scalar to indicate the type of model that has been fitted: 1 for an ordinary regression or generalized linear model (FIT), 2 for a generalized nonlinear model (FIT with the CALCULATION option set), 3 for a standard curve (FITCURVE) and 4 for a nonlinear model (FITNONLINEAR).

Options: EXPAND, DISPERSION, RMETHOD, DMETHOD, PROBABILITY, OMODEL, PMODEL, STATISTICS, CIMETHOD, IGNOREFAILURE, MAXIMALMODEL, FITMODEL, FITCONSTANT, FITTYPE, SAVE.

Parameters: Y, RESIDUALS, FITTEDVALUES, LEVERAGES, ESTIMATES, SE, INVERSE, VCOVARIANCE, DEVIANCE, DF, TERMS, ITERATIVEWEIGHTS, LINEARPREDICTOR, YADJUSTED, EXIT, GRADIENTS, GRID, DESIGNMATRIX, PEARSONCHISQUARE, STERMS, SCOMPONENTS, NOBSERVATIONS, SEFITTEDVALUES, SELINEARPREDICTOR, INFLATION, UPPER, LOWER, MEANDEVIANCE, TDEVIANCE, TDF, TMEANDEVIANCE, SUMMARY, ACCUMULATED, STATISTICS.

Reference

McCullagh, P. & Nelder, J.A. (1989). Generalized Linear Models (second edition). Chapman and Hall, London.

See also

Directives: FIT, FITCURVE, FITNONLINEAR, RKESTIMATES.
Procedure: RKLOESSGROUPS.
Commands for: Regression analysis.

Example

" Example FIT-3: Comparing linear regressions between groups
 
  Experiments on cauliflowers in 1957 and 1958 provided data on
  the mean number of florets in the plant and the temperature during
  the growing season (expressed as accumulated temperature above 0 deg C)."

" The counts and temperatures are in a file called 'FIT-3.DAT'"
FILEREAD [NAME='%gendir%/examples/FIT-3.DAT'] MnCount,AccTemp
" The first 7 values are from 1957 and the rest from 1958;
  set up a factor to distinguish the two years."
FACTOR [LEVELS=!(1957,1958); VALUES=7(1957,1958)] Year

" Fit a linear regression model of the mean count of florets on
  accumulated temperature - first ignoring the division into two years."
MODEL MnCount
TERMS AccTemp*Year
FIT AccTemp

" Fit parallel regressions for the two years."
ADD Year

" Fit separate regressions for the two years."
ADD AccTemp.Year

" Display the accumulated summary: an analysis of parallelism."
RDISPLAY [PRINT=accumulated]

" Show the parallel models."
DROP [PRINT=*] AccTemp.Year
RGRAPH [GRAPHICS=high]

" Extract the parameter estimates and s.e.s
  and display the common slope and its s.e."
RKEEP ESTIMATES=Esti; SE=Se
CALC Slope,SlopeSE = (Esti,Se)$[2]
PRINT Slope,SlopeSE

Updated on February 7, 2023
