Stores results from a linear, generalized linear, generalized additive or nonlinear model.
Options
EXPAND = string token | Whether to put estimates in the order defined by the maximal model for linear or generalized linear models (yes, no); default no |
---|---|
DISPERSION = scalar | Dispersion parameter to be used as estimate for variability in s.e.s; default as set in the MODEL directive |
RMETHOD = string token | Type of residuals to form if parameter RESIDUALS is set (deviance, Pearson, simple); default as set in MODEL |
DMETHOD = string token | Basis of estimate of dispersion, if not fixed by DISPERSION option (deviance, Pearson); default as set in MODEL |
PROBABILITY = scalar | Probability level for confidence limits; default 0.95 |
OMODEL = pointer | Pointer to settings of options of the current MODEL statement, given unit labels corresponding to the option names of MODEL (starting with 'distribution') |
PMODEL = pointer | Pointer to settings of parameters of the current MODEL statement, given unit labels corresponding to the parameter names of MODEL (starting with 'y'); only refers to the first setting of Y, FITTEDVALUES and RESIDUALS |
STATISTICS = variates | Saves all the statistics that could be displayed for the first Y variate by the 'summary' setting of the PRINT option of the fitting directives FIT, ADD etc. |
CIMETHOD = string token | Method to use to calculate confidence intervals for nonlinear models (exact, quadratic); default quadratic |
IGNOREFAILURE = string token | Whether to ignore failure to fit a generalized linear model (yes, no); default no |
MAXIMALMODEL = formula structure | Saves the maximal model (as defined by TERMS) |
FITMODEL = formula structure | Saves the currently-fitted model (including any contrast functions) |
FITCONSTANT = scalar | Saves a scalar containing the value one if the constant is included in the fitted model, or zero otherwise |
FITTYPE = scalar | Saves a scalar to indicate the type of model that has been fitted |
SAVE = identifier | Specifies save structure of model; default * i.e. that from latest model fitted |
Parameters
Y = variates | Response variates for which results are to be saved; default is the list of response variates in the most recent MODEL statement |
---|---|
RESIDUALS = variates | Residuals for each Y variate, as specified by the RMETHOD option |
FITTEDVALUES = variates | Fitted values for each Y variate |
LEVERAGES = variate | Leverages of the units for each Y variate |
ESTIMATES = variates | Estimates of parameters for each Y variate |
SE = variates | Standard errors of the estimates |
INVERSE = symmetric matrix | Inverse matrix from a linear or generalized linear model, inverse of second derivative matrix from a nonlinear model |
VCOVARIANCE = symmetric matrix | Variance-covariance matrix of the estimates |
DEVIANCE = scalars | Residual ss or deviance |
DF = scalar | Residual degrees of freedom |
TERMS = pointer or formula structure | Fitted terms (excluding constant) |
ITERATIVEWEIGHTS = variate | Iterative weights from a generalized linear model |
LINEARPREDICTOR = variate | Linear predictor from a generalized linear model |
YADJUSTED = variate | Adjusted response of a generalized linear model |
EXIT = scalar | Exit status from a generalized linear or nonlinear model |
GRADIENTS = pointer | Derivatives of fitted values with respect to parameters in a nonlinear model |
GRID = variate | Grid of function or deviance values from a nonlinear model |
DESIGNMATRIX = matrix | Design matrix whose columns are explanatory variates and dummy variates |
PEARSONCHISQUARE = scalar | Pearson chi-square statistic from a generalized linear model |
STERMS = pointer | Saves the identifiers of the variates that have been smoothed in the current model |
SCOMPONENTS = pointer | Saves a pointer to variates holding the nonlinear components of the variates that have been smoothed |
NOBSERVATIONS = scalar | Number of units used in regression, excluding missing data and zero weights and taking account of restrictions |
SEFITTEDVALUES = variate | Saves standard errors of the fitted values |
SELINEARPREDICTOR = variate | Saves standard errors of the linear predictor |
INFLATION = variate | Saves the variance inflation factors of the parameter estimates |
UPPER = variates | Saves upper confidence limits for the parameter estimates |
LOWER = variates | Saves lower confidence limits for the parameter estimates |
MEANDEVIANCE = scalars | Saves the residual mean deviance (or mean square) |
TDEVIANCE = scalars | Saves the total deviance (or sum of squares) |
TDF = scalars | Saves the total degrees of freedom (corrected for the mean or uncorrected as displayed by the fitting directives) |
TMEANDEVIANCE = scalars | Saves the total mean deviance (or mean square) |
SUMMARY = pointer | Saves the summary analysis-of-variance (or deviance) table as a pointer with a variate or text for each column (source, d.f. etc.) |
ACCUMULATED = pointer | Saves the accumulated analysis-of-variance (or deviance) table as a pointer with a variate or text for each column (source, d.f. etc.) |
STATISTICS = variates | Saves all the statistics that could be displayed for the Y variate by the 'summary' setting of the PRINT option of the fitting directives FIT, ADD etc. |
Description
RKEEP allows you to copy information from a regression analysis (performed, for example, by a FIT, FITCURVE or FITNONLINEAR statement) into Genstat data structures. You do not need to declare the structures in advance; Genstat will declare them automatically to be of the correct type and length.
The Y parameter specifies the response variates for which the results are to be saved. Unusually for the first parameter of a directive, this has a default: if you leave it out, Genstat assumes that results are to be saved for all the response variates, as given in the previous MODEL statement.
The RESIDUALS, FITTEDVALUES, LEVERAGES, SEFITTEDVALUES and SELINEARPREDICTOR parameters allow you to save the standardized residuals, the fitted values, the leverages, the standard errors of the fitted values and the standard errors of the linear predictor. For example, RESIDUALS=R puts the residuals in a variate R. The RMETHOD option controls the type of residuals that are formed. You cannot save these values if you have set RMETHOD=* in the MODEL statement. The standard errors of fitted values are defined by:
s.e. = √(leverage × variance function × dispersion / weight)
where the variance function is calculated from the fitted value according to the setting of the DISTRIBUTION option of the current MODEL statement, and the dispersion is the fixed or estimated value of dispersion, as controlled by the DISPERSION and DMETHOD options of the MODEL and RKEEP directives.
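As a minimal sketch of saving these diagnostics after a simple fit, the statements below could be used. The identifiers Yield, Dose, Res, Fit, Lev and SeFit are hypothetical, and Yield and Dose are assumed to be variates that already hold data.

```genstat
" Sketch only: Yield and Dose are hypothetical variates assumed to exist. "
MODEL Yield
FIT [PRINT=*] Dose
" Save the diagnostics; Genstat declares Res, Fit, Lev and SeFit automatically. "
RKEEP RESIDUALS=Res; FITTEDVALUES=Fit; LEVERAGES=Lev; SEFITTEDVALUES=SeFit
PRINT Yield,Fit,SeFit,Res,Lev
```

The type of residual stored in Res follows the RMETHOD setting, so setting that option on RKEEP would change what is saved without refitting.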
The ESTIMATES and SE parameters save the parameter estimates and their standard errors; RKEEP puts them in variates, using the same order as in the display produced by the PRINT option of the directive used to fit the model. Alternatively, if you have used TERMS to define a maximal model, you can set option EXPAND=yes to reorder the estimates to their order in the maximal model (including missing values for the parameters not currently in the model). The variates saving these values are set up with labels; thus, you can refer to individual values in expressions using the labels as displayed when the estimates are printed. For example, to get the estimate of the constant into a scalar, you could put:
RKEEP ESTIMATES=Esti
SCALAR Const
CALCULATE Const = Esti$['Constant']
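The EXPAND option described above might be used as in the following sketch. The identifiers MnCount, AccTemp, Year and EstiAll are illustrative assumptions: MnCount and AccTemp are taken to be variates, and Year a factor, that already hold data.

```genstat
" Sketch only: MnCount, AccTemp and Year are assumed to exist already. "
MODEL MnCount
" Define the maximal model, then fit only part of it. "
TERMS AccTemp*Year
FIT [PRINT=*] AccTemp
" Save the estimates in maximal-model order; parameters that are "
" not in the fitted model should appear as missing values.       "
RKEEP [EXPAND=yes] ESTIMATES=EstiAll
PRINT EstiAll
```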
The UPPER and LOWER parameters allow you to save upper and lower confidence limits for the parameter estimates. The probability for the confidence interval is specified by the PROBABILITY option, with default 0.95. The CIMETHOD option controls the method used with nonlinear models. The default setting, quadratic, uses the same method as for other types of regression, basing the limits on a quadratic surface fitted to the likelihood surface around the optimum. These limits may be poor approximations if the likelihood surface is very asymmetric. The alternative setting, exact, calculates the limits directly from the likelihood surface.
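A short sketch of saving confidence limits follows. It assumes that a model has already been fitted; the identifiers Esti, Lo, Up, LoExact and UpExact are hypothetical, and 0.90 is an arbitrary level chosen only for illustration.

```genstat
" Sketch only: assumes a model has already been fitted. "
" 90% limits instead of the default 95%. "
RKEEP [PROBABILITY=0.90] ESTIMATES=Esti; LOWER=Lo; UPPER=Up
PRINT Esti,Lo,Up

" After a nonlinear fit, limits taken directly from the likelihood surface. "
RKEEP [CIMETHOD=exact] LOWER=LoExact; UPPER=UpExact
```

Comparing the quadratic and exact limits for a nonlinear model is one way to judge how asymmetric the likelihood surface is around the optimum.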
The INFLATION parameter allows the variance inflation factors of the parameters to be saved.
The INVERSE parameter allows you to save the inverse matrix as a symmetric matrix: that is, (X′X)⁻¹ where X is the design matrix. This matrix is the same for all response variates.
The VCOVARIANCE parameter saves the variance-covariance matrix of the estimates for each response variate: these are formed by multiplying the inverse matrix by the relevant variance estimate based on the estimated dispersion, or on the dispersion that you have supplied.
The DEVIANCE parameter allows you to save the residual sum of squares, or the deviance for distributions other than Normal. The DF parameter saves the residual degrees of freedom, and the MEANDEVIANCE parameter saves the residual mean deviance. The TDEVIANCE parameter saves the total deviance, the TDF parameter saves the total degrees of freedom (corrected for the mean or uncorrected as displayed by the fitting directives), and the TMEANDEVIANCE parameter saves the total mean deviance.
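The relationship between these quantities can be checked with a sketch like the one below. It assumes a model with a single response variate has already been fitted; all the identifiers are hypothetical, and the comparison rests on the usual definition of a mean deviance as the deviance divided by its degrees of freedom.

```genstat
" Sketch only: assumes a model with one response variate has been fitted. "
RKEEP DEVIANCE=ResDev; DF=ResDf; MEANDEVIANCE=ResMean; \
      TDEVIANCE=TotDev; TDF=TotDf; TMEANDEVIANCE=TotMean
" Ratio is expected to agree with the saved residual mean deviance. "
CALCULATE Ratio = ResDev / ResDf
PRINT ResDev,ResDf,ResMean,Ratio
PRINT TotDev,TotDf,TotMean
```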
The LINEARPREDICTOR parameter allows you to save the linear predictor of a generalized linear model; the values of the linear predictor are the same as the fitted values if the link function is the identity function.
The ITERATIVEWEIGHTS parameter saves a variate containing the iterative weights used in the last cycle of the iteration for fitting a generalized linear model. The iterative weights do not contain any contribution from the weights that can be specified, whether or not the model is iterative, by the WEIGHTS option of the MODEL directive, and they are 1.0 for ordinary linear regression.
The YADJUSTED parameter saves the adjusted response variate used in the last cycle of the iteration for fitting a generalized linear model; with the identity link function this is the same as the response variate.
The Pearson chi-square statistic can be saved using the PEARSONCHI parameter of RKEEP. It is calculated as the sum of the squared Pearson residuals. This can be used as an alternative to the deviance for testing goodness of fit; see McCullagh & Nelder (1989).
The EXIT parameter of RKEEP provides a code that indicates the success or type of failure of an iterative fit. Codes 0-7 are relevant to standard curves and general nonlinear models, and codes 0 and 8-13 are for generalized linear models:
0 Successful fitting
1 Limit on number of cycles has been reached without convergence
2 Parameter out of bounds
3 Likelihood appears constant
4 Failure to progress towards solution
5 Some standard errors are not available because the information matrix is nearly singular
6 Calculated likelihood may be incorrect because of missing fitted values
7 Curve is close to a limiting form
8 Data incompatible with model
9 Predicted mean or linear predictor out of range
10 Invalid calculation for calculated link or distribution
11 All units have been excluded from the analysis
12 Iterative process has diverged
13 Failure due to lack of space or data access
14 Function returned a missing value
With a generalized linear model, unless you set option IGNOREFAILURE=yes, the EXIT code is the only information that you can save if the fit has been unsuccessful. Alternatively, with a nonlinear model or when IGNOREFAILURE=yes, RKEEP will save any information that may be available. (You may thus, for example, be able to discover more about the cause of the failure.)
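One way the exit code might be used is sketched below for a generalized linear model. The identifiers Counts, Dose, Code and Esti are hypothetical, and the Poisson model with a log link is chosen only as an illustration.

```genstat
" Sketch only: Counts and Dose are hypothetical variates assumed to exist. "
MODEL [DISTRIBUTION=poisson; LINK=logarithm] Counts
FIT [PRINT=*] Dose
" The exit code can be saved whether or not the fit succeeded. "
RKEEP EXIT=Code
IF Code .NE. 0
  PRINT [IPRINT=*] 'Fit was not successful; exit code:',Code
  " Ask RKEEP for whatever other information may be available. "
  RKEEP [IGNOREFAILURE=yes] ESTIMATES=Esti
ENDIF
```

Testing the code before saving anything else avoids relying on results from a fit that did not complete.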
The derivatives of the fitted values with respect to each parameter in a standard curve or general nonlinear model can be stored in variates using the GRADIENTS parameter. You can use these quantities to assess the relative influence of each observation on a parameter; you can also construct a measure of leverage by summing the gradients for all the parameters.
The GRID parameter can be used to store a grid of values of the deviance (or any general function) following FITNONLINEAR.
The DESIGNMATRIX parameter allows you to save the matrix X. The columns correspond to the parameters of the model, ordered as for the ESTIMATES parameter. For simple linear regression with a constant this has only two columns, the first containing ones and the second containing the values of the explanatory variate. Columns corresponding to aliased parameters are omitted, but you can use the corresponding option of TERMS to construct the full design matrix.
The PEARSONCHI parameter provides the Pearson chi-square statistic for dispersion, which is the same as the residual sum of squares for the Normal distribution, but is different to the deviance for other distributions.
The STERMS and SCOMPONENTS parameters are relevant to generalized additive models. The STERMS parameter can be used to store a pointer to those variates whose effects in the model are smoothed. The SCOMPONENTS parameter stores a pointer to variates, one for each smoothed variate in the same order as in STERMS, containing the fitted nonlinear component of each smoothed variate; this does not include the linear component or the constant term.
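A sketch of saving the smoothed terms from a generalized additive model is given below. The identifiers Y, X, Smoothed and Nonlin are hypothetical, Y and X are assumed to be variates that already hold data, and the choice of 4 degrees of freedom for the smoothing spline is arbitrary.

```genstat
" Sketch only: Y and X are hypothetical variates assumed to exist. "
MODEL Y
" Fit a smoothing spline in X with 4 degrees of freedom. "
FIT [PRINT=*] SSPLINE(X; 4)
" Smoothed points to the smoothed variates; Nonlin holds one "
" variate of nonlinear components for each of them.         "
RKEEP STERMS=Smoothed; SCOMPONENTS=Nonlin
PRINT Nonlin[]
```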
The NOBSERVATIONS parameter allows you to save the number of units used in the analysis, omitting units with missing values or excluded by restrictions. This will be the same as the total number of degrees of freedom plus one, except in a regression with no constant term and no explanatory factors when it will equal the total number of degrees of freedom.
The SUMMARY parameter can be used to save the summary analysis-of-variance (or deviance) table for each response variate. The summary table is saved as a pointer with a variate or text for each of its columns (source, d.f. etc.). Similarly, the ACCUMULATED parameter can save the accumulated analysis-of-variance (or deviance) tables.
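The saved tables could be inspected as in the sketch below. It assumes that a model has already been built up in several steps (for example with FIT followed by ADD), so that an accumulated table is available; SumTab and AccTab are hypothetical identifiers.

```genstat
" Sketch only: assumes a model has been fitted in several steps. "
RKEEP SUMMARY=SumTab; ACCUMULATED=AccTab
" Each pointer holds one text or variate per column of its table. "
PRINT SumTab[]
PRINT AccTab[]
```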
The STATISTICS parameter saves all the statistics that could be displayed for each response variate by the 'summary' setting of the PRINT option of the fitting directives FIT, ADD etc. Alternatively, the STATISTICS option can be used to save the statistics for the first response variate specified by the MODEL statement.
The DISPERSION option allows you to define the value to be used for the dispersion parameter when calculating the standard errors. The DMETHOD option indicates how this should be calculated if DISPERSION is not set. By default the deviance is used but you can set DMETHOD=Pearson to request the Pearson chi-square statistic to be used instead.
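The effect of the DISPERSION and DMETHOD options on the saved standard errors can be compared with a sketch like this one. It assumes a generalized linear model has already been fitted; SeFixed and SePearson are hypothetical identifiers, and fixing the dispersion at 1 is only an illustrative choice.

```genstat
" Sketch only: assumes a generalized linear model has been fitted. "
" Standard errors with the dispersion fixed at 1. "
RKEEP [DISPERSION=1] SE=SeFixed
" Standard errors with the dispersion estimated from the "
" Pearson chi-square statistic rather than the deviance. "
RKEEP [DMETHOD=Pearson] SE=SePearson
PRINT SeFixed,SePearson
```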
Options OMODEL and PMODEL allow you to save pointers containing information about the current model. The labels of the pointers can be specified in either lower or upper case, or any mixture. OMODEL can be set to a pointer to store information about each of the options set in the previous MODEL statement. For example, the statement
RKEEP [OMODEL=Om]
will allow you to refer to the current variate of weights (if one was set in the WEIGHTS option of MODEL) as Om['weights']. Whether or not a variate was set, the statement
MODEL [WEIGHTS=Om['weights']] Newobs
will allow a new analysis with the same weighting as the old.
The pointer Om has 16 values, with suffixes corresponding to the options of MODEL in the defined order. Similarly, the statement
RKEEP [PMODEL=Pm]
will set up a pointer storing the (eight) current parameter settings of the previous MODEL statement. However, if there was more than one response variate, the first value of the pointer will be the identifier of the first response variate only: the others are not stored. Similarly, only the fitted-values and residuals variates for the first response will be pointed at. For example, the identifier Pm[1] or Pm['y'] can be used to refer to the current response variate after the RKEEP statement above.
The MAXIMALMODEL option saves the maximal model (as defined by TERMS). The FITMODEL option saves the model that has currently been fitted, including any contrast functions (i.e. POL, REG, COMPARISON, SSPLINE or LOESS). The FITCONSTANT option saves a scalar containing the value one if the constant is included in the fitted model, or zero otherwise. The FITTYPE option saves a scalar to indicate the type of model that has been fitted: 1 for an ordinary regression or generalized linear model (FIT), 2 for a generalized nonlinear model (FIT with the CALCULATION option set), 3 for a standard curve (FITCURVE) and 4 for a nonlinear model (FITNONLINEAR).
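These options can be set without any parameters, as in the following sketch. It assumes a model has already been fitted; Fitted, HasConst and Type are hypothetical identifiers.

```genstat
" Sketch only: assumes a model has already been fitted. "
RKEEP [FITMODEL=Fitted; FITCONSTANT=HasConst; FITTYPE=Type]
" HasConst is 1 if the constant was fitted and 0 otherwise; "
" Type takes a value from 1 to 4 as listed above.           "
PRINT HasConst,Type
```

A procedure that must behave differently for linear and nonlinear fits could branch on the saved value of Type.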
Options: EXPAND, DISPERSION, RMETHOD, DMETHOD, PROBABILITY, OMODEL, PMODEL, STATISTICS, CIMETHOD, IGNOREFAILURE, MAXIMALMODEL, FITMODEL, FITCONSTANT, FITTYPE, SAVE.
Parameters: Y, RESIDUALS, FITTEDVALUES, LEVERAGES, ESTIMATES, SE, INVERSE, VCOVARIANCE, DEVIANCE, DF, TERMS, ITERATIVEWEIGHTS, LINEARPREDICTOR, YADJUSTED, EXIT, GRADIENTS, GRID, DESIGNMATRIX, PEARSONCHISQUARE, STERMS, SCOMPONENTS, NOBSERVATIONS, SEFITTEDVALUES, SELINEARPREDICTOR, INFLATION, UPPER, LOWER, MEANDEVIANCE, TDEVIANCE, TDF, TMEANDEVIANCE, SUMMARY, ACCUMULATED, STATISTICS.
Reference
McCullagh, P. & Nelder, J.A. (1989). Generalized Linear Models (second edition). Chapman and Hall, London.
See also
Directives: FIT, FITCURVE, FITNONLINEAR, RKESTIMATES.
Procedure: RKLOESSGROUPS.
Commands for: Regression analysis.
Example
" Example FIT-3: Comparing linear regressions between groups

  Experiments on cauliflowers in 1957 and 1958 provided data on the
  mean number of florets in the plant and the temperature during the
  growing season (expressed as accumulated temperature above 0 deg C)."

" The counts and temperatures are in a file called 'FIT-3.DAT'"
FILEREAD [NAME='%gendir%/examples/FIT-3.DAT'] MnCount,AccTemp

" The first 7 values are from 1957 and the rest from 1958;
  set up a factor to distinguish the two years."
FACTOR [LEVELS=!(1957,1958); VALUES=7(1957,1958)] Year

" Fit a linear regression model of the mean count of florets on
  accumulated temperature - first ignoring the division into two years."
MODEL MnCount
TERMS AccTemp*Year
FIT AccTemp

" Fit parallel regressions for the two years."
ADD Year

" Fit separate regressions for the two years."
ADD AccTemp.Year

" Display the accumulated summary: an analysis of parallelism."
RDISPLAY [PRINT=accumulated]

" Show the parallel models."
DROP [PRINT=*] AccTemp.Year
RGRAPH [GRAPHICS=high]

" Extract the parameter estimates and s.e.s and display the common
  slope and its s.e."
RKEEP ESTIMATES=Esti; SE=Se
CALC Slope,SlopeSE = (Esti,Se)$[2]
PRINT Slope,SlopeSE