Specifies a maximal model, containing all terms to be used in subsequent linear, generalized linear, generalized additive and nonlinear models.
Options
PRINT = string tokens | What to print (correlations, wmeans, SSPM, monitoring); default * |
---|---|
FACTORIAL = scalar | Limit for expansion of model terms; default 3 |
FULL = string token | Whether to assign all possible parameters to factors and interactions (yes, no); default no |
SSPM = SSPM | Gives sums of squares and products on which to base calculations; default * |
TOLERANCE = scalar | Criterion for testing for linear dependence; default is 10⁷ε, where ε is the smallest real value such that 1+ε is greater than 1 on the computer |
DESIGNMATRIX = matrix | Saves the design matrix for the maximal model |
MVINCLUDE = string token | Whether to include units with missing values in the explanatory factors and variates (explanatory); default * i.e. omit these |
RIDGE = scalar or variate | Supplies values to add to the diagonal of the sums-of-squares-and-products matrix, to enable ridge methods to be used; default 0 |
CLDESIGNMATRIX = text | Saves the column labels of the design matrix for the maximal model i.e. the names of the parameters estimated in the maximal model |
CLSSP = text | Saves the labels of the sum-of-squares-and-products matrix |
Parameter
formula | List of explanatory variates and factors, or model formula |
---|---|
Description
You can use the TERMS directive before starting to explore different subsets of explanatory variables, to allow Genstat to define a common set of units for the regression and to carry out some initial calculations. TERMS thus initializes Genstat ready for an exploration using the directives ADD, DROP, SWITCH, TRY or STEP. It overrules any model that has already been fitted with FIT, FITCURVE or FITNONLINEAR and resets the current model to be the null model containing only the constant term.
TERMS need not be specified before exploring a linear, generalized linear or generalized additive model, that is, one that is fitted initially using FIT with its CALCULATION option unset. However, it is essential before exploring a nonlinear model, that is, one fitted initially by FIT with CALCULATION set, or by FITCURVE or FITNONLINEAR. Furthermore, if some of the explanatory variables to be used in a linear, generalized linear or generalized additive model contain missing values or have restrictions, the use of TERMS ensures that the sequence of models is fitted using a common set of units. Otherwise, if a variate or factor that is introduced into the model has a missing value where previous explanatory variates or factors did not, or is restricted whereas previous ones were not, the set of units has to be changed. The previous model is automatically refitted with the new set of units before the new model is fitted, but the accumulated summary will then show only these two fits.
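As an illustration of this workflow, the following minimal sketch reuses the structures from the Example on this page (Water, Temp, Product, Opdays and Employ). The particular sequence of changes is an assumption chosen only to show the pattern; because TERMS is given first, each later statement is fitted to the same set of units.

```
" Sketch only: the order of the ADD, DROP and STEP statements
  is arbitrary and is not a recommended modelling strategy "
MODEL Water
TERMS Temp,Product,Opdays,Employ
FIT   Product
ADD   Temp,Opdays
DROP  Opdays
STEP  Temp,Opdays,Employ
```

Every term named in the later statements appears in the TERMS formula, so none of them forces the set of units to change.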
The formula specified by the parameter of TERMS should contain all the explanatory variables and model terms that you may wish to use in the subsets. The model containing all the terms specified in the formula, excluding the response variates, is called the maximal model.
The calculations are weighted if you have specified weights in the MODEL statement, and they are made within groups if you have specified a grouping factor. All units of the variates are used unless there are restrictions or missing values. Genstat will look for restrictions on response variates, explanatory variates, the weight variate, the offset variate and the grouping factor (but these must not be restricted in different ways). A missing value in any of these structures except a response variate will also exclude the corresponding unit.
The PRINT option controls printed output, with settings:
SSPM | sums of squares and products between the variates in the model (including the response variates and dummy variates set up to represent any factors and their interactions), the means of the variates and the degrees of freedom; |
---|---|
correlations | the matrix of correlations between variables; |
wmeans | group means for a within-group regression; |
monitoring | monitoring information from the fit of the null model. |
The FACTORIAL option controls the inclusion of interaction terms in the model. All terms involving more than the specified number of factors and variates are omitted. By default FACTORIAL is set to three. The FULL option can be set to yes to ensure that Genstat allocates a parameter to every level of each factor in a linear, generalized linear or generalized additive model; otherwise it will exclude the reference level of the factor (and its estimates will represent the differences between the estimated parameters for the other levels and the estimate for the reference level).
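The effect of FACTORIAL can be sketched as follows. The identifiers Yield, Variety, Block and Nitrogen are hypothetical (a response variate, two factors and an explanatory variate) and do not come from this page; only the FACTORIAL option is shown, with FULL left at its default.

```
" Hypothetical structures: Yield (response), Variety and Block
  (factors), Nitrogen (variate) "
MODEL Yield
" With FACTORIAL=2 the three-way term Variety.Block.Nitrogen is
  omitted from the maximal model; main effects and two-way
  interactions are kept "
TERMS [FACTORIAL=2] Variety*Block*Nitrogen
FIT   Variety + Block + Nitrogen
ADD   Variety.Nitrogen
```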
The SSPM option lets you use values that you have already calculated for an SSPM structure. This is feasible only with ordinary linear regression but it can be useful when you are analysing very large sets of data: you can accumulate an SSPM sequentially with the FSSPM directive to avoid storing all the data at one time. Later regression calculations will be based on the supplied values of the SSPM, though no fitted values, residuals or leverages will be available. However, the values of a supplied SSPM are accepted without checking by the TERMS directive: Genstat simply assumes you are giving it something sensible.
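A minimal sketch of supplying a pre-formed SSPM is given below, reusing the variates from the Example on this page. It assumes that the SSPM structure is declared with the TERMS option of the SSPM directive and that its terms include the response variate; the identifier Ssp is hypothetical, and the SSPM is formed here in a single step rather than accumulated sequentially.

```
" Ssp is a hypothetical identifier; the declaration is assumed
  to list the response as well as the explanatory variates "
SSPM  [TERMS=Water,Temp,Product,Opdays,Employ] Ssp
FSSPM Ssp
MODEL Water
TERMS [SSPM=Ssp] Temp,Product,Opdays,Employ
FIT   Product
```

Because the fit is based only on the supplied sums of squares and products, fitted values, residuals and leverages are not available afterwards.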
The TOLERANCE option controls the detection of aliasing in subsequent model fitting. By default, a parameter in a linear or generalized linear model will be deemed to be aliased if the ratio between the original diagonal value of the SSPM corresponding to this parameter and the current diagonal value of the partially inverted SSPM is less than 10⁷ε. The quantity ε depends on the computer and is defined to be the smallest number such that the computer recognizes 1.0 + ε as greater than 1.0 in double precision. Any positive value can be supplied by the TOLERANCE option to replace this default criterion in subsequent linear regression and generalized linear regression.
The DESIGNMATRIX option can be set to a matrix to save the design matrix corresponding to the maximal model. The CLDESIGNMATRIX option can save the column labels of the design matrix without saving the design matrix itself. (These are the names of the parameters estimated in the maximal model.)
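For instance, the design matrix and its column labels might be saved as in the sketch below, which reuses the variates from the Example on this page. The identifiers Xmat and Xlabels are hypothetical, and it is assumed that they do not need to be declared beforehand.

```
" Xmat and Xlabels are hypothetical identifiers "
MODEL Water
TERMS [DESIGNMATRIX=Xmat; CLDESIGNMATRIX=Xlabels] \
      Temp,Product,Opdays,Employ
PRINT Xlabels
```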
The MVINCLUDE option allows units with missing values in factors or variates in the model to be included (by default these are excluded). Where this occurs, the factor or variate is taken to make no contribution to the fitted value for the unit concerned. This option should be set only under very special circumstances; for example, it is required internally by some of the procedures that fit hierarchical generalized linear models (see HGANALYSE). It should not be used during ordinary analyses.
The RIDGE option enables ridge methods to be implemented. It can be set to a scalar, to define a constant to add to all the diagonal elements of the sums-of-squares-and-products matrix that correspond to the parameters in the model. Alternatively you can set RIDGE to a variate, to add a different value to each diagonal element. You may then want to use the CLSSP option to save the row labels of the sum-of-squares-and-products matrix, so that you can see which rows correspond to model parameters, and which ones correspond to the y-variates. By default nothing is added (i.e. RIDGE = 0).
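A scalar ridge constant could be supplied as in the following sketch, again using the variates from the Example on this page. The value 0.1 is an arbitrary illustration, not a recommended setting, and Ssplabels is a hypothetical identifier.

```
" 0.1 is an arbitrary ridge constant; Ssplabels is hypothetical "
MODEL Water
TERMS [RIDGE=0.1; CLSSP=Ssplabels] Temp,Product,Opdays,Employ
PRINT Ssplabels
FIT   Temp,Product,Opdays,Employ
```

Printing the saved labels first is a convenient way to check the row order before replacing the scalar by a variate of per-parameter values.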
Options: PRINT, FACTORIAL, FULL, SSPM, TOLERANCE, DESIGNMATRIX, MVINCLUDE, RIDGE, CLDESIGNMATRIX, CLSSP.
Parameter: unnamed.
Action with RESTRICT
You can restrict the units that Genstat will use for the regression by putting a restriction on any of the vectors involved in the MODEL statement (response variates, weight variate, offset variate, grouping factor or variate of binomial totals), or on any explanatory variate or factor in the TERMS statement. However, you are not allowed to have different restrictions on the different vectors. You should not alter the restriction applied to the vectors between the TERMS statement and subsequent fitting statements.
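As a sketch, a restriction placed on the response variate alone before MODEL and TERMS limits the units used in all subsequent fits. The variates are those from the Example on this page, and the condition Temp > 50 is an arbitrary illustration.

```
" Arbitrary illustrative condition: use only months with an
  average temperature above 50 degrees F "
RESTRICT Water; CONDITION=Temp > 50
MODEL Water
TERMS Temp,Product,Opdays,Employ
FIT   Product
ADD   Temp
" Remove the restriction only after the exploration is complete "
RESTRICT Water
```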
See also
Directives: FIT, FITCURVE, FITNONLINEAR, MODEL.
Commands for: Regression analysis.
Example
" Example FIT-2: Multiple linear regression

  Relate the monthly water usage (thousand gallons) of a production plant
  to four variables:
    1. Average monthly temperature (degrees F)
    2. Amount of production (billion pounds)
    3. Number of plant operating days in the month
    4. Number of people employed
  (Data from Draper and Smith, Regression Analysis (1981) p353.)"

" The data from 17 months are in a file called 'FIT-2.DAT' and
  names for the data columns are on the first line"
FILEREAD [NAME='%gendir%/examples/FIT-2.DAT'; IMETHOD=read] FGROUP=no

" Specify that the amount of water used is to be the response variable,
  and print the correlation matrix of all the variables. The TERMS
  directive also allows use of the directives ADD, DROP and so on,
  to compare alternative sets of explanatory variables."
MODEL Water
TERMS [PRINT=correlations] Temp,Product,Opdays,Employ

" Fit a linear regression of water usage on amount of production, since
  this variable is most highly correlated with water usage (0.631)."
FIT Product

" Water use increases by 80 gallons (s.e. 25) for each extra billion
  pounds of production - ignoring the effect of other variables."

" Regress water usage on all the explanatory variables, to take account
  of the smaller effects."
ADD Temp,Opdays,Employ

" All the variables have a significant effect on water usage (all the
  t statistics are large). The effect of increasing production by a
  billion pounds while keeping the other variables constant is to
  increase water usage by 212 gallons (s.e. 46)."

" The first month is particularly influential. Display all the fitted
  values, residuals and leverages (influence). "
RDISPLAY [PRINT=fitted]

" Display the relationship between water usage and production,
  adjusting for the other effects"
RGRAPH [GRAPHICS=high] Product

" Plot the residuals against the fitted values to see if there is any
  indication of non-constant variance"
RCHECK [GRAPHICS=high] residual; fitted