TERMS directive

Specifies a maximal model, containing all terms to be used in subsequent linear, generalized linear, generalized additive and nonlinear models.

Options

`PRINT` = string tokens	What to print (`correlations`, `wmeans`, `SSPM`, `monitoring`); default `*`
`FACTORIAL` = scalar	Limit for expansion of model terms; default 3
`FULL` = string token	Whether to assign all possible parameters to factors and interactions (`yes`, `no`); default `no`
`SSPM` = SSPM	Gives sums of squares and products on which to base calculations; default `*`
`TOLERANCE` = scalar	Criterion for testing for linear dependence; default is 10⁷ε, where ε is the smallest real value such that 1+ε is greater than 1 on the computer
`DESIGNMATRIX` = matrix	Saves the design matrix for the maximal model
`MVINCLUDE` = string token	Whether to include units with missing values in the explanatory factors and variates (`explanatory`); default `*` i.e. omit these
`RIDGE` = scalar or variate	Supplies values to add to the diagonal of the sums-of-squares-and-products matrix, to enable ridge methods to be used; default 0
`CLDESIGNMATRIX` = text	Saves the column labels of the design matrix for the maximal model i.e. the names of the parameters estimated in the maximal model
`CLSSP` = text	Saves the row labels of the sums-of-squares-and-products matrix

Parameter

formula	List of explanatory variates and factors, or model formula

Description

You can use the TERMS directive before starting to explore different subsets of explanatory variables, to allow Genstat to define a common set of units for the regression and to carry out some initial calculations. TERMS thus initializes Genstat ready for an exploration using the directives ADD, DROP, SWITCH, TRY or STEP. It overrules any model that has already been fitted with FIT, FITCURVE or FITNONLINEAR and resets the current model to be the null model containing only the constant term.
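
A typical exploration might be sketched as follows. This is a minimal illustration only: it assumes the variates `Water`, `Temp`, `Product`, `Opdays` and `Employ` from the Example section have already been read, and the particular order of the steps is arbitrary.

```genstat
" Declare the response, then the maximal model "
MODEL Water
TERMS Temp,Product,Opdays,Employ

" Start from a one-variable model and explore subsets of the maximal model "
FIT Product
TRY Temp
ADD Opdays,Employ
DROP Employ
SWITCH Temp,Employ
STEP Temp,Product,Opdays,Employ
```

Every model in this sequence is fitted to the common set of units defined by the TERMS statement.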

TERMS need not be specified before exploring a linear, generalized linear or generalized additive model, that is, one that is fitted initially using FIT with its CALCULATION option unset. However, it is essential before exploring a nonlinear model, that is, one fitted initially by FIT with CALCULATION set, or by FITCURVE or FITNONLINEAR. Furthermore, if some of the explanatory variables to be used in a linear, generalized linear or generalized additive model contain missing values or have restrictions, the use of TERMS ensures that the sequence of models is fitted using a common set of units. Otherwise, if a variate or factor that is introduced into the model has a missing value where previous explanatory variates or factors did not, or is restricted whereas previous ones were not, the set of units has to be changed. The previous model is then automatically refitted with the new set of units before the new model is fitted, but the accumulated summary will show only these two fits.

The formula specified by the parameter of TERMS should contain all the explanatory variables and model terms that you may wish to use in the subsets. The model containing all the terms specified in the formula, excluding the response variates, is called the maximal model.

The calculations are weighted if you have specified weights in the MODEL statement, and they are made within groups if you have specified a grouping factor. All units of the variates are used unless there are restrictions or missing values. Genstat will look for restrictions on response variates, explanatory variates, the weight variate, the offset variate and the grouping factor (but these must not be restricted in different ways). A missing value in any of these structures except a response variate will also exclude the corresponding unit.

The PRINT option controls printed output, with settings:

`SSPM`	sums of squares and products between the variates in the model (including the response variates and dummy variates set up to represent any factors and their interactions), the means of the variates and the degrees of freedom;
`correlations`	the matrix of correlations between variables;
`wmeans`	group means for a within-group regression;
`monitoring`	monitoring information from the fit of the null model.

The FACTORIAL option controls the inclusion of interaction terms in the model. All terms involving more than the specified number of factors and variates are omitted. By default FACTORIAL is set to three. The FULL option can be set to yes to ensure that Genstat allocates a parameter to every level of each factor in a linear, generalized linear or generalized additive model; otherwise it will exclude the reference level of the factor (and its estimates will represent the differences between the estimated parameters for the other levels and the estimate for the reference level).
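
The effect of FACTORIAL can be sketched with a three-factor formula. The response variate `Y` and the factors `A`, `B` and `C` are hypothetical names introduced only for this illustration.

```genstat
" Hypothetical response variate Y and factors A, B and C.
  A*B*C expands to A + B + C + A.B + A.C + B.C + A.B.C;
  with FACTORIAL=2 the three-factor term A.B.C is omitted "
MODEL Y
TERMS [FACTORIAL=2] A*B*C

" Alternatively, also request a parameter for every level of each factor "
TERMS [FACTORIAL=2; FULL=yes] A*B*C
```

Note that the second TERMS statement replaces the first; only one maximal model is in force at a time.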

The SSPM option lets you use values that you have already calculated for an SSPM structure. This is feasible only with ordinary linear regression but it can be useful when you are analysing very large sets of data: you can accumulate an SSPM sequentially with the FSSPM directive to avoid storing all the data at one time. Later regression calculations will be based on the supplied values of the SSPM, though no fitted values, residuals or leverages will be available. However, the values of a supplied SSPM are accepted without checking by the TERMS directive: Genstat simply assumes you are giving it something sensible.
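
As a hedged sketch of this route, the statements below form an SSPM in a single pass and then base the regression on it. The identifier `S` is an assumption made for the illustration, the variates are those of the Example section, and sequential accumulation (for data too large to hold at once) is not shown here.

```genstat
" Declare and form an SSPM that includes the response variate "
SSPM [TERMS=Water+Temp+Product+Opdays+Employ] S
FSSPM S

" Base the regression calculations on the supplied SSPM "
MODEL Water
TERMS [SSPM=S] Temp,Product,Opdays,Employ
FIT Product
```

Because the fit works from the SSPM alone, fitted values, residuals and leverages are not available afterwards.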

The TOLERANCE option controls the detection of aliasing in subsequent model fitting. By default, a parameter in a linear or generalized linear model will be deemed to be aliased if the ratio between the original diagonal value of the SSPM corresponding to this parameter and the current diagonal value of the partially inverted SSPM is less than 10⁷ε. The quantity ε depends on the computer and is defined to be the smallest number such that the computer recognizes 1.0 + ε as greater than 1.0 in double precision. Any positive value can be supplied by the TOLERANCE option to replace this default criterion in subsequent linear regression and generalized linear regression.
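
As a rough guide, assuming IEEE 754 double-precision arithmetic, ε is about 2.2 × 10⁻¹⁶, so the default criterion is about 2.2 × 10⁻⁹. A different criterion could be supplied as in the sketch below; the value 1.0E-6 is an arbitrary illustration rather than a recommendation, and the variates are those of the Example section.

```genstat
" Replace the default aliasing criterion with an illustrative value "
MODEL Water
TERMS [TOLERANCE=1.0E-6] Temp,Product,Opdays,Employ
FIT Temp,Product,Opdays,Employ
```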

The DESIGNMATRIX option can be set to a matrix to save the design matrix corresponding to the maximal model. The CLDESIGNMATRIX option can save the column labels of the design matrix without saving the design matrix itself. (These are the names of the parameters estimated in the maximal model.)
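
For instance, the design matrix and its column labels might be saved and inspected as follows. The identifiers `X` and `Xlabels` are assumptions chosen for this sketch, and the variates are those of the Example section.

```genstat
" Save the design matrix of the maximal model and its column labels "
MODEL Water
TERMS [DESIGNMATRIX=X; CLDESIGNMATRIX=Xlabels] Temp,Product,Opdays,Employ

" List the names of the parameters in the maximal model "
PRINT Xlabels
```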

The MVINCLUDE option allows units with missing values in factors or variates in the model to be included (by default these are excluded). Where this occurs, the factor or variate is taken to make no contribution to the fitted value for the unit concerned. This option should be set only under very special circumstances; for example, it is required internally by some of the procedures that fit hierarchical generalized linear models (see HGANALYSE). It should not be used during ordinary analyses.

The RIDGE option enables ridge methods to be implemented. It can be set to a scalar, to define a constant to add to all the diagonal elements of the sums-of-squares-and-products matrix that correspond to the parameters in the model. Alternatively, you can set RIDGE to a variate, to add a different value to each diagonal element. You may then want to use the CLSSP option to save the row labels of the sums-of-squares-and-products matrix, so that you can see which rows correspond to model parameters and which correspond to the y-variates. By default nothing is added (i.e. RIDGE = 0).
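
A minimal sketch of a scalar ridge setting is given below. The constant 0.1 is purely illustrative, the identifier `Slabels` is an assumption made for this example, and the variates are those of the Example section.

```genstat
" Add an illustrative constant to the diagonal elements for the
  model parameters, and save the row labels of the SSPM "
MODEL Water
TERMS [RIDGE=0.1; CLSSP=Slabels] Temp,Product,Opdays,Employ

" Inspect the labels to see which rows are parameters and which are y-variates "
PRINT Slabels
FIT Temp,Product,Opdays,Employ
```

Printing the labels first is a convenient check before constructing a variate of row-specific values for RIDGE.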

Options: PRINT, FACTORIAL, FULL, SSPM, TOLERANCE, DESIGNMATRIX, MVINCLUDE, RIDGE, CLDESIGNMATRIX, CLSSP.

Parameter: unnamed.

Action with `RESTRICT`

You can restrict the units that Genstat will use for the regression by putting a restriction on any of the vectors involved in the MODEL statement (response variates, weight variate, offset variate, grouping factor or variate of binomial totals), or on any explanatory variate or factor in the TERMS statement. However, you are not allowed to have different restrictions on the different vectors. You should not alter the restriction applied to the vectors between the TERMS statement and subsequent fitting statements.
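
The following sketch restricts the analysis through the response variate alone, which is sufficient because a restriction on any one of the vectors applies to the regression. The threshold of 60 degrees is an arbitrary value chosen for illustration, and the variates are those of the Example section.

```genstat
" Use only the months whose average temperature exceeds 60 degrees F "
RESTRICT Water; CONDITION=Temp.GT.60
MODEL Water
TERMS Temp,Product,Opdays,Employ
FIT Product
ADD Temp

" Remove the restriction only after the sequence of fits is complete "
RESTRICT Water
```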

Example

" Example FIT-2: Multiple linear regression

  Relate the monthly water usage (thousand gallons) of a production
  plant to four variables:
      1. Average monthly temperature (degrees F)
      2. Amount of production (billion pounds)
      3. Number of plant operating days in the month
      4. Number of people employed
  (Data from Draper and Smith, Applied Regression Analysis (1981) p353.)"

" The data from 17 months are in a file called 'FIT-2.DAT'
  and names for the data columns are on the first line"
FILEREAD [NAME='%gendir%/examples/FIT-2.DAT'; IMETHOD=read] FGROUP=no

" Specify that the amount of water used is to be the response
  variable, and print the correlation matrix of all the variables.
  The TERMS directive also allows use of the directives ADD, DROP
  and so on, to compare alternative sets of explanatory variables."
MODEL Water
TERMS [PRINT=correlations] Temp,Product,Opdays,Employ

" Fit a linear regression of water usage on amount of production, since
  this variable is most highly correlated with water usage (0.631)."
FIT Product

" Water use increases by 80 gallons (s.e. 25) for each extra billion
  pounds of production - ignoring the effect of other variables."

" Regress water usage on all the explanatory variables, to take account
  of the smaller effects."
ADD Temp,Opdays,Employ

" All the variables have a significant effect on water usage (all the
  t statistics are large).  The effect of increasing production by
  a billion pounds while keeping the other variables constant is to
  increase water usage by 212 gallons (s.e. 46)."

" The first month is particularly influential.
  Display all the fitted values, residuals and leverages (influence). "
RDISPLAY [PRINT=fitted]

" Display the relationship between water usage and production,
  adjusting for the other effects"
RGRAPH [GRAPHICS=high] Product

" Plot the residuals against the fitted values to see if there is any
  indication of non-constant variance"
RCHECK [GRAPHICS=high] residual; fitted

Updated on March 5, 2019
