VALLSUBSETS procedure

Fits all subsets of the fixed terms in a REML analysis (R.W. Payne).

Options

`PRINT` = string tokens	Controls printed output (`results`); default `resu`
`FORCED` = formula	Terms to include in every model
`FACTORIAL` = scalar	Limit for expansion of `FORCED` terms; default 3
`SELECTION` = string tokens	One or two criteria to be printed with the models (`r2`, `adjusted`, `cp`, `ep`, `aic`, `sic`, `bic`, `rss`, `rms`); default `aic`, `sic`
`NBESTMODELS` = scalar	Number of models to print; default `*`, i.e. all
`BESTMODEL` = pointer	Saves the best model according to the selected criteria
`RESULTS` = pointer	Pointer to save variates containing the criteria for the sets, and F and Wald statistics for the terms that they contain
`MARGINALTERMS` = string token	How to treat terms that are marginal to other terms (`forced`, `free`); default `forc`
`SAVE` = REML save structure	Specifies the analysis whose fixed terms are to be tested; by default this will be the most recent `REML`

No parameters

Description

VALLSUBSETS fits all subsets of the fixed terms in a REML analysis. It does this by a generalized regression analysis, with a weight matrix based on the variances estimated from the REML analysis (i.e. with the full fixed model). The subsets are thus assessed using identical estimates of the variance components, allowing statistics such as the Akaike information criterion to be used to assess which subset may be best.

By default, VALLSUBSETS uses the most recent REML analysis. However, you can take an earlier analysis, by using the SAVE option of VALLSUBSETS to specify its save structure (saved using the SAVE parameter of the earlier REML command).
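As a minimal sketch of this, assuming that the factors and variates of the rat data have already been read as in the Example section, the following saves an earlier analysis and refers back to it. The identifier RatSave and the second, reduced fixed model are hypothetical and chosen only for illustration.

```
"Earlier analysis: keep its save structure (RatSave is a hypothetical name)"
VCOMPONENTS [FIXED=Littersize+Dose*Sex] RANDOM=Dam/Pup
REML        Weight; SAVE=RatSave
"A later analysis, which now counts as the most recent REML"
VCOMPONENTS [FIXED=Littersize+Sex] RANDOM=Dam/Pup
REML        Weight
"Form the subsets from the fixed terms of the earlier analysis"
VALLSUBSETS [SAVE=RatSave]
```

Without the SAVE option, the final statement would instead use the fixed terms of the second analysis.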

The subsets are formed from all the fixed terms, but you can use the FORCED option to specify terms that should always be included. Terms that are marginal to another fixed term are usually also treated as forced. However, you can set option MARGINALTERMS to free to retain them in the “free” terms that are used to form the subsets. Note that VALLSUBSETS considers only models that obey the principle of marginality. This states that a model that includes an interaction term must also include all its marginal terms. For example, a model that includes the interaction A.B must also include the main effects A and B.
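A short sketch of how these two options might be combined, again assuming that the rat data from the Example section have been read. The choice of Littersize as the forced term is an assumption made for illustration.

```
"Littersize is kept in every model; Dose, Sex and Dose.Sex are free terms"
VCOMPONENTS [FIXED=Littersize+Dose*Sex] RANDOM=Dam/Pup
REML        Weight
VALLSUBSETS [FORCED=Littersize; MARGINALTERMS=free]
```

Because of marginality, the interaction Dose.Sex can appear only in subsets that also contain both Dose and Sex, so a model such as Littersize + Dose + Dose.Sex would not be considered.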

The SELECTION option selects one or two criteria to be printed with the sets, with the settings:
`r2`	percentage of the sum of squares accounted for (taking the total sum of squares as the residual from the forced model),
`adjusted`	percentage of the variance accounted for (compared to the residual mean square from the forced model),
`cp`	Mallows Cp,
`ep`	mean squared error of prediction,
`aic`	Akaike information criterion,
`sic` or `bic`	Schwarz (Bayesian) information criterion,
`rss`	residual sum of squares, and
`rms`	residual mean square.

For more details, see the RSEARCH procedure (which is used to do the analyses). VALLSUBSETS reports which subset is best, according to each of the selected criteria. The default selects the Akaike and Schwarz (Bayesian) information criteria.

In addition to the selected criteria, the output shows the number of degrees of freedom fitted in the subset, and probabilities assessing the effect of dropping each of its terms from the subset. The probabilities are obtained from F statistics if the denominator degrees of freedom are available from the original REML analysis. Otherwise they are based on Wald statistics. Terms that are marginal to another term in the subset cannot be dropped. This is indicated by printing marg instead of a probability. Also, terms that are aliased are indicated by printing aaa. By default, all the subsets are printed, but you can set the NBESTMODELS option to a scalar, n say, to print only the n best subsets according to the first criterion specified by the SELECTION option.
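For instance, a sketch that restricts the printed output to a few subsets, assuming the rat data from the Example section. The criteria and the value 3 are arbitrary choices for illustration.

```
VCOMPONENTS [FIXED=Littersize+Dose*Sex] RANDOM=Dam/Pup
REML        Weight
"Print Mallows Cp and AIC; list only the 3 best subsets, ranked by cp"
VALLSUBSETS [SELECTION=cp,aic; NBESTMODELS=3; MARGINALTERMS=free]
```

Here cp is listed first in SELECTION, so it is the criterion that determines which three subsets are printed.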

The results are printed by default. However, you can set option PRINT=* if you want only to save them, using the RESULTS option. This saves a pointer containing variates storing all the available criteria and the numbers of degrees of freedom, then the Wald statistics for the terms, followed by their probabilities, and then the F statistics and their probabilities.

You can also use the BESTMODEL option to save the best model according to each of the selected criteria. It saves them in a pointer containing either one or two model formulae (according to the number of selected criteria). The formulae are stored in the order in which the criteria were specified by the SELECTION option, and are labelled in the pointer by the names of the criteria.
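A sketch of saving the results without printing them, assuming the rat data from the Example section. The identifiers Res and Best are hypothetical names.

```
VCOMPONENTS [FIXED=Littersize+Dose*Sex] RANDOM=Dam/Pup
REML        Weight
"Suppress printing; save the criteria and statistics in Res, the best models in Best"
VALLSUBSETS [PRINT=*; SELECTION=aic,sic; RESULTS=Res; BESTMODEL=Best;\
            MARGINALTERMS=free]
```

With this SELECTION setting, the first formula in Best would be the best model according to aic and the second the best according to sic, following the order in which the criteria were specified.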

Options: PRINT, FORCED, FACTORIAL, SELECTION, NBESTMODELS, BESTMODEL, RESULTS, MARGINALTERMS, SAVE.
Parameters: none.

Method

VALLSUBSETS defines a weighted regression, with weight matrix given by the inverse of the unit-by-unit variance-covariance matrix (obtained using the UVCOVARIANCE option of VKEEP). It then calls the RSEARCH procedure to fit the subsets.
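As an illustration of what such a weighted regression estimates, the standard generalized least squares formula can be written as follows. The symbols are assumptions introduced here, not notation from the procedure: $y$ is the response, $X_S$ the design matrix for the fixed terms in subset $S$, and $\hat{V}$ the unit-by-unit variance-covariance matrix estimated from the REML analysis with the full fixed model.

```latex
\hat{\beta}_S = \left( X_S^{\top} \hat{V}^{-1} X_S \right)^{-1} X_S^{\top} \hat{V}^{-1} y
```

Since $\hat{V}$ is held fixed across subsets, only $X_S$ changes from one fit to the next, which is what makes the criteria comparable between subsets.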

Action with `RESTRICT`

Any restriction applied to vectors used in the REML analysis will apply also to the results from VALLSUBSETS.

Example

CAPTION 'VALLSUBSETS example','Guide Part 2, Example 5.3.6a'; STYLE=meta,plain
FACTOR  [NVALUES=322; LEVELS=27] Dam
&       [LEVELS=18] Pup
FACTOR  [LEVELS=2; LABELS=!T('M','F')] Sex
FACTOR  [LEVELS=3; LABELS=!T('C','Low','High')] Dose
VARIATE [NVALUES=322] Littersize,Weight
OPEN    '%GENDIR%/Examples/GuidePart2/Rats.dat'; CHANNEL=2
READ    [CHANNEL=2] Dose,Sex,Littersize,Dam,Pup,Weight; \
        FREPRESENTATION=2(labels),4(levels)
CLOSE   2
VCOMPONENTS [FIXED=Littersize+Dose*Sex] RANDOM=Dam/Pup
REML        Weight
VALLSUBSETS [MARGINALTERMS=free]

Updated on October 29, 2020
