Performs screening tests for generalized or multivariate linear models (H. van der Voet).

### Options

| Option | Description |
|---|---|
| `PRINT` = string tokens | Printed output required (`model`, `pool`, `starscheme`, `tests`, `pvalues`); default `mode`, `pool`, `star` |
| `CONSTANT` = string token | How to treat the constant (`estimate`, `omit`); default `esti` |
| `FACTORIAL` = scalar | Limit for expansion of model terms; default 3 |
| `NOMESSAGE` = string tokens | Which warning messages to suppress when fitting the complete model (`aliasing`, `marginality`): warning messages are always suppressed when fitting models for individual tests; default `*` |
| `EXCLUDEHIGHER` = string token | Whether to exclude higher-order interactions in the conditional regression model for each tested term (`yes`, `no`); default `no` |
| `FORCED` = formula | Terms always included in the model (no tests on these terms); default `*` |
| `TESTED` = text | To save the names of individual terms which are tested |
| `NELEMENTS` = variate | To save the number of identifiers composing each individual term |
| `MARGINAL` = pointer | To save results from marginal tests for each tested term in a pointer containing the test statistic, corresponding degrees of freedom and the calculated probability |
| `CONDITIONAL` = pointer | To save results from conditional tests for each tested term in a pointer containing the test statistic, corresponding degrees of freedom and the calculated probability |
| `MVINCLUDE` = string token | Whether to include units with missing values in non-relevant explanatory variates or factors when calculating conditional and marginal tests (`yes`, `no`); default `no` |

### Parameter

| Parameter | Description |
|---|---|
| `FREE` = formula | List of explanatory variates and factors, or model formula; each term from the expanded `FREE` formula is tested in a marginal and in a conditional test, unless the term is also part of the `FORCED` formula |

### Description

`RSCREEN` calculates marginal and conditional tests for all terms in a (multivariate) linear or generalized linear model. For multivariate linear regression models these tests are based on Wilks’ Lambda. `RSCREEN` also performs pooled testing of all main effects, of all 2-factor interactions, etc.

A call to `RSCREEN` must be preceded by a `MODEL` statement which defines the response variate(s) and, if required, a vector of weights, an offset and other aspects of a generalized linear model. More than one response variable is allowed for ordinary linear models, in which case multivariate linear regression models are fitted and tests are based on Rao’s F approximation of Wilks’ Lambda. If there is one response variable, tests are based on (scaled) deviances or deviance ratios, according to the setting of the `DISPERSION` option in the `MODEL` directive. Deviance ratios are always based on the mean deviance of the full model.

The `FREE` parameter specifies the model terms which have to be tested. The limit for expanding the `FREE` model formula can be set with the `FACTORIAL` option with default value 3. Two tests are performed for each term in the expanded model formula:

1. a marginal test: the term is added to the simplest possible model. For example, the main effect of `A` is added to the null model and the interaction term `A.B` is added to a model containing only main effects `A` and `B`.

2. a conditional test: the term is added to the most complex possible model containing no terms involving the term which is tested. For example, interaction `A.B` is added to the model with all terms except those involving `A.B`, such as the interaction `A.B.C`. Note that an interaction such as `C.D.E` will be included in the model when testing `A.B`. The inclusion of any higher-order term can be prevented by setting option `EXCLUDEHIGHER=yes`.
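The two tests can be reproduced by hand for a single term, which may help when checking what `RSCREEN` reports. The following is a minimal sketch, not the procedure's own code: it assumes a Poisson response `count` and factors `A`, `B` and `C` (all hypothetical names) that have already been declared, and a `FREE` formula of `A*B*C`. With a fixed dispersion, the change in deviance on adding `A.B` should correspond to the chi-square statistic of the respective test.

```
" Hypothetical data: count (response) and factors A, B, C already declared "
MODEL [DISTRIBUTION=poisson] count

" Marginal test of A.B: add it to the model with main effects A and B only "
FIT [PRINT=*] A + B
ADD [PRINT=accumulated] A.B

" Conditional test of A.B: add it to all terms of A*B*C not involving A.B "
FIT [PRINT=*] A + B + C + A.C + B.C
ADD [PRINT=accumulated] A.B
```

When the dispersion is estimated, the deviance ratios printed by `ADD` will generally differ from those of `RSCREEN`, because `RSCREEN` always divides by the mean deviance of the full model.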

It is sometimes desirable to include specific terms in every model. Such terms may be specified by means of the `FORCED` option. The `FORCED` model formula is fitted first and no test results are given for the `FORCED` terms. The `CONSTANT` option controls whether the constant parameter is included in the model.
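As an illustration of forcing a term into every model, the sketch below assumes a response variate `y`, a blocking factor `Block` and treatment factors `A` and `B`; these names are hypothetical and the structures are assumed to be declared already. `Block` is fitted first and is not tested, while `A`, `B` and `A.B` each receive a marginal and a conditional test.

```
" Hypothetical names: y (response), Block, A and B (factors) "
MODEL y
RSCREEN [PRINT=model,tests,pvalues; FORCED=Block; CONSTANT=estimate] A * B
```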

By default any units with missing values in any of the explanatory variates or factors will be excluded from all of the tests. However, if you have many missing values that are spread unevenly over the explanatory variables, there may be few units with non-missing values for every variable. If you have only a single y-variate, you may then want to set option `MVINCLUDE=yes`. `RSCREEN` will then use all the available units when constructing each marginal or conditional test, ignoring missing values in any explanatory variable that is not involved in the test. This provides more information for each test, but the tables of tests should be interpreted with care as different tests may be based on different sets of units.

The `PRINT` option controls output. The `model` setting gives a description of the model. The `pool` setting prints an accumulated analysis of variance or deviance in which terms with the same number of identifiers, e.g. main effects or two-factor interactions, are pooled. `PRINT=tests` prints both marginal and conditional test statistics, while setting `pvalues` prints (approximate) P-values from chi-square or F-tests. Finally, `PRINT=starscheme` prints significance of P-values by a conventional star notation. The default setting of `PRINT` is `model`, `pool`, `starscheme`.

Output can be saved by means of options `TESTED`, `NELEMENTS`, `MARGINAL` and `CONDITIONAL`. `TESTED` saves the individual model terms in a text structure, while `NELEMENTS` saves the number of identifiers composing each individual term. `MARGINAL` and `CONDITIONAL` save test results in a pointer which contains four variates. These variates save the test statistic, the corresponding degrees of freedom for numerator and denominator and the calculated (approximate) probability. For chi-square tests the degrees of freedom for the denominator are set to missing. For multivariate linear regression models, Rao’s F-statistic and the corresponding degrees of freedom are saved. Note that, when `MVINCLUDE=no`, units with one or more missing values in any term are excluded from the analysis. This implies that `FIT` used for a subset of terms may give different results than `RSCREEN`.
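A minimal sketch of saving and inspecting the results is given below. The identifiers `terms`, `nid`, `marg` and `cond`, as well as the factors `A`, `B` and `C`, are hypothetical, and a preceding `MODEL` statement is assumed. The ordering of the four variates within each pointer is assumed to follow the description above (statistic, numerator and denominator degrees of freedom, probability).

```
" Hypothetical identifiers; a MODEL statement is assumed to have been given "
RSCREEN [TESTED=terms; NELEMENTS=nid;\
         MARGINAL=marg; CONDITIONAL=cond] A * B * C

" One row per tested term; marg[] and cond[] expand to the four saved variates "
PRINT terms,nid,marg[]
PRINT terms,nid,cond[]
```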

All regression warnings are suppressed, except when fitting the full model. This is to prevent the printing of long lists of similar warnings like “Iterative weights have become 0, or have been held at a limit”.

If `RSCREEN` is used for log-linear models, with the option `EXCLUDEHIGHER` set to `yes`, the marginal and conditional tests are equal to the marginal and partial tests of Brown (1976), which are available e.g. in BMDP. `RSCREEN` can also be used to implement the model selection strategy used in GLIMPSE, as described in McCullagh & Nelder (1989), pages 91-93. However, `RSCREEN` does not use approximations for models that require an iterative fitting process.

Options: `PRINT`, `CONSTANT`, `FACTORIAL`, `NOMESSAGE`, `EXCLUDEHIGHER`, `FORCED`, `TESTED`, `NELEMENTS`, `MARGINAL`, `CONDITIONAL`, `MVINCLUDE`.

Parameter: `FREE`.

### Method

Most of the implementation is straightforward. The null model for the marginal test for term `t` is constructed as `#FORCED + ((#FREE - #FORCED) -* c[]) - #t`, where `c[]` is the classifying set of factors and variates comprising `#FREE - #FORCED` excluding factors and variates in term `t`. The null model for the conditional test is `#FORCED + #FREE -* #t`.

When the `DISPERSION` option of the `MODEL` directive is set to `*`, terms are tested by means of F statistics, which are deviance ratios based on the mean deviance of the full model. For a fixed dispersion parameter chi-square statistics are used, i.e. deviance differences scaled by the dispersion parameter. Terms in multivariate linear models are tested by Rao’s F-approximation for Wilks’ Lambda (Rao 1973). These are always based on residual variation calculated for the full model.
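In symbols, the two univariate statistics described above can be written as follows. The notation is introduced here for illustration only and is not part of the procedure: $D_0$ and $d_0$ denote the deviance and residual degrees of freedom of the null model for the test, $D_1$ and $d_1$ those of the null model plus the tested term, $D_F$ and $d_F$ those of the full model, and $\phi$ the fixed dispersion parameter.

$$
F = \frac{(D_0 - D_1)/(d_0 - d_1)}{D_F/d_F}
\qquad\text{(dispersion estimated, i.e. } \texttt{DISPERSION=*}\text{)}
$$

$$
X^2 = \frac{D_0 - D_1}{\phi}
\qquad\text{(dispersion fixed at } \phi\text{)}
$$

Under this reading, $F$ is referred to an F distribution on $d_0 - d_1$ and $d_F$ degrees of freedom, and $X^2$ to a chi-square distribution on $d_0 - d_1$ degrees of freedom.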

Smoothing splines are not allowed in the `FREE` formula due to a limitation of the `FCLASSIFICATION` directive.

### Action with `RESTRICT`

Any restriction applied to vectors used in the regression model applies also to the results from `RSCREEN`.

### References

Brown, M.B. (1976). Screening effects in multidimensional contingency tables. *Applied Statistics*, 25, 37-46.

McCullagh, P. & Nelder, J.A. (1989). *Generalized Linear Models (second edition)*. Chapman & Hall, London.

Rao, C.R. (1973). *Linear Statistical Inference and its Applications*. Wiley, New York.

### See also

Procedures: `ASCREEN`, `RSEARCH`, `RWALD`, `VSCREEN`.

Commands for: Regression analysis.

### Example

    CAPTION 'RSCREEN example',\
            !t('Detergent data from Goodman (1971, Technometrics 13, 33-61),',\
            'used by Brown (1976, Applied Statistics, 25, 37-46).');\
            STYLE=meta,plain
    VARIATE [VALUES=19, 57, 29, 63, 29, 49, 27, 53, 23, 47, 33, 66,\
            47, 55, 23, 50, 24, 37, 42, 68, 43, 52, 30, 42] response
    FACTOR [NVALUES=24; LABELS=!T(soft,medium,hard)] softness
    FACTOR [NVALUES=24; LABELS=!T(X,M)] preference
    FACTOR [NVALUES=24; LABELS=!T(yes,no)] prevuserM
    FACTOR [NVALUES=24; LABELS=!T(high,low)] temperature
    GENERATE softness, preference, prevuserM, temperature
    MODEL [DISTRIBUTION=poisson] response
    RSCREEN [FACTORIAL=3; EXCLUDEHIGHER=yes]\
            softness * preference * prevuserM * temperature