RSCREEN procedure

Performs screening tests for generalized or multivariate linear models (H. van der Voet).

Options

PRINT = string tokens Printed output required (model, pool, starscheme, tests, pvalues); default mode, pool, star
CONSTANT = string token How to treat the constant (estimate, omit); default esti
FACTORIAL = scalar Limit for expansion of model terms; default 3
NOMESSAGE = string tokens Which warning messages to suppress when fitting the complete model (aliasing, marginality): warning messages are always suppressed when fitting models for individual tests; default *
EXCLUDEHIGHER = string token Whether to exclude higher-order interactions in the conditional regression model for each tested term (yes, no); default no
FORCED = formula Terms always included in the model (no tests on these terms); default *
TESTED = text To save the names of individual terms which are tested
NELEMENTS = variate To save the number of identifiers composing each individual term
MARGINAL = pointer To save results from marginal tests for each tested term in a pointer containing the test statistic, corresponding degrees of freedom and the calculated probability
CONDITIONAL = pointer To save results from conditional tests for each tested term in a pointer containing the test statistic, corresponding degrees of freedom and the calculated probability
MVINCLUDE = string token Whether to include units with missing values in non-relevant explanatory variates or factors when calculating conditional and marginal tests (yes, no); default no

Parameter

FREE = formula List of explanatory variates and factors, or model formula; each term from the expanded FREE formula is tested in a marginal and in a conditional test, unless the term is also part of the FORCED formula

Description

RSCREEN calculates marginal and conditional tests for all terms in a (multivariate) linear or generalized linear model. For multivariate linear regression models these tests are based on Wilks’ Lambda. RSCREEN also performs pooled testing of all main effects, of all 2-factor interactions, etc.

A call to RSCREEN must be preceded by a MODEL statement which defines the response variate(s) and, if required, a vector of weights, an offset and other aspects of a generalized linear model. More than one response variable is allowed for ordinary linear models, in which case multivariate linear regression models are fitted and tests are based on Rao’s F approximation of Wilks’ Lambda. If there is one response variable, tests are based on (scaled) deviances or deviance ratios, according to the setting of the DISPERSION option in the MODEL directive. Deviance ratios are always based on the mean deviance of the full model.
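As a minimal sketch of this ordering, assume hypothetical identifiers count (a response variate of counts) and A, B and C (factors) that have already been declared and given values. The dispersion is fixed at 1 here, so the tests would be based on scaled deviances; setting DISPERSION=* instead would request deviance ratios.

```genstat
" Hypothetical identifiers: count (response variate); A, B, C (factors). "
MODEL   [DISTRIBUTION=poisson; DISPERSION=1] count
RSCREEN [PRINT=model,pool,tests,pvalues] A * B * C
```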

The FREE parameter specifies the model terms to be tested. The FACTORIAL option sets the limit for expanding the FREE model formula, i.e. the maximum number of identifiers in any generated term; the default is 3. Two tests are performed for each term in the expanded model formula:

1.   a marginal test: the term is added to the simplest possible model. For example, the main effect of A is added to the null model and the interaction term A.B is added to a model containing only main effects A and B.

2.   a conditional test: the term is added to the most complex possible model that contains no terms involving the tested term. For example, the interaction A.B is added to the model containing all terms except those that involve A.B, such as the interaction A.B.C. Note that higher-order terms that do not involve A.B, e.g. the interaction C.D.E, are included in the model when testing A.B. The inclusion of any higher-order term can be prevented by setting option EXCLUDEHIGHER=yes.
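The two tests can be illustrated by fitting the corresponding pairs of models by hand. The sketch below uses the identifiers from the Example section and the standard FIT and ADD directives. It assumes no missing values, the default EXCLUDEHIGHER=no and FACTORIAL=3. It is only an illustration of which models are compared, not a description of how RSCREEN is coded internally.

```genstat
MODEL [DISTRIBUTION=poisson] response

" Marginal test of softness.preference: add it to its own main effects only. "
FIT   [PRINT=*] softness + preference
ADD   [PRINT=accumulated] softness.preference

" Conditional test: add it to every other term not involving softness.preference. "
FIT   [PRINT=*] softness + preference + prevuserM + temperature \
      + softness.prevuserM + softness.temperature \
      + preference.prevuserM + preference.temperature \
      + prevuserM.temperature \
      + softness.prevuserM.temperature + preference.prevuserM.temperature
ADD   [PRINT=accumulated] softness.preference
```

For a model with fixed dispersion, such as this Poisson model, the change in deviance reported for the added term should correspond to the chi-square statistic of the test. With EXCLUDEHIGHER=yes the two three-factor terms would be dropped from the second FIT statement.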

It is sometimes desirable to include specific terms in every model. Such terms may be specified by means of the FORCED option. The FORCED model formula is fitted first and no test results are given for the FORCED terms. The CONSTANT option controls whether the constant parameter is included in the model.
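For instance, a blocking factor might be kept in every model while the treatment terms are screened. This is a minimal sketch in which y, block, A, B and x are hypothetical identifiers (a response variate, three factors and an explanatory variate).

```genstat
" Hypothetical identifiers: y (response); block, A, B (factors); x (variate). "
MODEL   y
" block is fitted first in every model and is not itself tested. "
RSCREEN [FORCED=block; CONSTANT=estimate] A * B + x
```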

By default any units with missing values in any of the explanatory variates or factors are excluded from all of the tests. However, if you have many missing values that are spread unevenly over the explanatory variables, there may be few units with non-missing values for every variable. If you have only a single y-variate, you may then want to set option MVINCLUDE=yes. RSCREEN will then use all the available units when constructing each marginal or conditional test, ignoring missing values in any explanatory variable that is not involved in the test. This provides more information for each test, but the tables of tests should be interpreted with care, as different tests may be based on different sets of units.

The PRINT option controls output. The model setting gives a description of the model. The pool setting prints an accumulated analysis of variance or deviance in which terms with the same number of identifiers, e.g. main effects or two-factor interactions, are pooled. PRINT=tests prints both marginal and conditional test statistics, while setting pvalues prints (approximate) P-values from chi-square or F-tests. Finally, PRINT=starscheme prints significance of P-values by a conventional star notation. The default setting of PRINT is model, pool, starscheme.

Output can be saved by means of options TESTED, NELEMENTS, MARGINAL and CONDITIONAL. TESTED saves the individual model terms in a text structure, while NELEMENTS saves the number of identifiers composing each individual term. MARGINAL and CONDITIONAL each save test results in a pointer which contains four variates. These variates hold the test statistic, the corresponding degrees of freedom for numerator and denominator, and the calculated (approximate) probability. For chi-square tests the degrees of freedom for the denominator are set to missing. For multivariate linear regression models, Rao’s F-statistic and the corresponding degrees of freedom are saved. Note that, when MVINCLUDE=no, units with one or more missing values in any term are excluded from the analysis. This implies that FIT used for a subset of terms may give different results from RSCREEN.
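A minimal sketch of saving and then printing the results, using the identifiers from the Example section. The names term, nident, marg and cond are hypothetical and chosen only for illustration.

```genstat
MODEL   [DISTRIBUTION=poisson] response
" Suppress printed output and save the results instead. "
RSCREEN [PRINT=*; TESTED=term; NELEMENTS=nident; MARGINAL=marg; \
        CONDITIONAL=cond] softness * preference * prevuserM * temperature
" marg[] and cond[] expand to all the variates held in each pointer. "
PRINT   term, nident, marg[], cond[]
```

Because this model leads to chi-square tests, the saved denominator degrees of freedom would be expected to contain missing values.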

All regression warnings are suppressed, except when fitting the full model. This is to prevent the printing of long lists of similar warnings like “Iterative weights have become 0, or have been held at a limit”.

If RSCREEN is used for log-linear models, with the option EXCLUDEHIGHER set to yes, the marginal and conditional tests are equal to the marginal and partial tests of Brown (1976), which are available e.g. in BMDP. RSCREEN can also be used to implement the model selection strategy used in GLIMPSE, as described in McCullagh & Nelder (1989), pages 91-93. However, RSCREEN does not use approximations for models that require an iterative fitting process.

Options: PRINT, CONSTANT, FACTORIAL, NOMESSAGE, EXCLUDEHIGHER, FORCED, TESTED, NELEMENTS, MARGINAL, CONDITIONAL, MVINCLUDE.
Parameter: FREE.

Method

Most of the implementation is straightforward. The null model for the marginal test for term t is constructed as #FORCED + ((#FREE - #FORCED) -* c[]) - #t, where c[] is the classifying set of factors and variates comprising #FREE - #FORCED excluding factors and variates in term t. The null model for the conditional test is #FORCED + #FREE -* #t.

When the DISPERSION option of the MODEL directive is set to *, terms are tested by means of F statistics, which are deviance ratios based on the mean deviance of the full model. For a fixed dispersion parameter chi-square statistics are used, i.e. deviance differences scaled by the dispersion parameter. Terms in multivariate linear models are tested by Rao’s F-approximation for Wilks’ Lambda (Rao 1973). These are always based on residual variation calculated for the full model.
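The two univariate statistics can be written out as follows. The notation is introduced here only for illustration and is not taken from the procedure: D_0 and D_1 are the deviances of the null model and of the null model plus the tested term, with residual degrees of freedom \nu_0 and \nu_1; D_F and \nu_F are the deviance and residual degrees of freedom of the full model; \phi is the fixed dispersion parameter.

```latex
F = \frac{(D_0 - D_1)/(\nu_0 - \nu_1)}{D_F/\nu_F},
\qquad
X^2 = \frac{D_0 - D_1}{\phi}
```

Under this reading, F would be referred to an F distribution on (\nu_0 - \nu_1, \nu_F) degrees of freedom and X^2 to a chi-square distribution on \nu_0 - \nu_1 degrees of freedom.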

Smoothing splines are not allowed in the FREE formula due to a limitation of the FCLASSIFICATION directive.

Action with RESTRICT

Any restriction applied to vectors used in the regression model applies also to the results from RSCREEN.

References

Brown, M.B. (1976). Screening effects in multidimensional contingency tables. Applied Statistics, 25, 37-46.

McCullagh, P. & Nelder, J.A. (1989). Generalized Linear Models (second edition). Chapman & Hall, London.

Rao, C.R. (1973). Linear Statistical Inference and its Applications. Wiley, New York.

See also

Procedures: ASCREEN, RSEARCH, RWALD, VSCREEN.
Commands for: Regression analysis.

Example

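CAPTION  'RSCREEN example',\
         !t('Detergent data from Goodman (1971, Technometrics 13, 33-61),',\
         'used by Brown (1976, Applied Statistics, 25, 37-46).');\
         STYLE=meta,plain
" Cell counts of the 3 x 2 x 2 x 2 contingency table. "
VARIATE  [VALUES=19, 57, 29, 63, 29, 49, 27, 53, 23, 47, 33, 66,\
         47, 55, 23, 50, 24, 37, 42, 68, 43, 52, 30, 42] response
" Classifying factors; GENERATE fills in their levels in standard order. "
FACTOR   [NVALUES=24; LABELS=!T(soft,medium,hard)] softness
FACTOR   [NVALUES=24; LABELS=!T(X,M)] preference
FACTOR   [NVALUES=24; LABELS=!T(yes,no)] prevuserM
FACTOR   [NVALUES=24; LABELS=!T(high,low)] temperature
GENERATE softness, preference, prevuserM, temperature
" Log-linear model: Poisson response with the default (logarithm) link. "
MODEL    [DISTRIBUTION=poisson] response
" Screen all terms up to three-factor interactions. With EXCLUDEHIGHER=yes "
" the tests equal the marginal and partial tests of Brown (1976). "
RSCREEN  [FACTORIAL=3; EXCLUDEHIGHER=yes]\
         softness * preference * prevuserM * temperature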
Updated on September 4, 2019
