Performs screening tests for generalized or multivariate linear models (H. van der Voet).
|Printed output required (
||How to treat the constant (
||Limit for expansion of model terms; default 3|
||Which warning messages to suppress when fitting the complete model (
||Whether to exclude higher-order interactions in the conditional regression model for each tested term (
||Terms always included in the model (no tests on these terms); default
||To save the names of individual terms which are tested|
||To save the number of identifiers composing each individual term|
||To save results from marginal tests for each tested term in a pointer containing the test statistic, corresponding degrees of freedom and the calculated probability|
||To save results from conditional tests for each tested term in a pointer containing the test statistic, corresponding degrees of freedom and the calculated probability|
||Whether to include units with missing values in non-relevant explanatory variates or factors when calculating conditional and marginal tests (
||List of explanatory variates and factors, or model formula; each term from the expanded
RSCREEN calculates marginal and conditional tests for all terms in a (multivariate) linear or generalized linear model. For multivariate linear regression models these tests are based on Wilks’ Lambda.
RSCREEN also performs pooled testing of all main effects, of all 2-factor interactions, etc.
A call to RSCREEN must be preceded by a MODEL statement which defines the response variate(s) and, if required, a vector of weights, an offset and other aspects of a generalized linear model. More than one response variable is allowed for ordinary linear models, in which case multivariate linear regression models are fitted and tests are based on Rao’s F approximation of Wilks’ Lambda. If there is one response variable, tests are based on (scaled) deviances or deviance ratios, according to the setting of the DISPERSION option in the MODEL directive. Deviance ratios are always based on the mean deviance of the full model.
The FREE parameter specifies the model terms to be tested. The limit for expanding the FREE model formula can be set with the FACTORIAL option (default 3). Two tests are performed for each term in the expanded model formula:
1. a marginal test: the term is added to the simplest possible model. For example, the main effect of A is added to the null model, and the interaction term A.B is added to a model containing only the main effects A and B.
2. a conditional test: the term is added to the most complex possible model containing no terms that involve the tested term. For example, the interaction A.B is added to the model with all terms except those involving A.B, such as the interaction A.B.C. Note that, for example, the interaction C.D.E will be included in the model when testing A.B. The inclusion of any higher-order term can be prevented by setting option EXCLUDEHIGHER=yes.
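As an illustration of the two tests, the comparisons made for the term A.B can be written out by hand with FIT and ADD. This is only a sketch: the factors A, B and C and the response variate y are hypothetical and assumed to be declared already, the FREE formula is assumed to be A*B*C, and a Poisson model is assumed so that the dispersion is fixed. RSCREEN's own output may be laid out differently.

```genstat
" Hypothetical response y and factors A, B, C (assumptions, not from the text) "
MODEL   [DISTRIBUTION=poisson] y

" Marginal test of A.B: add A.B to the model with main effects A and B only "
FIT     [PRINT=*] A + B
ADD     [PRINT=accumulated] A.B

" Conditional test of A.B: add A.B to all terms of A*B*C not involving A.B "
FIT     [PRINT=*] A + B + C + A.C + B.C
ADD     [PRINT=accumulated] A.B
```

With a fixed dispersion, the change in deviance reported by each ADD should correspond to the statistic for the matching test. With an estimated dispersion the hand-made deviance ratios would generally differ, because RSCREEN bases them on the mean deviance of the full model.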
It is sometimes desirable to include specific terms in every model. Such terms may be specified by means of the FORCED option. The FORCED model formula is fitted first and no test results are given for the FORCED terms. The CONSTANT option controls whether the constant parameter is included in the model.
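A minimal sketch of the FORCED option, reusing the variable names from the detergent example on this page. Forcing softness into every model is an arbitrary choice made for illustration only, not a recommendation for these data.

```genstat
" softness is fitted first in every model and is not itself tested "
MODEL   [DISTRIBUTION=poisson] response
RSCREEN [FACTORIAL=2; FORCED=softness]\
        preference * prevuserM * temperature
```

Here FACTORIAL=2 limits the tested terms to main effects and two-factor interactions of the three remaining factors.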
By default any units with missing values in any of the explanatory variates or factors will be excluded from all of the tests. However, if you have many missing values that are spread unevenly over the explanatory variables, there may be few units with non-missing values for every variable. If you have only a single y-variate, you may then want to set option MVINCLUDE=yes. RSCREEN will then use all the available units when constructing each marginal or conditional test, ignoring missing values in any explanatory variable that is not involved in the test. This provides more information for each test, but the tables of tests should be interpreted with care as different tests may be based on different sets of units.
The model setting of the PRINT option gives a description of the model. The pool setting prints an accumulated analysis of variance or deviance in which terms with the same number of identifiers, e.g. main effects or two-factor interactions, are pooled. PRINT=tests prints both marginal and conditional test statistics, while PRINT=pvalues prints (approximate) P-values from chi-square or F-tests. Finally, PRINT=starscheme prints significance of P-values by a conventional star notation. The default setting of
Output can be saved by means of options TESTED, NELEMENTS, MARGINAL and CONDITIONAL. TESTED saves the individual model terms in a text structure, while NELEMENTS saves the number of identifiers composing each individual term. MARGINAL and CONDITIONAL save test results in a pointer which contains four variates. These variates save the test statistic, the corresponding degrees of freedom for numerator and denominator, and the calculated (approximate) probability. For chi-square tests the degrees of freedom for the denominator are set to missing. For multivariate linear regression models, Rao’s F-statistic and the corresponding degrees of freedom are saved. Note that, when MVINCLUDE=no, units with one or more missing values in any term are excluded from the analysis. This implies that FIT used for a subset of terms may give different results than RSCREEN.
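The saving options might be used as in the sketch below, which reuses the variables of the detergent example on this page. The identifiers termname, nident and condtest are hypothetical names chosen here, and the final PRINT assumes that the saved text and variates all have one value per tested term.

```genstat
" Save the tested terms, their numbers of identifiers and the conditional tests "
MODEL   [DISTRIBUTION=poisson] response
RSCREEN [FACTORIAL=2; TESTED=termname; NELEMENTS=nident;\
        CONDITIONAL=condtest]\
        softness * preference * prevuserM * temperature

" condtest[] expands to the four variates held in the pointer "
PRINT   termname,nident,condtest[]
```

For this Poisson model the tests are chi-square tests, so the variate holding the denominator degrees of freedom would be expected to contain missing values.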
All regression warnings are suppressed, except when fitting the full model. This is to prevent the printing of long lists of similar warnings like “Iterative weights have become 0, or have been held at a limit”.
When RSCREEN is used for log-linear models, with the option EXCLUDEHIGHER set to yes, the marginal and conditional tests are equal to the marginal and partial tests of Brown (1976), which are available e.g. in BMDP.
RSCREEN can also be used to implement the model selection strategy used in GLIMPSE, as described in McCullagh & Nelder (1989), pages 91-93. However, RSCREEN does not use approximations for models that require an iterative fitting process.
Most of the implementation is straightforward. The null model for the marginal test for term t is constructed as #FORCED + ((#FREE - #FORCED) -* c) - #t, where c is the classifying set of factors and variates comprising #FREE - #FORCED excluding factors and variates in term t. The null model for the conditional test is #FORCED + #FREE -* #t.
When the DISPERSION option of the MODEL directive is set to *, terms are tested by means of F statistics, which are deviance ratios based on the mean deviance of the full model. For a fixed dispersion parameter, chi-square statistics are used, i.e. deviance differences scaled by the dispersion parameter. Terms in multivariate linear models are tested by Rao’s F-approximation for Wilks’ Lambda (Rao 1973). These are always based on residual variation calculated for the full model.
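The two single-response statistics can be written out as follows. The notation is an assumption introduced here, not taken from the procedure: D_0 and D_1 are the deviances of the null model and of the null model plus the tested term, with residual degrees of freedom \nu_0 and \nu_1; D_F and \nu_F are the deviance and residual degrees of freedom of the full model; \phi is the fixed dispersion parameter.

```latex
% Estimated dispersion (DISPERSION=*): deviance ratio against the full model
F = \frac{(D_0 - D_1)/(\nu_0 - \nu_1)}{D_F/\nu_F},
\qquad \text{referred to } F_{\nu_0 - \nu_1,\;\nu_F}

% Fixed dispersion \phi: scaled deviance difference
X^2 = \frac{D_0 - D_1}{\phi},
\qquad \text{referred to } \chi^2_{\nu_0 - \nu_1}
```

In both cases the numerator degrees of freedom \nu_0 - \nu_1 equal the number of parameters contributed by the tested term.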
Smoothing splines are not allowed in the
FREE formula due to a limitation of the
Any restriction applied to vectors used in the regression model applies also to the results from RSCREEN.
Brown, M.B. (1976). Screening effects in multidimensional contingency tables. Applied Statistics, 25, 37-46.
McCullagh, P. & Nelder, J.A. (1989). Generalized Linear Models (second edition). Chapman & Hall, London.
Rao, C.R. (1973). Linear Statistical Inference and its Applications. Wiley, New York.
CAPTION 'RSCREEN example',\
        !t('Detergent data from Goodman (1971, Technometrics 13, 33-61),',\
        'used by Brown (1976, Applied Statistics, 25, 37-46).');\
        STYLE=meta,plain
VARIATE [VALUES=19, 57, 29, 63, 29, 49, 27, 53, 23, 47, 33, 66,\
        47, 55, 23, 50, 24, 37, 42, 68, 43, 52, 30, 42] response
FACTOR  [NVALUES=24; LABELS=!T(soft,medium,hard)] softness
FACTOR  [NVALUES=24; LABELS=!T(X,M)] preference
FACTOR  [NVALUES=24; LABELS=!T(yes,no)] prevuserM
FACTOR  [NVALUES=24; LABELS=!T(high,low)] temperature
GENERATE softness, preference, prevuserM, temperature
MODEL   [DISTRIBUTION=poisson] response
RSCREEN [FACTORIAL=3; EXCLUDEHIGHER=yes]\
        softness * preference * prevuserM * temperature