Performs screening tests for generalized or multivariate linear models (H. van der Voet).
Options
PRINT = string tokens | Printed output required (model, pool, starscheme, tests, pvalues); default mode, pool, star |
---|---|
CONSTANT = string token | How to treat the constant (estimate, omit); default esti |
FACTORIAL = scalar | Limit for expansion of model terms; default 3 |
NOMESSAGE = string tokens | Which warning messages to suppress when fitting the complete model (aliasing, marginality): warning messages are always suppressed when fitting models for individual tests; default * |
EXCLUDEHIGHER = string token | Whether to exclude higher-order interactions in the conditional regression model for each tested term (yes, no); default no |
FORCED = formula | Terms always included in the model (no tests on these terms); default * |
TESTED = text | To save the names of individual terms which are tested |
NELEMENTS = variate | To save the number of identifiers composing each individual term |
MARGINAL = pointer | To save results from marginal tests for each tested term in a pointer containing the test statistic, corresponding degrees of freedom and the calculated probability |
CONDITIONAL = pointer | To save results from conditional tests for each tested term in a pointer containing the test statistic, corresponding degrees of freedom and the calculated probability |
MVINCLUDE = string token | Whether to include units with missing values in non-relevant explanatory variates or factors when calculating conditional and marginal tests (yes, no); default no |
Parameter
FREE = formula | List of explanatory variates and factors, or model formula; each term from the expanded FREE formula is tested in a marginal and in a conditional test, unless the term is also part of the FORCED formula |
---|---|
Description
RSCREEN calculates marginal and conditional tests for all terms in a (multivariate) linear or generalized linear model. For multivariate linear regression models these tests are based on Wilks’ Lambda. RSCREEN also performs pooled testing of all main effects, of all 2-factor interactions, etc.
A call to RSCREEN must be preceded by a MODEL statement which defines the response variate(s) and, if required, a vector of weights, an offset and other aspects of a generalized linear model. More than one response variate is allowed for ordinary linear models, in which case multivariate linear regression models are fitted and tests are based on Rao’s F approximation of Wilks’ Lambda. If there is one response variate, tests are based on (scaled) deviances or deviance ratios, according to the setting of the DISPERSION option in the MODEL directive. Deviance ratios are always based on the mean deviance of the full model.
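As a minimal sketch of the three situations described above, the statements below assume hypothetical variates y, y1 and y2 and factors A and B that have already been declared and given values; none of these identifiers come from the procedure itself.

```
" One response, fixed dispersion: tests use scaled deviances "
MODEL [DISTRIBUTION=poisson; DISPERSION=1] y
RSCREEN A * B

" One response, dispersion estimated: tests use deviance ratios "
MODEL [DISTRIBUTION=poisson; DISPERSION=*] y
RSCREEN A * B

" Two responses in an ordinary linear model: tests based on Wilks Lambda "
MODEL y1, y2
RSCREEN A * B
```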
The FREE parameter specifies the model terms which have to be tested. The limit for expanding the FREE model formula can be set with the FACTORIAL option, which has the default value 3. Two tests are performed for each term in the expanded model formula:
1. a marginal test: the term is added to the simplest possible model. For example, the main effect of A is added to the null model and the interaction term A.B is added to a model containing only the main effects A and B.
2. a conditional test: the term is added to the most complex possible model containing no terms that involve the term being tested. For example, the interaction A.B is added to the model with all terms except those involving A.B, such as the interaction A.B.C. Note that an interaction such as C.D.E will still be included in the model when testing A.B. The inclusion of any higher-order term can be prevented by setting option EXCLUDEHIGHER=yes.
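The two comparisons can be mimicked by hand with TERMS, FIT and ADD. The sketch below assumes that FREE is A*B*C for hypothetical factors A, B and C, that there are no FORCED terms and no missing values, and that a MODEL statement with a fixed dispersion has already been given, so that the change in deviance is the test statistic. With an estimated dispersion, RSCREEN bases its deviance ratios on the mean deviance of the full model, so the ratios printed by ADD would generally differ.

```
" Declare the maximal model so that every fit uses the same units "
TERMS A * B * C

" Marginal test of A.B: add it to the model with only main effects A and B "
FIT [PRINT=*] A + B
ADD [PRINT=accumulated] A.B

" Conditional test of A.B: add it to all terms that do not involve A.B "
" (A.B.C is left out because it involves A.B) "
FIT [PRINT=*] A + B + C + A.C + B.C
ADD [PRINT=accumulated] A.B
```

In each case the line for A.B in the accumulated analysis gives the change in deviance and degrees of freedom for the corresponding comparison.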
It is sometimes desirable to include specific terms in every model. Such terms may be specified by means of the FORCED option. The FORCED model formula is fitted first and no test results are given for the FORCED terms. The CONSTANT option controls whether the constant parameter is included in the model.
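For instance, in a blocked experiment one might always want the block effect in the model. The call below is only an illustration with hypothetical factors Blocks, A and B, and assumes that a MODEL statement has already been given.

```
" Blocks is fitted first and is not tested; A, B and A.B are screened "
RSCREEN [FORCED=Blocks; CONSTANT=estimate] A * B
```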
By default any units with missing values in any of the explanatory variates or factors will be excluded from all of the tests. However, if you have many missing values that are spread unevenly over the explanatory variables, there may be few units with non-missing values for every variable. If you have only a single y-variate, you may then want to set option MVINCLUDE=yes. RSCREEN will then use all the available units when constructing each marginal or conditional test, ignoring missing values in any explanatory variable that is not involved in the test. This provides more information for each test, but the tables of tests should be interpreted with care as different tests may be based on different sets of units.
The PRINT option controls output. The model setting gives a description of the model. The pool setting prints an accumulated analysis of variance or deviance in which terms with the same number of identifiers, e.g. main effects or two-factor interactions, are pooled. PRINT=tests prints both marginal and conditional test statistics, while setting pvalues prints (approximate) P-values from chi-square or F-tests. Finally, PRINT=starscheme prints significance of P-values by a conventional star notation. The default setting of PRINT is model, pool, starscheme.
Output can be saved by means of options TESTED, NELEMENTS, MARGINAL and CONDITIONAL. TESTED saves the individual model terms in a text structure, while NELEMENTS saves the number of identifiers composing each individual term. MARGINAL and CONDITIONAL each save test results in a pointer which contains four variates. These variates save the test statistic, the corresponding degrees of freedom for numerator and denominator and the calculated (approximate) probability. For chi-square tests the degrees of freedom for the denominator are set to missing. For multivariate linear regression models, Rao’s F-statistic and the corresponding degrees of freedom are saved. Note that, when MVINCLUDE=no, units with one or more missing values in any term are excluded from the analysis. This implies that FIT used for a subset of terms may give different results from RSCREEN.
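A minimal sketch of saving and then printing the results is given below. The identifiers Term, Nid, Marg and Cond and the factors A, B and C are hypothetical, a MODEL statement is assumed to have been given, and the four variates in each pointer are presumed to be stored in the order listed above.

```
" Suppress printed output and save the results instead "
RSCREEN [PRINT=*; TESTED=Term; NELEMENTS=Nid;\
         MARGINAL=Marg; CONDITIONAL=Cond] A * B * C

" Marg[] and Cond[] refer to all four variates held in each pointer "
PRINT Term, Nid, Marg[]
PRINT Term, Nid, Cond[]
```

The saved variates can then be used in further calculations, for example to select the terms whose conditional probability falls below a chosen threshold.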
All regression warnings are suppressed, except when fitting the full model. This is to prevent the printing of long lists of similar warnings like “Iterative weights have become 0, or have been held at a limit”.
If RSCREEN is used for log-linear models, with the option EXCLUDEHIGHER set to yes, the marginal and conditional tests are equal to the marginal and partial tests of Brown (1976), which are available e.g. in BMDP. RSCREEN can also be used to implement the model selection strategy used in GLIMPSE, as described in McCullagh & Nelder (1989), pages 91-93. However, RSCREEN does not use approximations for models that require an iterative fitting process.
Options: PRINT, CONSTANT, FACTORIAL, NOMESSAGE, EXCLUDEHIGHER, FORCED, TESTED, NELEMENTS, MARGINAL, CONDITIONAL, MVINCLUDE.
Parameter: FREE.
Method
Most of the implementation is straightforward. The null model for the marginal test for term t is constructed as #FORCED + ((#FREE - #FORCED) -* c[]) - #t, where c[] is the classifying set of factors and variates comprising #FREE - #FORCED excluding factors and variates in term t. The null model for the conditional test is #FORCED + #FREE -* #t.
When the DISPERSION option of the MODEL directive is set to *, terms are tested by means of F statistics, which are deviance ratios based on the mean deviance of the full model. For a fixed dispersion parameter chi-square statistics are used, i.e. deviance differences scaled by the dispersion parameter. Terms in multivariate linear models are tested by Rao’s F-approximation for Wilks’ Lambda (Rao 1973). These are always based on residual variation calculated for the full model.
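The statistics described above can be written out as follows. The notation is introduced here for illustration only: D_0 and D_1 are the deviances of the null model without and with the tested term, d is the number of degrees of freedom of the term, D_full and ν_full are the residual deviance and degrees of freedom of the full model, and φ is the fixed dispersion parameter. The last block gives the usual textbook form of Rao’s approximation for p response variates, a tested term with q degrees of freedom and ν residual degrees of freedom; that RSCREEN uses exactly this parameterization is an assumption.

```latex
% Deviance ratio, used when DISPERSION=*
\[
F = \frac{(D_0 - D_1)/d}{D_{\mathrm{full}}/\nu_{\mathrm{full}}}
\quad \text{referred to } F_{d,\;\nu_{\mathrm{full}}}
\]

% Scaled deviance difference, used for a fixed dispersion \phi
\[
X^2 = \frac{D_0 - D_1}{\phi}
\quad \text{referred to } \chi^2_{d}
\]

% Rao's F approximation for Wilks' Lambda (standard form; take s = 1 when p^2 + q^2 - 5 <= 0)
\[
s = \sqrt{\frac{p^2 q^2 - 4}{p^2 + q^2 - 5}}, \qquad
r = \nu - \frac{p - q + 1}{2}, \qquad
u = \frac{pq - 2}{4}
\]
\[
F = \frac{1 - \Lambda^{1/s}}{\Lambda^{1/s}} \cdot \frac{rs - 2u}{pq}
\quad \text{referred to } F_{pq,\; rs - 2u}
\]
```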
Smoothing splines are not allowed in the FREE formula due to a limitation of the FCLASSIFICATION directive.
Action with RESTRICT
Any restriction applied to vectors used in the regression model applies also to the results from RSCREEN.
References
Brown, M.B. (1976). Screening effects in multidimensional contingency tables. Applied Statistics, 25, 37-46.
McCullagh, P. & Nelder, J.A. (1989). Generalized Linear Models (second edition). Chapman & Hall, London.
Rao, C.R. (1973). Linear Statistical Inference and its Applications. Wiley, New York.
See also
Procedures: ASCREEN, RSEARCH, RWALD, VSCREEN.
Commands for: Regression analysis.
Example
CAPTION 'RSCREEN example',\ !t('Detergent data from Goodman (1971, Technometrics 13, 33-61),',\ 'used by Brown (1976, Applied Statistics, 25, 37-46).');\ STYLE=meta,plain VARIATE [VALUES=19, 57, 29, 63, 29, 49, 27, 53, 23, 47, 33, 66,\ 47, 55, 23, 50, 24, 37, 42, 68, 43, 52, 30, 42] response FACTOR [NVALUES=24; LABELS=!T(soft,medium,hard)] softness FACTOR [NVALUES=24; LABELS=!T(X,M)] preference FACTOR [NVALUES=24; LABELS=!T(yes,no)] prevuserM FACTOR [NVALUES=24; LABELS=!T(high,low)] temperature GENERATE softness, preference, prevuserM, temperature MODEL [DISTRIBUTION=poisson] response RSCREEN [FACTORIAL=3; EXCLUDEHIGHER=yes]\ softness * preference * prevuserM * temperature