Calculates Wald and F tests for dropping terms from a regression (R.W. Payne).
Options
PRINT = string token | Controls printed output (waldtests); default wald |
---|---|
FACTORIAL = scalar | Limit on number of factors in the model terms generated from the TERMS parameter; default 3 |
Y = variate | Y-variate from whose analysis to calculate the statistics; default is the last y-variate in SAVE |
RDF = scalar | Saves the residual d.f. used to calculate F probabilities when the dispersion is not fixed |
SAVE = regression save structure | Specifies the save structure (from MODEL) containing the analysis for which to calculate the tests; default is the save structure from the most recent regression |
Parameters
TERMS = formula | Model terms for which tests are required |
---|---|
WALDSTATISTIC = scalar or pointer to scalars | Saves Wald statistics |
DF = scalar or pointer to scalars | Saves d.f. of Wald statistics |
PROBABILITY = scalar or pointer to scalars | Saves the probabilities for the Wald statistics if the dispersion is fixed, or for the corresponding F statistics if it is estimated |
Description
RWALD provides Wald tests to help you decide whether any terms can be dropped from a regression model. The model must have been fitted already by the regression commands (MODEL, FIT etc.) in the usual way. The tests are usually produced for the most recent regression analysis, but you can set the SAVE and Y options to request tests from an earlier analysis.
By default, RWALD produces tests for all the terms that can be dropped from the model: that is, for every term that is not marginal to another term in the model. For example, in the formula
A + B + C + D + A.B + A.D + B.D
the terms C, A.B, A.D and B.D can be dropped as there are no other terms in the model that contain all their factors (i.e. none to which they are marginal). However, A cannot be dropped until A.B and A.D have been dropped. You can use the TERMS parameter to request Wald tests for a specific set of terms. A missing value is then given for any term that cannot be dropped. The FACTORIAL option sets a limit on the number of factors or variates in each term that is formed from the TERMS formula (default 3).
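The two ways of calling the procedure can be sketched as follows. This is an illustration only: it assumes the cloud-seeding model from the Example section has just been fitted, and it uses only the TERMS parameter described above.

```genstat
" Sketch only. Assumes the model from the Example section is the most
  recent regression, i.e.
    MODEL Ly
    FIT A + S + D + C + Lp + E + S.A "

" Default: tests for every term that can be dropped
  (here D, C, Lp, E and S.A). "
RWALD

" Tests for two named terms. S is marginal to S.A, so it cannot be
  dropped, and a missing value is given for it. "
RWALD TERMS=S + C
```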
If option PRINT=waldtests (the default), RWALD prints a table with columns containing the Wald statistic, its number of degrees of freedom and a probability value. With an ordinary linear regression, RWALD will also print an F statistic, and use this to obtain the probability. Provided there is no aliasing between the parameters of the terms, these F statistics and probabilities will be identical to those that would be printed in the Change lines of the Summary of Analysis if the terms were dropped from the model explicitly by using the DROP or TRY directives. The advantage of RWALD is that the model does not have to be refitted (excluding each term) to calculate the information. It thus provides a much more efficient method of assessing the model.
F statistics are also given with any generalized linear model in which the dispersion is not fixed (e.g. models involving the gamma distribution). However, in generalized linear models with a fixed dispersion (e.g. binomial or Poisson), the probabilities are obtained by treating the Wald statistics as chi-square statistics. The deviances and deviance ratios used by TRY and DROP are calculated from the likelihoods of the generalized linear models, whereas the Wald and F statistics are essentially based on weighted sums of squares. So, for generalized linear models, the probabilities calculated by RWALD will not be identical to those given by TRY and DROP. However, both sets of probabilities are based on the asymptotic properties of their statistics, and so they should give similar conclusions.
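The two ways of obtaining a probability can be summarized in standard Wald-test notation. This is a sketch under stated assumptions, not a transcription of the procedure's code. The symbols are introduced here for illustration: \hat{\beta}_T is the vector of estimated parameters of the term being tested, V_T is their estimated variance-covariance matrix, d is the d.f. of the Wald statistic and r is the residual d.f. (the value saved by RDF).

```latex
% Wald statistic for a term T with d non-aliased parameters
W = \hat{\beta}_T^{\top} \, V_T^{-1} \, \hat{\beta}_T

% Dispersion fixed (e.g. binomial or Poisson):
p = \Pr\left( \chi^2_{d} > W \right)

% Dispersion estimated (e.g. ordinary linear regression, gamma):
F = W / d , \qquad p = \Pr\left( F_{d,\,r} > F \right)
```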
The WALDSTATISTIC parameter can save the statistics, and the DF parameter can save their numbers of degrees of freedom. If you are making a Wald test for a single term, you can supply a scalar for each of these parameters. However, if you have several terms, you must supply a pointer which will then be set up to contain as many scalars as there are terms. Similarly the PROBABILITY parameter saves the probabilities for the Wald statistics if the dispersion is fixed, or for the corresponding F statistics if it is estimated. The number of residual degrees of freedom for the F statistics can be saved, in a scalar, by the RDF option. This contains a missing value if the dispersion is fixed.
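Saving the results might look like the sketch below. It assumes the model from the Example section has already been fitted. The identifiers W, Wdf, Wpr, Resdf, Ws, Wdfs and Wprs are illustrative names, not part of RWALD. Two further assumptions: that PRINT=* suppresses the printed table, as is usual for Genstat string-token options, and that the undeclared identifiers in the second call are set up as pointers by the procedure.

```genstat
" Sketch only; assumes a model has already been fitted with MODEL and FIT. "

" Single term: a scalar is enough for each parameter. "
SCALAR W,Wdf,Wpr,Resdf
RWALD [PRINT=*; RDF=Resdf] TERMS=S.A;\
      WALDSTATISTIC=W; DF=Wdf; PROBABILITY=Wpr
PRINT W,Wdf,Wpr,Resdf

" Several terms: each parameter becomes a pointer holding one scalar
  per term (assumed here to be set up by the procedure). "
RWALD [PRINT=*] TERMS=D + C + Lp + E + S.A;\
      WALDSTATISTIC=Ws; DF=Wdfs; PROBABILITY=Wprs
PRINT Ws[],Wdfs[],Wprs[]
```

Resdf holds a missing value when the dispersion is fixed, so it is worth checking before using it in further calculations.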
Options: PRINT, FACTORIAL, Y, RDF, SAVE.
Parameters: TERMS, WALDSTATISTIC, DF, PROBABILITY.
Method
RWALD uses FCLASSIFICATION to form the list of terms that can be dropped. It then calculates the statistics using estimates and variances saved by RKESTIMATES.
See also
Commands for: Regression analysis.
Example
CAPTION 'RWALD example',\
        'Cloud seeding example; see Guide to Genstat Part 2, Section 3.3.';\
        STYLE=meta,plain
" Variables are:
    A   Action (NS not seeded, S seeded)
    D   Days after first day of experiment
    S   Suitability for seeding (from model)
    C   Percent cloud cover
    P   Previous rainfall (in 10**7 cubic m)
    E   Type of cloud (1 or 2)
    Y   Subsequent rainfall (in 10**7 cubic m)"
FACTOR [LABELS=!t(S,NS)] A
FACTOR [LEVELS=2] E
READ A,D,S,C,P,E,Y; FREPRESENTATION=labels,4(*),levels,*
NS  0 1.75 13.4 0.274 2 12.85
S   1 2.70 37.9 1.267 1  5.52
S   3 4.10  3.9 0.198 2  6.29
NS  4 2.35  5.3 0.526 1  6.11
S   6 4.25  7.1 0.250 1  2.45
NS  9 1.60  6.9 0.018 2  3.61
NS 18 1.30  4.6 0.307 1  0.47
NS 25 3.35  4.9 0.194 1  4.56
NS 27 2.85 12.1 0.751 1  6.35
S  28 2.20  5.2 0.084 1  5.06
S  29 4.40  4.1 0.236 1  2.76
S  32 3.10  2.8 0.214 1  4.05
NS 33 3.95  6.8 0.796 1  5.74
S  35 2.90  3.0 0.124 1  4.84
S  38 2.05  7.0 0.144 1 11.86
NS 39 4.00 11.3 0.398 1  4.45
NS 53 3.35  4.2 0.237 2  3.66
S  55 3.70  3.3 0.960 1  4.22
NS 56 3.80  2.2 0.230 1  1.16
S  59 3.40  6.5 0.142 2  5.45
S  65 3.15  3.1 0.073 1  2.02
NS 68 3.15  2.6 0.136 1  0.82
S  82 4.01  8.3 0.123 1  1.09
NS 83 4.65  7.4 0.168 1  0.28 :
CALCULATE Lp,Ly = LOG10(P,Y)
MODEL Ly
TERMS A*(D+S+C+Lp+E)
FIT [PRINT=model,estimates] A + S + D + C + Lp + E + S.A
RWALD
TRY [PRINT=model,summary; NOMESSAGE=residual,leverage; FPROB=yes]\
    D + C + Lp + E + S.A