Calculates Wald and F tests for dropping terms from a regression (R.W. Payne).
|PRINT|Controls printed output (waldtests); default waldtests|
|FACTORIAL|Limit on number of factors in the model terms generated from the TERMS parameter; default 3|
|Y|Y-variate from whose analysis to calculate the statistics; default is the last y-variate in
|RDF|Saves the residual d.f. used to calculate F probabilities when the dispersion is not fixed|
|SAVE|Specifies the save structure (from
|TERMS|Model terms for which tests are required|
|WALDSTATISTIC|Saves Wald statistics|
|DF|Saves d.f. of Wald statistics|
|PROBABILITY|Saves the probabilities for the Wald statistics if the dispersion is fixed, or for the corresponding F statistics if it is estimated|
RWALD provides Wald tests to help you decide whether any terms can be dropped from a regression model. The model must already have been fitted by the regression commands (FIT etc.) in the usual way. The tests are usually produced for the most recent regression analysis, but you can set the SAVE or Y options to request tests from an earlier analysis.
RWALD produces tests for all the terms that can be dropped from the model: that is, for every term that is not marginal to another term in the model. For example, in the formula
A + B + C + D + A.B + A.D + B.D
the terms C, A.B, A.D and B.D can be dropped, as there are no other terms in the model that contain all their factors (i.e. none to which they are marginal). However, A cannot be dropped until A.B and A.D have been dropped. You can use the TERMS parameter to request Wald tests for a specific set of terms. A missing value is then given for any term that cannot be dropped. The FACTORIAL option sets a limit on the number of factors or variates in each term that is formed from the TERMS formula (default 3).
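The marginality rule described above can be illustrated outside Genstat. The following is a minimal Python sketch, not RWALD's own implementation: the function name `droppable_terms` is hypothetical, and it assumes the formula has already been expanded into terms joined by `+`, with factors inside a term joined by `.`.

```python
def droppable_terms(formula):
    """Return the terms that are not marginal to any other term in the formula."""
    # Split "A + B + A.B" into terms, then each term into its set of factors.
    terms = [t.strip() for t in formula.split("+") if t.strip()]
    factors = {t: frozenset(t.split(".")) for t in terms}
    droppable = []
    for t in terms:
        # t is marginal to u when u contains all of t's factors and at least one more.
        is_marginal = any(factors[t] < factors[u] for u in terms if u != t)
        if not is_marginal:
            droppable.append(t)
    return droppable


print(droppable_terms("A + B + C + D + A.B + A.D + B.D"))
# prints ['C', 'A.B', 'A.D', 'B.D']
```

Here A, B and D are excluded because each is contained in at least one interaction that is still in the model.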
With PRINT=waldtests (the default), RWALD prints a table with columns containing the Wald statistic, its number of degrees of freedom and a probability value. With an ordinary linear regression, RWALD also prints an F statistic and uses it to obtain the probability. Provided there is no aliasing between the parameters of the terms, these F statistics and probabilities are identical to those that would be printed in the Change lines of the Summary of Analysis if the terms were dropped from the model explicitly by using the DROP or TRY directives. The advantage of RWALD is that the model does not have to be refitted (excluding each term) to calculate the information. It thus provides a much more efficient method of assessing the model.
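The agreement between the Wald-based F statistic and the F statistic obtained by refitting can be checked by hand in the simplest case. The sketch below is an illustration in plain Python, assuming a straight-line regression with a single slope term (so the term has 1 d.f.); the function name and the toy data are invented for the example and do not come from Genstat.

```python
import math


def slope_tests(x, y):
    """Compare the Wald-based F for the slope with the F from refitting without it."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar

    # Full model: residual sum of squares and estimated dispersion.
    rss_full = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    rdf = n - 2
    s2 = rss_full / rdf

    # Wald statistic from the fitted model only: estimate squared over its variance.
    wald = slope ** 2 / (s2 / sxx)
    f_wald = wald / 1  # divide by the d.f. of the term (1 here)

    # Refit without the slope (intercept-only model) and form the usual change F.
    rss_reduced = sum((yi - ybar) ** 2 for yi in y)
    f_drop = (rss_reduced - rss_full) / 1 / s2
    return f_wald, f_drop, rdf


# Toy data, invented purely for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 6.3]
f_wald, f_drop, rdf = slope_tests(x, y)
print(math.isclose(f_wald, f_drop, rel_tol=1e-9), rdf)
```

The Wald route uses only quantities from the full fit, which is why no refitting is needed.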
F statistics are also given with any generalized linear model in which the dispersion is not fixed (e.g. models involving the gamma distribution). However, in generalized linear models with a fixed dispersion (e.g. binomial or Poisson), the probabilities are obtained by treating the Wald statistics as chi-square statistics. The deviances and deviance ratios used by DROP are calculated from the likelihoods of the generalized linear models, whereas the Wald and F statistics are essentially based on weighted sums of squares. So probabilities calculated by RWALD will no longer be identical to those given by DROP. However, both sets of probabilities are based on the asymptotic properties of their statistics, and so they should give similar conclusions.
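For the fixed-dispersion case, treating a Wald statistic as chi-square means taking the upper-tail probability of a chi-square distribution with the term's degrees of freedom. The following Python sketch shows that calculation using only the standard library; the function name is hypothetical, it assumes a positive integer number of degrees of freedom, and it is intended for small illustrative values rather than as a numerically robust routine.

```python
import math


def chi2_upper_tail(x, df):
    """P(X > x) for a chi-square variable with positive integer df."""
    if x <= 0:
        return 1.0
    z = x / 2.0
    if df % 2 == 0:
        # Even df: start from Q(1, z) = exp(-z).
        a, q = 1.0, math.exp(-z)
    else:
        # Odd df: start from Q(1/2, z) = erfc(sqrt(z)).
        a, q = 0.5, math.erfc(math.sqrt(z))
    # Step up with Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1).
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


# A Wald statistic of 3.841 on 1 d.f. sits at about the 5% point.
print(round(chi2_upper_tail(3.841, 1), 3))
```

When the dispersion is estimated, the probability would instead come from an F distribution on the term's d.f. and the residual d.f., which this sketch does not cover.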
The WALDSTATISTIC parameter can save the statistics, and the DF parameter can save their numbers of degrees of freedom. If you are making a Wald test for a single term, you can supply a scalar for each of these parameters. However, if you have several terms, you must supply a pointer, which will then be set up to contain as many scalars as there are terms. Similarly, the PROBABILITY parameter saves the probabilities for the Wald statistics if the dispersion is fixed, or for the corresponding F statistics if it is estimated. The number of residual degrees of freedom for the F statistics can be saved, in a scalar, by the RDF option. This contains a missing value if the dispersion is fixed.
Commands for: Regression analysis.
CAPTION 'RWALD example',\
        'Cloud seeding example; see Guide to Genstat Part 2, Section 3.3.';\
        STYLE=meta,plain
" Variables are:
    A  Action (NS not seeded, S seeded)
    D  Days after first day of experiment
    S  Suitability for seeding (from model)
    C  Percent cloud cover
    P  Previous rainfall (in 10**7 cubic m)
    E  Type of cloud (1 or 2)
    Y  Subsequent rainfall (in 10**7 cubic m)"
FACTOR [LABELS=!t(S,NS)] A
FACTOR [LEVELS=2] E
READ A,D,S,C,P,E,Y; FREPRESENTATION=labels,4(*),levels,*
NS  0 1.75 13.4 0.274 2 12.85
S   1 2.70 37.9 1.267 1  5.52
S   3 4.10  3.9 0.198 2  6.29
NS  4 2.35  5.3 0.526 1  6.11
S   6 4.25  7.1 0.250 1  2.45
NS  9 1.60  6.9 0.018 2  3.61
NS 18 1.30  4.6 0.307 1  0.47
NS 25 3.35  4.9 0.194 1  4.56
NS 27 2.85 12.1 0.751 1  6.35
S  28 2.20  5.2 0.084 1  5.06
S  29 4.40  4.1 0.236 1  2.76
S  32 3.10  2.8 0.214 1  4.05
NS 33 3.95  6.8 0.796 1  5.74
S  35 2.90  3.0 0.124 1  4.84
S  38 2.05  7.0 0.144 1 11.86
NS 39 4.00 11.3 0.398 1  4.45
NS 53 3.35  4.2 0.237 2  3.66
S  55 3.70  3.3 0.960 1  4.22
NS 56 3.80  2.2 0.230 1  1.16
S  59 3.40  6.5 0.142 2  5.45
S  65 3.15  3.1 0.073 1  2.02
NS 68 3.15  2.6 0.136 1  0.82
S  82 4.01  8.3 0.123 1  1.09
NS 83 4.65  7.4 0.168 1  0.28 :
CALCULATE Lp,Ly = LOG10(P,Y)
MODEL Ly
TERMS A*(D+S+C+Lp+E)
FIT [PRINT=model,estimates] A + S + D + C + Lp + E + S.A
RWALD
TRY [PRINT=model,summary; NOMESSAGE=residual,leverage; FPROB=yes]\
    D + C + Lp + E + S.A