Performs random permutation tests for regression or generalized linear model analyses (R.W. Payne).
Options
| Option | Description |
|---|---|
| PRINT = string tokens | Controls printed output (probability, accumulated, summary, critical); default prob |
| CONSTANT = string token | How to treat the constant (estimate, omit); default esti |
| FACTORIAL = scalar | Limit on the number of variates and/or factors in the terms to be fitted; default 3 |
| NTIMES = scalar | Number of permutations to make; default 999 |
| BLOCKSTRUCTURE = formula | Model formula defining any blocking to consider during the randomization; default none |
| EXCLUDE = factors | Factors in the block formula whose levels are not to be randomized |
| SEED = scalar | Seed for the random-number generator used to make the permutations; default 0 continues from the previous generation or (if none) initializes the seed automatically |
| SUMMARY = pointer | Saves the summary analysis-of-variance (or deviance) table with permutation probabilities and critical values |
| ACCUMULATED = pointer | Saves the accumulated analysis-of-variance (or deviance) table with permutation probabilities and critical values |
| BINMETHOD = string token | How to permute binomial data (individuals, units); default indi |
Parameter
| Parameter | Description |
|---|---|
| TERMS = formula | List of explanatory variates and factors, or model formula, defining the model to fit |
Description
In regression analyses, random permutation tests provide an alternative to the F probabilities printed for variance ratios in summary or accumulated analysis-of-variance tables, for use when the assumptions of the analysis are not satisfied. These assumptions can be assessed by studying the residual plots produced by RCHECK. In particular, the use of the F distribution to calculate the probabilities assumes that the residuals from each stratum have Normal distributions with equal variances, so the histogram of residuals produced by RCHECK should look reasonably close to the Normal, bell-shaped curve. Experience shows that the analysis is robust to small departures from Normality, but RPERMTEST can be useful if the histogram looks very non-Normal. You can also use RPERMTEST to generate probabilities for deviances or deviance ratios in generalized linear models, instead of using the customary chi-square or F distributions (which are justified by asymptotic theory).
Before using RPERMTEST, you need to give a MODEL statement to define the y-variate and any other aspects of the model, as usual for a regression or generalized linear model. The terms to fit in the regression model are specified by the TERMS parameter of RPERMTEST. As in the FIT directive, this can supply a list of variates for a simple or multiple linear regression, or a model formula with variates and/or factors for more complicated models. As usual, the CONSTANT option indicates whether or not to fit the constant, and the FACTORIAL option sets a limit on the number of variates and/or factors in each of the terms generated from a TERMS formula.
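The effect of FACTORIAL can be illustrated with the following minimal Python sketch. It is not Genstat's formula parser. It expands a fully crossed formula such as A*B*C into its terms, written in Genstat's dot notation, and drops any term involving more than the limit number of variates or factors. The function name `expand_crossed` is hypothetical.

```python
from itertools import combinations

def expand_crossed(factors, limit=3):
    """Expand a fully crossed formula (e.g. A*B*C) into its terms.

    Terms use dot notation (A.B is the interaction of A and B). Terms
    involving more than `limit` factors or variates are left out, which
    mimics the role of the FACTORIAL option (default 3).
    """
    terms = []
    for size in range(1, min(limit, len(factors)) + 1):
        for combo in combinations(factors, size):
            terms.append(".".join(combo))
    return terms

print(expand_crossed(["A", "B", "C"]))           # all seven terms, including A.B.C
print(expand_crossed(["A", "B", "C"], limit=2))  # A.B.C is dropped
```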
The NTIMES option defines how many random permutations to perform; by default there are 999 (as well as the “null” permutation, in which the data keep their original order). The SEED option specifies the seed for the random-number generator that is used to construct them. The default, SEED=0, continues the sequence of random numbers from a previous generation or, if this is the first use of the generator in this run of Genstat, initializes the seed automatically. If NTIMES exceeds the maximum possible number of permutations for the data, an “exact” test is performed instead, in which every permutation is used once. This is feasible only for small datasets, as there are n! (n factorial) permutations of n units: 3!=6, 4!=24, 5!=120, 6!=720, 7!=5040, 8!=40320, and so on.
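The idea of an exact test can be sketched in Python for a simple linear regression. This is an illustration only, not Genstat's code, and the function names are hypothetical. Permuting y leaves Σx, Σy, Sxx and Syy unchanged, so (nΣxy − ΣxΣy)², which equals n²Sxy², ranks the permutations in the same order as the regression variance ratio. It also stays an exact integer for integer data, which avoids floating-point ties.

```python
from itertools import permutations
from math import factorial

def slope_statistic(x, y):
    """(n*sum(xy) - sum(x)*sum(y))**2, i.e. n**2 * Sxy**2.

    With x fixed and y permuted, Sxx and Syy do not change, so larger
    values of this statistic correspond to larger variance ratios.
    """
    n = len(x)
    sxy = n * sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y)
    return sxy * sxy

def exact_permutation_probability(x, y):
    """Count the orderings of y (out of all n!) whose statistic is at least
    as large as the observed one; the observed ordering is included."""
    observed = slope_statistic(x, y)
    count = sum(1 for perm in permutations(y)
                if slope_statistic(x, perm) >= observed)
    return count, factorial(len(y))

count, total = exact_permutation_probability([1, 2, 3, 4], [1, 2, 3, 4])
print(count, total)  # only the identity and the full reversal are as extreme
```

Because the number of orderings grows as n!, full enumeration like this is practical only for the small datasets mentioned above.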
If the regression is being used to analyse a designed experiment, you may need to use the BLOCKSTRUCTURE option to specify a block model that defines how to do the randomization. The EXCLUDE option can then restrict the randomization so that one or more of the factors in the block model are not randomized. See the RANDOMIZE directive for further details.
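A minimal Python sketch can show one simple kind of restricted permutation. Here the responses are shuffled only among units in the same block, so no value moves to a different block. This is an assumption-laden simplification. It does not reproduce everything RANDOMIZE does with a block formula (for example, randomizing the blocks themselves), and the function name `permute_within_blocks` is hypothetical.

```python
import random
from collections import defaultdict

def permute_within_blocks(values, blocks, rng):
    """Return a copy of `values` shuffled only among units in the same block.

    `blocks[i]` is the block label of unit i. Each value stays inside its
    own block; only its position within the block changes.
    """
    positions = defaultdict(list)
    for i, b in enumerate(blocks):
        positions[b].append(i)
    result = list(values)
    for idx in positions.values():
        shuffled = [values[i] for i in idx]
        rng.shuffle(shuffled)
        for i, v in zip(idx, shuffled):
            result[i] = v
    return result

rng = random.Random(1)
y = [10, 11, 12, 20, 21, 22]
blocks = ["I", "I", "I", "II", "II", "II"]
print(permute_within_blocks(y, blocks, rng))
```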
The BINMETHOD option controls how the permutations are done for binomial data. The original dataset contains a set of units, each recording a number of “successes” obtained from an observed number of individuals. The default, and recommended, method is to expand the dataset to contain the individuals themselves, and permute these. Alternatively, you can set BINMETHOD=units to permute the units as a whole instead.
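The two BINMETHOD settings can be contrasted with a short Python sketch. It is illustrative only, and both function names are hypothetical. The individuals version expands each unit into 0/1 outcomes, shuffles them across all individuals, and recounts the successes per unit, so the unit sizes and the overall number of successes are preserved. The units version assumes that each (successes, total) pair moves as a whole.

```python
import random

def permute_binomial_individuals(successes, totals, rng):
    """Permute binomial data at the level of individuals (BINMETHOD=individuals style).

    Unit i is expanded into totals[i] individuals, successes[i] of them
    successes (1) and the rest failures (0). The outcomes are shuffled over
    all individuals and then recounted per unit.
    """
    outcomes = []
    for r, n in zip(successes, totals):
        outcomes.extend([1] * r + [0] * (n - r))
    rng.shuffle(outcomes)
    new_successes, start = [], 0
    for n in totals:
        new_successes.append(sum(outcomes[start:start + n]))
        start += n
    return new_successes

def permute_binomial_units(successes, totals, rng):
    """Permute whole units (BINMETHOD=units style): each (successes, total)
    pair is moved as a single item."""
    pairs = list(zip(successes, totals))
    rng.shuffle(pairs)
    return [r for r, _ in pairs], [n for _, n in pairs]

rng = random.Random(3)
print(permute_binomial_individuals([2, 5, 0], [4, 6, 3], rng))
print(permute_binomial_units([2, 5, 0], [4, 6, 3], rng))
```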
The probabilities are determined from the distribution of the statistics of interest over the permuted datasets. In an ordinary regression, the statistics are the variance ratios from the summary or accumulated analysis-of-variance tables. In generalized linear models they are deviances when the dispersion is fixed, or deviance ratios when it is estimated (as defined by the DISPERSION option of the MODEL directive).
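The following Python sketch shows how a permutation probability can be formed for the variance ratio of a simple linear regression, using the apple data from the example at the end of this page. It rests on several assumptions:

- The probability is taken as the proportion of the NTIMES + 1 datasets (including the null permutation) whose variance ratio is at least as large as the observed one.
- For a generalized linear model, a deviance or deviance ratio would replace the variance ratio.
- The function names are hypothetical.
- Python's random-number generator differs from Genstat's, so the probability will not match Genstat's output.

```python
import random

def variance_ratio(x, y):
    """Variance ratio (F) for the regression line in a simple linear
    regression of y on x, with a constant fitted."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    regression_ss = sxy * sxy / sxx                # 1 d.f.
    residual_ms = (syy - regression_ss) / (n - 2)  # n - 2 d.f.
    return regression_ss / residual_ms

def permutation_probability(x, y, ntimes=999, seed=None):
    """Monte Carlo permutation probability for the regression variance ratio.

    The observed (null) ordering counts as one of the ntimes + 1 datasets,
    so the smallest attainable probability is 1 / (ntimes + 1).
    """
    rng = random.Random(seed)
    observed = variance_ratio(x, y)
    y_perm = list(y)
    at_least_as_large = 1                          # the null permutation
    for _ in range(ntimes):
        rng.shuffle(y_perm)
        if variance_ratio(x, y_perm) >= observed:
            at_least_as_large += 1
    return observed, at_least_as_large / (ntimes + 1)

cropsize = [8, 6, 11, 22, 14, 17, 18, 24, 19, 23, 26, 40]
wormy = [59, 58, 56, 53, 50, 45, 43, 42, 39, 38, 30, 27]
f_obs, prob = permutation_probability(cropsize, wormy, seed=1)
print(f"F = {f_obs:.2f}, permutation probability = {prob:.3f}")
```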
Output is controlled by the PRINT option, with settings:
| Setting | Effect |
|---|---|
| probability | to print the probability for the whole regression model; |
| summary | to print the summary-of-analysis table, with the usual probability for the regression model replaced by the probability from the permutation test; |
| accumulated | to print the accumulated analysis-of-variance or deviance table, with the usual probabilities replaced by those from the permutation test; |
| critical | to accompany the summary or accumulated tables by a table giving estimated critical values for each of the statistics. |
The SUMMARY and ACCUMULATED options can save the summary and accumulated tables, respectively. Each is saved in a pointer with a variate or text for each of its columns (source, d.f., etc.). The probability variate contains the probabilities from the permutation test, and there are three additional variates to save the critical values.
Options: PRINT, CONSTANT, FACTORIAL, NTIMES, BLOCKSTRUCTURE, EXCLUDE, SEED, SUMMARY, ACCUMULATED, BINMETHOD.
Parameter: TERMS.
Method
RPERMTEST uses RANDOMIZE to perform the permutations, taking account of any block structure of the data. The model is fitted to each dataset using either FIT or FITINDIVIDUALLY (FITINDIVIDUALLY is needed if the accumulated table is required for a generalized linear model). The ACCUMULATED and SUMMARY options of RKEEP are used to save the information from each analysis, and the QUANTILES function is used to calculate the critical values.
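The calculation of critical values from the permutation distribution can be sketched in Python. This does not reproduce Genstat's QUANTILES function. It uses a simple empirical quantile: the smallest value in the distribution with at least the required proportion of values at or below it. The probability levels behind RPERMTEST's three critical-value variates are not given here, so the levels 0.95, 0.99 and 0.999 are assumed purely for illustration. The function name `critical_value` is hypothetical.

```python
import math

def critical_value(statistics, level):
    """Empirical upper critical value: the smallest value in `statistics`
    such that at least a proportion `level` of the values are <= it."""
    ordered = sorted(statistics)
    k = math.ceil(level * len(ordered)) - 1
    return ordered[max(k, 0)]

# A made-up permutation distribution of some statistic (1000 values).
stats = [i / 10 for i in range(1, 1001)]
for level in (0.95, 0.99, 0.999):  # levels assumed for illustration only
    print(level, critical_value(stats, level))
```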
Action with RESTRICT
RPERMTEST takes account of any restrictions on any of the y-variates, x-variates or factors in the model.
See also
Procedures: APERMTEST, CHIPERMTEST, FEXACT2X2.
Commands for: Regression analysis.
Example
```
CAPTION 'RPERMTEST examples',!t(\
        '1) Modelling the relationship between counts of apples',\
        'from 12 trees (recorded as 100s of fruit) and percentage',\
        'damage by codling moth. Data from Snedecor and Cochran (1980)',\
        'Statistical Methods (7th edition), page 162.'); STYLE=meta,plain
VARIATE [VALUES= 8, 6,11,22,14,17,18,24,19,23,26,40] Cropsize
&       [VALUES=59,58,56,53,50,45,43,42,39,38,30,27] Wormy
MODEL   Wormy
FIT     Cropsize
RPERMTEST [SEED=466435] Cropsize
```