Does random permutation tests for regression or generalized linear model analyses (R.W. Payne).
|PRINT|Controls printed output (
|CONSTANT|How to treat the constant (
|FACTORIAL|Limit on the number of variates and/or factors in the terms to be fitted; default 3|
|NTIMES|Number of permutations to make; default 999|
|BLOCKSTRUCTURE|Model formula defining any blocking to consider during the randomization; default none|
|EXCLUDE|Factors in the block formula whose levels are not to be randomized|
|SEED|Seed for the random number generator used to make the permutations; default 0 continues from the previous generation or (if none) initializes the seed automatically|
|TERMS|List of explanatory variates and factors, or model formula, defining the model to fit|
In regression analyses, random permutation tests provide an alternative to using the F probabilities, printed for variance ratios in summary or accumulated analysis of variance tables, when the assumptions of the analysis are not satisfied. These assumptions can be assessed by studying the residual plots produced by RCHECK. In particular, the use of the F distribution to calculate the probabilities is based on the assumption that the residuals from each stratum have Normal distributions with equal variances, and so the histogram of residuals produced by RCHECK should look reasonably close to the Normal, bell-shaped curve. Experience shows the analysis is robust to small departures from Normality. RPERMTEST can be useful if the histogram looks very non-Normal. You can also use RPERMTEST to generate probabilities for deviances or deviance ratios in generalized linear models, instead of using the customary chi-square or F distributions (which are justified by asymptotic theory).
Before using RPERMTEST, you need to give a MODEL statement to define the y-variate and so on, as usual for a regression or generalized linear model. The terms to fit in the regression model are specified by the TERMS parameter of RPERMTEST. As in the FIT directive, this can supply a list of variates for a simple or multiple linear regression, or a model formula with variates and/or factors for more complicated models. As usual, the CONSTANT option indicates whether or not to fit the constant, and the FACTORIAL option sets a limit on the number of variates and/or factors in each of the terms generated from a model formula.
The NTIMES option defines how many random permutations to perform; by default there are 999 (as well as the “null” permutation where the data keep their original order). The SEED option allows you to specify the seed for the random-number generator that is used to construct the permutations. The default, SEED=0, continues the sequence of random numbers from a previous generation or, if this is the first use of the generator in this run of Genstat, it initializes the seed automatically. If NTIMES exceeds the maximum possible number of permutations for the data, an “exact” test is performed in which every permutation is used once. This is feasible only for small datasets. There are n! (n factorial) permutations of n units: 3!=6, 4!=24, 5!=120, 6!=720, 7!=5040, 8!=40320, and so on.
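The choice between random and exhaustive permutation can be sketched as follows. This is a minimal illustration in Python rather than Genstat, and it is not the code RPERMTEST itself uses. The function name `permutations_to_use` is hypothetical, the rule "exhaustive when NTIMES exceeds n!" is taken from the description above, and Python's own seeding stands in for the SEED option (the SEED=0 behaviour is not reproduced).

```python
import itertools
import math
import random


def permutations_to_use(n_units, ntimes=999, seed=None):
    """Yield orderings of range(n_units) for a permutation test.

    Assumption: if ntimes exceeds n_units!, every ordering is yielded
    exactly once (an "exact" test); otherwise ntimes random shuffles
    are yielded.
    """
    total = math.factorial(n_units)
    if ntimes > total:
        # Exact test: enumerate every permutation once.
        yield from itertools.permutations(range(n_units))
    else:
        rng = random.Random(seed)
        order = list(range(n_units))
        for _ in range(ntimes):
            rng.shuffle(order)   # in-place random reordering
            yield tuple(order)   # copy, so later shuffles do not alter it


# With the default of 999 permutations, data sets of up to 6 units
# (6! = 720) are enumerated exhaustively; 7 units (7! = 5040) are sampled.
n_exact = len(list(permutations_to_use(6)))
n_random = len(list(permutations_to_use(7, seed=1)))
```

The generator form keeps memory use small, which matters because n! grows so quickly.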
If the regression is being used to analyse a designed experiment, you may need to use the BLOCKSTRUCTURE option to specify a block model to define how to do the randomization. The EXCLUDE option can then restrict the randomization so that one or more of the factors in the block model is not randomized. See the RANDOMIZE directive for further details.
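One simple form of restricted randomization is to shuffle the responses only among units that share a block, leaving the blocks themselves in place. The sketch below illustrates that idea in Python rather than Genstat. The function name `permute_within_blocks` and the toy data are hypothetical, and the sketch does not attempt to reproduce how RANDOMIZE interprets a block formula or the EXCLUDE option.

```python
import random


def permute_within_blocks(y, blocks, rng):
    """Return a copy of y with values shuffled only inside each block."""
    permuted = list(y)
    # Group the unit positions by block label.
    positions = {}
    for index, label in enumerate(blocks):
        positions.setdefault(label, []).append(index)
    # Shuffle the responses among the positions belonging to each block.
    for indices in positions.values():
        values = [y[i] for i in indices]
        rng.shuffle(values)
        for i, value in zip(indices, values):
            permuted[i] = value
    return permuted


rng = random.Random(1)
y = [10, 11, 12, 20, 21, 22]            # hypothetical responses
blocks = ["a", "a", "a", "b", "b", "b"]  # hypothetical block factor
shuffled = permute_within_blocks(y, blocks, rng)
```

Each permuted data set keeps every response inside its original block, so differences between blocks cannot leak into the permutation distribution.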
The probabilities are determined from the distribution of the statistics of interest, over the permuted datasets. In an ordinary regression, the statistics are the variance ratios from the summary-of-analysis or accumulated-analysis-of-variance tables. In generalized linear models they will be deviances when the dispersion is fixed, or deviance ratios when it is estimated (as defined by the DISPERSION option of the MODEL directive).
Output is controlled by the PRINT option, which has the following settings:
||to print the probability for the whole regression model;|
||to print the summary-of-analysis table with the usual probability for the regression model replaced by the probability from the permutation test;|
||to print the accumulated analysis of variance or deviance table with the usual probabilities replaced by those from the permutation test;|
||to accompany the summary or accumulated tables by a table giving estimated critical values for each of the statistics.|
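A permutation probability for the whole regression model, of the kind described above, can be sketched for a simple linear regression as follows. This is an illustration in Python rather than Genstat. The function names are hypothetical, the data are those of the example at the end of this page, and two points are assumptions rather than documented behaviour: the y-values are permuted relative to the x-values, and the probability is taken as the proportion of data sets (including the original order) whose variance ratio is at least as large as the observed one. Python's random-number stream differs from Genstat's, so the seed will not reproduce Genstat's output.

```python
import random


def variance_ratio(x, y):
    """Variance ratio (F statistic) for the simple linear regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_regression = sxy ** 2 / sxx      # 1 degree of freedom
    ss_residual = syy - ss_regression   # n - 2 degrees of freedom
    return ss_regression / (ss_residual / (n - 2))


def permutation_probability(x, y, ntimes=999, seed=None):
    """Proportion of data sets with a variance ratio >= the observed one."""
    rng = random.Random(seed)
    observed = variance_ratio(x, y)
    shuffled = list(y)
    count = 1  # the "null" permutation (original order) counts once
    for _ in range(ntimes):
        rng.shuffle(shuffled)
        if variance_ratio(x, shuffled) >= observed:
            count += 1
    return count / (ntimes + 1)


cropsize = [8, 6, 11, 22, 14, 17, 18, 24, 19, 23, 26, 40]
wormy = [59, 58, 56, 53, 50, 45, 43, 42, 39, 38, 30, 27]
observed = variance_ratio(cropsize, wormy)
probability = permutation_probability(cropsize, wormy, ntimes=999, seed=466435)
```

With 999 random permutations plus the original order, the smallest probability this sketch can return is 1/1000.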
RPERMTEST uses RANDOMIZE to perform the permutations, taking account of any block structure of the data. The model is fitted for each data set using either FIT or FITINDIVIDUALLY. (FITINDIVIDUALLY is needed if the accumulated table is required for a generalized linear model.) The SUMMARY options of RKEEP are used to save the information from each analysis, and the QUANTILES function is used to calculate the critical values.
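The idea of estimating critical values from quantiles of the permuted statistics can be sketched as below. This is an illustration in Python rather than Genstat. The function name `critical_values`, the choice of the 10%, 5% and 1% levels, and the interpolation rule (Python's "inclusive" method) are all assumptions; Genstat's QUANTILES function may interpolate differently.

```python
import statistics


def critical_values(permuted_stats, percents=(10, 5, 1)):
    """Estimate upper critical values from permuted test statistics.

    percents gives the significance levels as whole percentages.
    """
    # 99 percentile cut points: index k - 1 holds the k-th percentile.
    percentiles = statistics.quantiles(permuted_stats, n=100, method="inclusive")
    # The upper p% critical value is the (100 - p)-th percentile.
    return {p: percentiles[100 - p - 1] for p in percents}


# Toy input: the values 0, 1, ..., 100 stand in for permuted statistics.
toy = critical_values(list(range(101)))
# toy == {10: 90.0, 5: 95.0, 1: 99.0}
```

An observed statistic larger than the estimated 5% critical value would correspond to a permutation probability below about 0.05.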
RPERMTEST takes account of any restrictions on any of the y-variates or x-variates or factors in the model.
Commands for: Regression analysis.
CAPTION 'RPERMTEST examples',!t(\
  '1) Modelling the relationship between counts of apples',\
  'from 12 trees (recorded as 100s of fruit) and percentage',\
  'damage by codling moth. Data from Snedecor and Cochran (1980)',\
  'Statistical Methods (7th edition), page 162.'); STYLE=meta,plain
VARIATE [VALUES= 8, 6,11,22,14,17,18,24,19,23,26,40] Cropsize
& [VALUES=59,58,56,53,50,45,43,42,39,38,30,27] Wormy
MODEL Wormy
FIT Cropsize
RPERMTEST [SEED=466435] Cropsize