Does random permutation tests for regression or generalized linear model analyses (R.W. Payne).

### Options

| `PRINT` = string tokens | Controls printed output (`probability`, `accumulated`, `summary`, `critical`); default `prob` |
|---|---|
| `CONSTANT` = string token | How to treat the constant (`estimate`, `omit`); default `esti` |
| `FACTORIAL` = scalar | Limit on the number of variates and/or factors in the terms to be fitted; default 3 |
| `NTIMES` = scalar | Number of permutations to make; default 999 |
| `BLOCKSTRUCTURE` = formula | Model formula defining any blocking to consider during the randomization; default none |
| `EXCLUDE` = factors | Factors in the block formula whose levels are not to be randomized |
| `SEED` = scalar | Seed for the random number generator used to make the permutations; default 0 continues from the previous generation or (if none) initializes the seed automatically |
| `SUMMARY` = pointer | Saves the summary analysis-of-variance (or deviance) table with permutation probabilities and critical values |
| `ACCUMULATED` = pointer | Saves the accumulated analysis-of-variance (or deviance) table with permutation probabilities and critical values |
| `BINMETHOD` = string token | How to permute binomial data (`individuals`, `units`); default `indi` |

### Parameter

| `TERMS` = formula | List of explanatory variates and factors, or model formula, defining the model to fit |
|---|---|

### Description

In regression analyses, random permutation tests provide an alternative to the F probabilities printed for variance ratios in summary or accumulated analysis-of-variance tables, for use when the assumptions of the analysis are not satisfied. These assumptions can be assessed by studying the residual plots produced by `RCHECK`. In particular, the use of the F distribution to calculate the probabilities is based on the assumption that the residuals from each stratum have Normal distributions with equal variances, and so the histogram of residuals produced by `RCHECK` should look reasonably close to the Normal, bell-shaped curve. Experience shows that the analysis is robust to small departures from Normality, but `RPERMTEST` can be useful if the histogram looks very non-Normal. You can also use `RPERMTEST` to generate probabilities for deviances or deviance ratios in generalized linear models, instead of using the customary chi-square or F distributions (which are justified by asymptotic theory).

Before using `RPERMTEST`, you need to give a `MODEL` statement to define the y-variate and so on, as usual for a regression or generalized linear model. The terms to fit in the regression model are specified by the `TERMS` parameter of `RPERMTEST`. As in the `FIT` directive, this can supply a list of variates for a simple or multiple linear regression, or a model formula with variates and/or factors for more complicated models. As in `FIT`, the `CONSTANT` option indicates whether or not to fit the constant, and the `FACTORIAL` option sets a limit on the number of variates and/or factors in each of the terms generated from a `TERMS` formula.

The `NTIMES` option defines how many random permutations to perform; by default there are 999 (as well as the “null” permutation where the data keep their original order). The `SEED` option allows you to specify the seed for the random-number generator that is used to construct them. The default, `SEED=0`, continues the sequence of random numbers from a previous generation or, if this is the first use of the generator in this run of Genstat, initializes the seed automatically. If `NTIMES` exceeds the maximum possible number of permutations for the data, an “exact” test is performed in which every permutation is used once. This is feasible only for small datasets. There are *n*! (*n* factorial) permutations of *n* units: 3!=6, 4!=24, 5!=120, 6!=720, 7!=5040, 8!=40320, and so on.
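The switch between a random and an “exact” test can be illustrated outside Genstat. The following Python sketch is only an illustration of the rule described above, not the procedure's own code; the function name `permutation_plan` and the data values are hypothetical.

```python
import itertools
import math
import random

def permutation_plan(values, ntimes=999, seed=1):
    """Return every ordering when ntimes exceeds n!, otherwise ntimes random orderings."""
    n = len(values)
    if ntimes > math.factorial(n):
        # Exact test: each of the n! orderings (including the original) is used once.
        return "exact", [list(p) for p in itertools.permutations(values)]
    rng = random.Random(seed)
    # Random test: ntimes independent shuffles of the units.
    return "random", [rng.sample(values, n) for _ in range(ntimes)]

# Hypothetical data: 4 units give only 4! = 24 orderings, fewer than 999.
kind_small, orderings_small = permutation_plan([3.1, 4.7, 5.2, 6.0])
# Hypothetical data: 8 units give 8! = 40320 orderings, so 999 random ones are drawn.
kind_large, orderings_large = permutation_plan([1, 2, 3, 4, 5, 6, 7, 8])
print(kind_small, len(orderings_small))
print(kind_large, len(orderings_large))
```

Enumerating all orderings quickly becomes impractical as *n* grows, which is why the exact test is feasible only for small datasets.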

If the regression is being used to analyse a designed experiment, you may need to use the `BLOCKSTRUCTURE` option to specify a block model to define how to do the randomization. The `EXCLUDE` option can then restrict the randomization so that one or more of the factors in the block model is not randomized. See the `RANDOMIZE` directive for further details.
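As a rough illustration of a restricted randomization, the Python sketch below shuffles responses only among units that share a block, leaving the blocks themselves in place. This is one simple case, assumed here for illustration; it does not reproduce the full behaviour of `RANDOMIZE`, and the function name and data are hypothetical.

```python
import random
from collections import defaultdict

def permute_within_blocks(y, blocks, rng):
    """Shuffle y only among units that share the same block label."""
    positions = defaultdict(list)
    for i, b in enumerate(blocks):
        positions[b].append(i)
    permuted = list(y)
    for idx in positions.values():
        shuffled = [y[i] for i in idx]
        rng.shuffle(shuffled)
        # Put the shuffled values back into the same block's positions.
        for i, v in zip(idx, shuffled):
            permuted[i] = v
    return permuted

# Hypothetical responses from 3 blocks of 3 units each.
y = [10, 12, 11, 20, 22, 21, 30, 32, 31]
blocks = [1, 1, 1, 2, 2, 2, 3, 3, 3]
y_perm = permute_within_blocks(y, blocks, random.Random(1))
print(y_perm)
```

Values never move between blocks, so block-to-block differences cannot contribute to the permutation distribution of the test statistic.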

The `BINMETHOD` option controls how the permutations are done for binomial data. The original data set contains a set of units, each recording the number of “successes” obtained from an observed number of individuals. The default, and recommended, method is to expand the data set to contain the individuals themselves, and permute these. Alternatively, you can set `BINMETHOD=units` if you prefer to permute the units as a whole instead.
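The difference between the two settings can be sketched in Python. This is an illustration under stated assumptions rather than the procedure's implementation: in particular, it assumes that after the individuals are permuted they are re-totalled within units of the original sizes. The function names and the counts are hypothetical.

```python
import random

def permute_individuals(successes, totals, rng):
    """Expand units to 0/1 individuals, shuffle them, and re-total within units."""
    individuals = []
    for s, n in zip(successes, totals):
        individuals.extend([1] * s + [0] * (n - s))
    rng.shuffle(individuals)
    permuted, start = [], 0
    for n in totals:
        # Assumption: each unit keeps its original number of individuals.
        permuted.append(sum(individuals[start:start + n]))
        start += n
    return permuted

def permute_units(successes, totals, rng):
    """Shuffle whole units, keeping each success count with its own total."""
    pairs = list(zip(successes, totals))
    rng.shuffle(pairs)
    return [s for s, _ in pairs], [n for _, n in pairs]

# Hypothetical binomial data: successes out of totals for 4 units.
successes = [3, 7, 5, 9]
totals = [10, 10, 12, 12]
rng = random.Random(1)
by_individuals = permute_individuals(successes, totals, rng)
unit_successes, unit_totals = permute_units(successes, totals, rng)
print(by_individuals)
print(unit_successes, unit_totals)
```

Permuting individuals can produce success counts that were never observed, whereas permuting units only rearranges the observed (successes, total) pairs.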

The probabilities are determined from the distribution of the statistics of interest, over the permuted datasets. In an ordinary regression, the statistics are the variance ratios from the summary-of-analysis or accumulated-analysis-of-variance tables. In generalized linear models they will be deviances when the dispersion is fixed, or deviance ratios when it is estimated (as defined by the `DISPERSION` option of the `MODEL` directive).
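To make the idea concrete, here is a minimal Python sketch of a permutation probability for the variance ratio in a simple linear regression, using the apple data from the example in this page. It is an illustration only: the formula used for the probability (counting the original ordering as one permutation) is an assumption, the function names are hypothetical, and Python's random-number generator will not reproduce Genstat's permutations for the same seed.

```python
import random

def variance_ratio(x, y):
    """Variance ratio (regression m.s. / residual m.s.) for a straight-line fit of y on x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    ss_reg = sxy ** 2 / sxx   # regression sum of squares, 1 d.f.
    ss_res = syy - ss_reg     # residual sum of squares, n - 2 d.f.
    return ss_reg / (ss_res / (n - 2))

def permutation_probability(x, y, ntimes=999, seed=466435):
    """Proportion of orderings whose variance ratio is at least the observed one."""
    rng = random.Random(seed)
    observed = variance_ratio(x, y)
    shuffled = list(y)
    count = 1  # assumption: the "null" permutation (original order) counts as one
    for _ in range(ntimes):
        rng.shuffle(shuffled)
        if variance_ratio(x, shuffled) >= observed:
            count += 1
    return observed, count / (ntimes + 1)

cropsize = [8, 6, 11, 22, 14, 17, 18, 24, 19, 23, 26, 40]
wormy = [59, 58, 56, 53, 50, 45, 43, 42, 39, 38, 30, 27]
f_obs, p_perm = permutation_probability(cropsize, wormy)
print(f"variance ratio = {f_obs:.2f}, permutation probability = {p_perm:.3f}")
```

With 999 random permutations plus the original ordering, the smallest probability this sketch can report is 1/1000.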

Output is controlled by the `PRINT` option, with settings:

| `probability` | to print the probability for the whole regression model; |
|---|---|
| `summary` | to print the summary-of-analysis table with the usual probability for the regression model replaced by the probability from the permutation test; |
| `accumulated` | to print the accumulated analysis of variance or deviance table with the usual probabilities replaced by those from the permutation test; |
| `critical` | to accompany the summary or accumulated tables by a table giving estimated critical values for each of the statistics. |

The `SUMMARY` and `ACCUMULATED` options can save the summary and accumulated tables, respectively. Each table is saved in a pointer with a variate or text for each of its columns (source, d.f. etc). The probability variate contains the probabilities from the permutation test, and there are three additional variates to save the critical values.

Options: `PRINT`, `CONSTANT`, `FACTORIAL`, `NTIMES`, `BLOCKSTRUCTURE`, `EXCLUDE`, `SEED`, `SUMMARY`, `ACCUMULATED`, `BINMETHOD`.

Parameter: `TERMS`.

### Method

`RPERMTEST` uses `RANDOMIZE` to perform the permutations, taking account of any block structure of the data. The model is fitted to each permuted data set using either `FIT` or `FITINDIVIDUALLY`. (`FITINDIVIDUALLY` is needed if the accumulated table is required for a generalized linear model.) The `ACCUMULATED` and `SUMMARY` options of `RKEEP` are used to save the information from each analysis, and the `QUANTILES` function is used to calculate the critical values.
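Estimating critical values as quantiles of the permutation distribution can be sketched in Python as follows. The three significance levels used here (5%, 1% and 0.1%) are an assumption made for illustration, as is the interpolation method; neither is stated above, and the function name is hypothetical.

```python
import statistics

def critical_values(permuted_stats, levels=(0.05, 0.01, 0.001)):
    """Upper-tail quantiles of the permutation distribution at the given levels."""
    # 999 cut points dividing the sorted statistics into 1000 equal-probability groups.
    cuts = statistics.quantiles(permuted_stats, n=1000, method="inclusive")
    # Cut point k (1-based) estimates the k/1000 quantile.
    return {level: cuts[round((1 - level) * 1000) - 1] for level in levels}

# Hypothetical permutation distribution: the integers 1 to 1000.
cv = critical_values(list(range(1, 1001)))
print(cv)
```

An observed statistic larger than the value stored for a given level would be judged significant at that level under this sketch.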

### Action with `RESTRICT`

`RPERMTEST` takes account of any restrictions on any of the y-variates or x-variates or factors in the model.

### See also

Procedures: `APERMTEST`, `CHIPERMTEST`, `FEXACT2X2`.

Commands for: Regression analysis.

### Example

    CAPTION 'RPERMTEST examples',!t(\
            '1) Modelling the relationship between counts of apples',\
            'from 12 trees (recorded as 100s of fruit) and percentage',\
            'damage by codling moth. Data from Snedecor and Cochran (1980)',\
            'Statistical Methods (7th edition), page 162.'); STYLE=meta,plain
    VARIATE [VALUES= 8, 6,11,22,14,17,18,24,19,23,26,40] Cropsize
    &       [VALUES=59,58,56,53,50,45,43,42,39,38,30,27] Wormy
    MODEL Wormy
    FIT Cropsize
    RPERMTEST [SEED=466435] Cropsize