RPERMTEST procedure

Performs random permutation tests for regression or generalized linear model analyses (R.W. Payne).

Options

`PRINT` = string tokens	Controls printed output (`probability`, `accumulated`, `summary`, `critical`); default `prob`
`CONSTANT` = string token	How to treat the constant (`estimate`, `omit`); default `esti`
`FACTORIAL` = scalar	Limit on the number of variates and/or factors in the terms to be fitted; default 3
`NTIMES` = scalar	Number of permutations to make; default 999
`BLOCKSTRUCTURE` = formula	Model formula defining any blocking to consider during the randomization; default none
`EXCLUDE` = factors	Factors in the block formula whose levels are not to be randomized
`SEED` = scalar	Seed for the random number generator used to make the permutations; default 0 continues from the previous generation or (if none) initializes the seed automatically
`SUMMARY` = pointer	Saves the summary analysis-of-variance (or deviance) table with permutation probabilities and critical values
`ACCUMULATED` = pointer	Saves the accumulated analysis-of-variance (or deviance) table with permutation probabilities and critical values
`BINMETHOD` = string token	How to permute binomial data (`individuals`, `units`); default `indi`

Parameter

`TERMS` = formula	List of explanatory variates and factors, or model formula, defining the model to fit

Description

In regression analyses, random permutation tests provide an alternative to the F probabilities printed for the variance ratios in summary or accumulated analysis-of-variance tables, for use when the assumptions of the analysis are not satisfied. These assumptions can be assessed by studying the residual plots produced by RCHECK. In particular, calculating the probabilities from the F distribution assumes that the residuals in each stratum have Normal distributions with equal variances, so the histogram of residuals produced by RCHECK should look reasonably close to the Normal, bell-shaped curve. Experience shows that the analysis is robust to small departures from Normality, so RPERMTEST is most useful when the histogram looks very non-Normal. You can also use RPERMTEST to generate probabilities for deviances or deviance ratios in generalized linear models, instead of using the customary chi-square or F distributions (which are justified by asymptotic theory).

Before using RPERMTEST, you need to give a MODEL statement to define the y-variate and so on, as usual for a regression or generalized linear model. The terms to fit in the regression model are specified by the TERMS parameter of RPERMTEST. As in the FIT directive, this can be a list of variates for a simple or multiple linear regression, or a model formula with variates and/or factors for more complicated models. The CONSTANT option indicates whether or not to fit the constant, and the FACTORIAL option sets a limit on the number of variates and/or factors in each of the terms generated from a TERMS formula.

The NTIMES option defines how many random permutations to perform; by default there are 999 (in addition to the “null” permutation, in which the data keep their original order). The SEED option specifies the seed for the random-number generator used to construct the permutations. The default, SEED=0, continues the sequence of random numbers from a previous generation or, if this is the first use of the generator in this run of Genstat, initializes the seed automatically. If NTIMES exceeds the maximum possible number of permutations for the data, an “exact” test is performed instead, in which every permutation is used once. This is feasible only for small data sets: there are n! (n factorial) permutations of n units, so 3!=6, 4!=24, 5!=120, 6!=720, 7!=5040, 8!=40320, and so on.
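The Python sketch below illustrates this logic for the simplest case, a simple linear regression with a constant. It is not Genstat's implementation: the function names are hypothetical, Python's generator will not reproduce Genstat's random numbers for a given SEED, and counting the null permutation in the probability, giving (hits + 1)/(NTIMES + 1), is the usual Monte Carlo convention, assumed here rather than taken from the RPERMTEST source.

```python
import itertools
import math
import random

# Relative tolerance so that genuine ties are not lost to floating-point rounding.
TIE_TOLERANCE = 1e-9


def variance_ratio(x, y):
    """Variance ratio F for a simple linear regression of y on x (with a constant)."""
    n = len(y)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    ss_reg = sxy ** 2 / sxx   # regression sum of squares, 1 d.f.
    ss_res = syy - ss_reg     # residual sum of squares, n - 2 d.f.
    if ss_res <= 0:
        return math.inf       # perfect fit: F is unbounded
    return ss_reg / (ss_res / (n - 2))


def permutation_probability(x, y, ntimes=999, seed=None):
    """Permutation probability for the regression F of y on x (hypothetical helper).

    If ntimes exceeds n!, every permutation is used once (exact test).
    Otherwise ntimes random permutations of y are made, and the original
    ("null") order is counted as one more data set (assumed convention).
    """
    observed = variance_ratio(x, y)
    threshold = observed * (1 - TIE_TOLERANCE)
    n_perms = math.factorial(len(y))
    if ntimes > n_perms:
        # Exact test: itertools.permutations also yields the original order.
        hits = sum(variance_ratio(x, p) >= threshold
                   for p in itertools.permutations(y))
        return hits / n_perms
    rng = random.Random(seed)
    hits = 1                  # the null permutation always matches itself
    perm = list(y)
    for _ in range(ntimes):
        rng.shuffle(perm)     # each shuffle gives a uniformly random order
        if variance_ratio(x, perm) >= threshold:
            hits += 1
    return hits / (ntimes + 1)


if __name__ == "__main__":
    # Data from the Snedecor and Cochran example at the end of this page.
    cropsize = [8, 6, 11, 22, 14, 17, 18, 24, 19, 23, 26, 40]
    wormy = [59, 58, 56, 53, 50, 45, 43, 42, 39, 38, 30, 27]
    print(variance_ratio(cropsize, wormy))  # about 34.62
    print(permutation_probability(cropsize, wormy, ntimes=999, seed=466435))
```

With 12 units there are 12! = 479,001,600 permutations, so the example uses random permutations; with 5 units (120 permutations) the default NTIMES=999 would trigger the exact branch.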

If the regression is being used to analyse a designed experiment, you may need to use the BLOCKSTRUCTURE option to specify a block model defining how the randomization is done. The EXCLUDE option can then restrict the randomization so that one or more of the factors in the block model are not randomized. See the RANDOMIZE directive for further details.
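RPERMTEST delegates this to RANDOMIZE. The Python sketch below only illustrates the idea for one common case, with a hypothetical function name and data: for a block structure such as Blocks/Plots with EXCLUDE=Blocks, we assume the response values are shuffled among the plots of each block, while whole blocks are never exchanged.

```python
import random
from collections import defaultdict


def permute_within_blocks(y, blocks, rng):
    """Shuffle y only among units that share the same block label.

    Sketch (assumption) of a restricted randomization like Blocks/Plots
    with Blocks excluded: plots are permuted within their own block,
    but whole blocks stay where they are.
    """
    units_in_block = defaultdict(list)
    for unit, block in enumerate(blocks):
        units_in_block[block].append(unit)
    permuted = list(y)
    for units in units_in_block.values():
        values = [y[u] for u in units]
        rng.shuffle(values)
        for u, v in zip(units, values):
            permuted[u] = v
    return permuted


if __name__ == "__main__":
    rng = random.Random(2022)
    y = [1.2, 1.5, 1.1, 2.4, 2.8, 2.2]          # hypothetical responses
    blocks = ["I", "I", "I", "II", "II", "II"]  # hypothetical block labels
    print(permute_within_blocks(y, blocks, rng))
```

Restricting the shuffle this way keeps any block-to-block differences out of the permutation distribution, which is why the block model matters for a designed experiment.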

The BINMETHOD option controls how the permutations are done for binomial data. The original data set contains a set of units, each recording a number of “successes” out of an observed number of individuals. The default, and recommended, method is to expand the data set to contain the individuals themselves, and permute these. Alternatively, you can set BINMETHOD=units to permute the units as a whole instead.
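The two methods can be sketched in Python as follows. The function names are hypothetical, and two details are assumptions rather than documented behaviour: with `individuals`, the permuted 0/1 records are regrouped into units of their original sizes; with `units`, each unit's number of individuals moves together with its count of successes.

```python
import random


def permute_individuals(successes, totals, rng):
    """BINMETHOD=individuals sketch: shuffle 0/1 records across all units.

    Unit i is expanded into totals[i] records, successes[i] of which are 1.
    After shuffling, the records are regrouped into units of the original
    sizes and the successes in each unit are recounted.
    """
    records = []
    for s, n in zip(successes, totals):
        records.extend([1] * s + [0] * (n - s))
    rng.shuffle(records)
    permuted, start = [], 0
    for n in totals:
        permuted.append(sum(records[start:start + n]))
        start += n
    return permuted


def permute_units(successes, totals, rng):
    """BINMETHOD=units sketch: each (successes, total) pair moves as a whole."""
    pairs = list(zip(successes, totals))
    rng.shuffle(pairs)
    return [s for s, _ in pairs], [n for _, n in pairs]


if __name__ == "__main__":
    succ, tot = [3, 0, 5, 2], [5, 4, 6, 2]      # hypothetical binomial data
    print(permute_individuals(succ, tot, random.Random(7)))
    print(permute_units(succ, tot, random.Random(7)))
```

Permuting individuals can create new combinations of successes per unit, giving a much finer permutation distribution than reshuffling a few whole units.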

The probabilities are determined from the distribution of the statistics of interest over the permuted data sets. In an ordinary regression, the statistics are the variance ratios from the summary or accumulated analysis-of-variance tables. In generalized linear models they are deviances when the dispersion is fixed, or deviance ratios when it is estimated (as defined by the DISPERSION option of the MODEL directive).
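For reference, these statistics have their standard definitions, written below in our own notation (SS for sums of squares, D for deviances, df for degrees of freedom, φ for a fixed dispersion; ΔD_term is the deviance attributed to a term, or to the whole regression in the summary table). Because a fixed φ is the same for every permuted data set, dividing a deviance by it does not change the permutation probability.

```latex
\begin{aligned}
\text{ordinary regression:}\quad
  F &= \frac{\mathrm{SS}_{\mathrm{term}}/\mathrm{df}_{\mathrm{term}}}
            {\mathrm{SS}_{\mathrm{res}}/\mathrm{df}_{\mathrm{res}}} \\[4pt]
\text{GLM, dispersion fixed at } \phi:\quad
  S &= \Delta D_{\mathrm{term}} \\[4pt]
\text{GLM, dispersion estimated:}\quad
  R &= \frac{\Delta D_{\mathrm{term}}/\mathrm{df}_{\mathrm{term}}}
            {D_{\mathrm{res}}/\mathrm{df}_{\mathrm{res}}}
\end{aligned}
```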

Output is controlled by the PRINT option, with settings:

`probability`	to print the probability for the whole regression model;
`summary`	to print the summary-of-analysis table with the usual probability for the regression model replaced by the probability from the permutation test;
`accumulated`	to print the accumulated analysis of variance or deviance table with the usual probabilities replaced by those from the permutation test;
`critical`	to accompany the summary or accumulated tables by a table giving estimated critical values for each of the statistics.

The SUMMARY and ACCUMULATED options can save the summary and accumulated tables, respectively. Each is saved in a pointer with a variate or text for each of the table's columns (source, d.f., etc.). The probability variate contains the probabilities from the permutation test, and three additional variates save the critical values.
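The Method section below says the critical values come from the QUANTILES function. The Python sketch here shows the idea with a hand-written, linearly interpolated quantile, which may differ in detail from Genstat's QUANTILES. The three levels 5%, 1% and 0.1% are an assumption chosen to match the three saved critical-value variates; the source does not name them.

```python
import math


def quantile(values, prob):
    """Linearly interpolated quantile of a list of numbers (0 <= prob <= 1)."""
    ordered = sorted(values)
    h = (len(ordered) - 1) * prob
    lo = math.floor(h)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (h - lo) * (ordered[hi] - ordered[lo])


def critical_values(permuted_stats, levels=(0.05, 0.01, 0.001)):
    """Estimated critical value for each significance level (levels assumed).

    The critical value for level a is the upper (1 - a) quantile of the
    statistic over the permuted data sets; an observed statistic at or
    above it is significant at level a.
    """
    return {a: quantile(permuted_stats, 1 - a) for a in levels}


if __name__ == "__main__":
    stats = [0.1 * k for k in range(1000)]  # stand-in permutation distribution
    print(critical_values(stats))
```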

Options: PRINT, CONSTANT, FACTORIAL, NTIMES, BLOCKSTRUCTURE, EXCLUDE, SEED, SUMMARY, ACCUMULATED, BINMETHOD.
Parameter: TERMS.

Method

RPERMTEST uses RANDOMIZE to perform the permutations, taking account of any block structure of the data. The model is fitted to each data set using either FIT or FITINDIVIDUALLY. (FITINDIVIDUALLY is needed if the accumulated table is required for a generalized linear model.) The ACCUMULATED and SUMMARY options of RKEEP are used to save the information from each analysis, and the QUANTILES function is used to calculate the critical values.

Action with `RESTRICT`

RPERMTEST takes account of any restrictions on any of the y-variates or x-variates or factors in the model.

Example

```
CAPTION   'RPERMTEST examples',!t(\
          '1) Modelling the relationship between counts of apples',\
          'from 12 trees (recorded as 100s of fruit) and percentage',\
          'damage by codling moth. Data from Snedecor and Cochran (1980)',\
          'Statistical Methods (7th edition), page 162.'); STYLE=meta,plain
VARIATE   [VALUES= 8, 6,11,22,14,17,18,24,19,23,26,40] Cropsize
&         [VALUES=59,58,56,53,50,45,43,42,39,38,30,27] Wormy
MODEL     Wormy
FIT       Cropsize
RPERMTEST [SEED=466435] Cropsize
```

Updated on January 12, 2022
