CHIPERMTEST procedure

Performs a random permutation test for a two-dimensional contingency table (L.H. Schmitt, M.C. Hannah & S.J. Welham).

Options

`PRINT` = string tokens	Output required (`summary`, `observed`, `expected`); default `summ`
`PLOT` = string token	What to plot (`histogram`); default `hist`
`METHOD` = string token	Method for calculating chi-square (`pearson`, `maximumlikelihood`); default `pear`
`NTIMES` = scalar	Number of permutations to make; default 999
`SEED` = scalar	Seed for the random number generator used to make the permutations; default 0 continues from the previous generation or (if none) initializes the seed automatically

Parameters

`DATA` = tables	Table containing observed data
`CHISQUARE` = scalars	Saves the observed chi-square value
`CHIPERMUTED` = variates	Saves the chi-square values from the permuted data sets
`PROBABILITY` = scalars	Saves the probability value from the test

Description

The CHIPERMTEST procedure uses a permutation test to calculate the significance probability for a chi-square test of the independence of rows and columns in a two-dimensional contingency table. This provides a nonparametric alternative to the more usual chi-square test of independence (see the CHISQUARE procedure). The usual test relies on the fact that the distribution of its so-called “chi-square” test statistic approaches a chi-square distribution as the number of observations increases. (Technically, we would say that the distribution is asymptotically chi-square.) However, this approximation is unreliable with small numbers of observations, especially when the expected number in any cell of the table is less than five.

The permutation test simulates the distribution of table values that may occur by chance in tables that have the same row totals and the same column totals as the original table. We can then assess the significance of the chi-square statistic calculated from the observed table by seeing where it lies in the distribution of the statistics obtained from the permuted data.

The NTIMES option specifies how many permutations are done (default 999). The SEED option supplies the seed that is used in the RANDOMIZE directive to generate the permutations. The default of zero continues the existing sequence of random numbers if RANDOMIZE has already been used in the current Genstat job. If RANDOMIZE has not yet been used, Genstat picks a seed at random.

The DATA parameter supplies the observed data values, in a table with two classifying factors. The CHISQUARE parameter can save the chi-square statistic calculated from the DATA table (in a scalar). The CHIPERMUTED parameter can save the chi-square statistics calculated from the permuted data sets (in a variate), and the PROBABILITY parameter can save the significance probability from the permutation test (in a scalar).
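
As a minimal sketch of how the results might be saved, the job below reuses the table from the Example section. The identifiers Chi, ChiPerm and Prob are illustrative names chosen here, not names required by the procedure, and the SEED value is simply the one used in the Example.

```genstat
"Table of counts, as in the Example section"
FACTOR      [LEVELS=14; LABELS=!t(A,B,C,D,E,F,G,H,I,J,K,L,M,N)] River
FACTOR      [LEVELS=2] Gene2
TABLE       [CLASSIFICATION=River,Gene2; VALUES=\
            13,16,8,10,8,5,11,6,9,11,12,10,11,8,\
            17,4,10,1,12,7,6,4,12,5,16,5,7,0] B2
"Save the observed statistic (scalar), the permuted statistics (variate)"
"and the probability (scalar); Chi, ChiPerm and Prob are illustrative names"
CHIPERMTEST [PRINT=summary; SEED=301453] DATA=B2; CHISQUARE=Chi;\
            CHIPERMUTED=ChiPerm; PROBABILITY=Prob
PRINT       Chi,Prob
```

Supplying a non-zero SEED, as here, should make the permutations (and hence the saved values) reproducible from one run to the next.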

The PRINT option controls the output, with the following settings:

`summary`	prints a summary, containing the chi-square statistic, the minimum and maximum statistics calculated from the permuted data sets, and the probability (default);
`observed`	prints the `DATA` table; and
`expected`	prints the expected values for tables with the same overall distribution of numbers over rows and over columns, but no interaction between the row and column factors (i.e. in a table where the rows and columns are independent).

By default, CHIPERMTEST plots a histogram showing the distribution of statistics obtained from the permuted data sets, with the chi-square statistic from the observed data superimposed as a vertical line. You can suppress this by setting option PLOT=*.

The METHOD option controls how the chi-square statistic is calculated. The default is to use the usual Pearson approximation (see the Method section), but you can set METHOD=maximumlikelihood to calculate it by maximum likelihood instead (using the Genstat facilities for generalized linear models).
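
For instance, a run that uses the maximum-likelihood statistic and suppresses the histogram might look like the sketch below. It again reuses the table from the Example section; the NTIMES and SEED values are arbitrary choices made for illustration.

```genstat
"Table of counts, as in the Example section"
FACTOR      [LEVELS=14; LABELS=!t(A,B,C,D,E,F,G,H,I,J,K,L,M,N)] River
FACTOR      [LEVELS=2] Gene2
TABLE       [CLASSIFICATION=River,Gene2; VALUES=\
            13,16,8,10,8,5,11,6,9,11,12,10,11,8,\
            17,4,10,1,12,7,6,4,12,5,16,5,7,0] B2
"Maximum-likelihood statistic, no histogram, 4999 permutations"
CHIPERMTEST [PRINT=summary; PLOT=*; METHOD=maximumlikelihood;\
            NTIMES=4999; SEED=301453] B2
```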

Options: PRINT, PLOT, METHOD, NTIMES, SEED.

Parameters: DATA, CHISQUARE, CHIPERMUTED, PROBABILITY.

Method

The Pearson statistic is calculated as

chi-square = sum( (o–e) × (o–e) / e ),

where o = observed, and e = expected. The alternative, maximum-likelihood method takes the deviance from fitting a generalized linear model with a log link and a Poisson distribution.
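
Written out for a table with rows i and columns j, the two statistics can be expressed as below. The notation is introduced here for illustration and does not appear in the procedure itself: o_ij is the observed count, n_i+ and n_+j are the row and column totals, and n is the grand total. The second expression is the standard form of the deviance for a Poisson log-linear model containing only the row and column main effects, which is assumed to be the model fitted; cells with o_ij = 0 contribute zero to that sum.

```latex
e_{ij} = \frac{n_{i+}\, n_{+j}}{n}, \qquad
X^2 = \sum_{i}\sum_{j} \frac{(o_{ij} - e_{ij})^2}{e_{ij}}, \qquad
G^2 = 2 \sum_{i}\sum_{j} o_{ij} \ln\!\left(\frac{o_{ij}}{e_{ij}}\right)
```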

The permutations are constructed using the method of Roff & Bentzen (1989).

Reference

Roff, D.A. & Bentzen, P. (1989). The statistical analysis of mitochondrial DNA polymorphisms: χ² and the problem of small samples. Mol. Biol. Evol., 6, 539-545.

Example

CAPTION     'CHIPERMTEST example','Data from Roff & Bentzen (1989)';\
            STYLE=meta,plain
FACTOR      [LEVELS=14; LABELS=!t(A,B,C,D,E,F,G,H,I,J,K,L,M,N)] River
FACTOR      [LEVELS=2] Gene2
TABLE       [CLASSIFICATION=River,Gene2; VALUES=\
            13,16,8,10,8,5,11,6,9,11,12,10,11,8,\
            17,4,10,1,12,7,6,4,12,5,16,5,7,0] B2
CHISQUARE   B2
CHIPERMTEST [PRINT=summary,observed,expected; SEED=301453] B2

Updated on March 8, 2019
