Performs a random permutation test for a two-dimensional contingency table (L.H. Schmitt, M.C. Hannah & S.J. Welham).
Options
| PRINT = string tokens | Output required (summary, observed, expected); default summ |
|---|---|
| PLOT = string token | What to plot (histogram); default hist |
| METHOD = string token | Method for calculating chi-square (pearson, maximumlikelihood); default pear |
| NTIMES = scalar | Number of permutations to make; default 999 |
| SEED = scalar | Seed for the random-number generator used to make the permutations; default 0 continues the existing random-number sequence or, if there is none, initializes the seed automatically |
Parameters
| DATA = tables | Table containing observed data |
|---|---|
| CHISQUARE = scalars | Saves the observed chi-square value |
| CHIPERMUTED = variates | Saves the chi-square values from the permuted data sets |
| PROBABILITY = scalars | Saves the probability value from the test |
Description
The CHIPERMTEST procedure uses a permutation test to calculate the significance probability for a chi-square test of the independence of rows and columns in a two-dimensional contingency table. This provides a nonparametric alternative to the more usual chi-square test of independence (see the CHISQUARE procedure). The usual test relies on the fact that the distribution of its so-called "chi-square" test statistic approaches a chi-square distribution as the number of observations becomes infinite. (Technically, the distribution is asymptotically chi-square.) However, the approximation is unreliable with small samples, especially when the expected number in any cell of the table is less than five.
The permutation test simulates the distribution of tables that could arise by chance when the observations are reallocated at random to the cells, keeping the same row totals and the same column totals as the original table. The significance of the chi-square statistic calculated from the observed table is then assessed by seeing where it lies in the distribution of statistics obtained from the permuted tables.
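The idea can be sketched in Python (not Genstat) for a table held as a list of rows of counts. This is an illustrative sketch, not the CHIPERMTEST implementation. The function names are hypothetical. The permutations come from Python's `random` module rather than the RANDOMIZE directive. The probability uses the common (k + 1)/(NTIMES + 1) convention, in which the observed table counts as one of the arrangements; this is an assumption about how Genstat forms the probability. Each observation is expanded into a (row, column) pair, and the column labels are shuffled, so every row total and column total stays fixed.

```python
import random
from collections import Counter


def expected_counts(table):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]


def pearson(table):
    """Pearson chi-square statistic: sum of (o - e)^2 / e over cells with e > 0."""
    e_tab = expected_counts(table)
    return sum((o - e) ** 2 / e
               for o_row, e_row in zip(table, e_tab)
               for o, e in zip(o_row, e_row) if e > 0)


def permute_table(table, rng):
    """Hypothetical helper: shuffle the column labels of individual observations.

    The list of row labels is left unchanged and the column labels are only
    reordered, so both sets of margins are preserved."""
    rows, cols = [], []
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            rows.extend([i] * count)
            cols.extend([j] * count)
    rng.shuffle(cols)
    cells = Counter(zip(rows, cols))
    return [[cells[(i, j)] for j in range(len(table[0]))]
            for i in range(len(table))]


def chi_perm_test(table, ntimes=999, seed=None):
    """Hypothetical analogue of CHIPERMTEST: returns (observed, permuted, probability)."""
    rng = random.Random(seed)
    observed = pearson(table)
    permuted = [pearson(permute_table(table, rng)) for _ in range(ntimes)]
    # Small tolerance so that ties with the observed value are counted despite rounding.
    exceed = sum(1 for x in permuted if x >= observed - 1e-9)
    # Assumed convention: the observed table is counted as one of the arrangements.
    prob = (exceed + 1) / (ntimes + 1)
    return observed, permuted, prob
```

Under this convention the smallest attainable probability with the default of 999 permutations is 1/1000.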
The NTIMES option specifies how many permutations are made (default 999). The SEED option supplies the seed that the RANDOMIZE directive uses to generate the permutations. The default of zero continues the existing sequence of random numbers if RANDOMIZE has already been used in the current Genstat job. If RANDOMIZE has not yet been used, Genstat picks a seed at random.
The DATA parameter supplies the observed data values, in a table with two classifying factors. The CHISQUARE parameter can save the chi-square statistic calculated from the DATA table (in a scalar). The CHIPERMUTED parameter can save the chi-square statistics calculated from the permuted data sets (in a variate), and the PROBABILITY parameter can save the significance probability from the permutation test (in a scalar).
The PRINT option controls the output, with the following settings:

| summary | prints a summary, containing the chi-square statistic, the minimum and maximum statistics calculated from the permuted data sets, and the probability (default); |
|---|---|
| observed | prints the DATA table; and |
| expected | prints the expected values for tables with the same overall distribution of numbers over rows and over columns, but no interaction between the row and column factors (i.e. in a table where the rows and columns are independent). |
By default, CHIPERMTEST plots a histogram showing the distribution of statistics obtained from the permuted data sets, with the chi-square statistic from the observed data superimposed as a vertical line. You can suppress this by setting option PLOT=*.
The METHOD option controls how the chi-square statistic is calculated. The default is to use the usual Pearson approximation (see the Method section), but you can set METHOD=maximumlikelihood to calculate it by maximum likelihood instead (using the Genstat facilities for generalized linear models).
Options: PRINT, PLOT, METHOD, NTIMES, SEED.
Parameters: DATA, CHISQUARE, CHIPERMUTED, PROBABILITY.
Method
The Pearson statistic is calculated as
chi-square = sum( (o–e) × (o–e) / e ),
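where o is the observed count and e the expected count in each cell, and the sum is taken over all cells of the table. The alternative maximum-likelihood method uses the deviance from fitting a generalized linear model with a Poisson distribution and a log link, containing the main effects of the row and column factors but no interaction (i.e. the model of independence).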
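The two METHOD settings can be compared with the following Python sketch (not Genstat code; the function names are hypothetical). The maximum-likelihood statistic is written in its closed form G² = 2 Σ o ln(o/e). This equals the Poisson deviance of the row-plus-column main-effects log-linear model, because the fitted values of that model are the expected counts e. The sketch computes the formula directly instead of fitting a generalized linear model as Genstat does.

```python
import math


def expected_counts(table):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]


def pearson_chisq(table):
    """Analogue of METHOD=pearson: sum of (o - e)^2 / e over cells with e > 0."""
    e_tab = expected_counts(table)
    return sum((o - e) ** 2 / e
               for o_row, e_row in zip(table, e_tab)
               for o, e in zip(o_row, e_row) if e > 0)


def deviance_chisq(table):
    """Analogue of METHOD=maximumlikelihood: G^2 = 2 * sum of o * ln(o / e).

    Cells with o == 0 contribute nothing, because o * ln(o) tends to 0 as o tends to 0."""
    e_tab = expected_counts(table)
    return 2 * sum(o * math.log(o / e)
                   for o_row, e_row in zip(table, e_tab)
                   for o, e in zip(o_row, e_row) if o > 0)
```

For a table with strong association, such as [[10, 0], [0, 10]], the Pearson statistic is 20 and G² is 40 ln 2 (about 27.7). The two statistics agree closely only when the observed counts are near their expected values.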
The permutations are constructed using the method of Roff & Bentzen (1989).
Reference
Roff, D.A. & Bentzen, P. (1989). The statistical analysis of mitochondrial DNA polymorphisms: χ² and the problem of small samples. Mol. Biol. Evol., 6, 539-545.
See also
Procedure: CHISQUARE.
Commands for: Basic and nonparametric statistics, Regression analysis.
Example
```
CAPTION   'CHIPERMTEST example','Data from Roff & Bentzen (1989)';\
          STYLE=meta,plain
FACTOR    [LEVELS=14; LABELS=!t(A,B,C,D,E,F,G,H,I,J,K,L,M,N)] River
FACTOR    [LEVELS=2] Gene2
TABLE     [CLASSIFICATION=River,Gene2; VALUES=\
          13,16,8,10,8,5,11,6,9,11,12,10,11,8,\
          17,4,10,1,12,7,6,4,12,5,16,5,7,0] B2
CHISQUARE B2
CHIPERMTEST [PRINT=summary,observed,expected; SEED=301453] B2
```