KRUSKAL procedure

Carries out a Kruskal-Wallis one-way analysis of variance (S.J. Welham, N.M. Maclaren & H.R. Simpson).

Options

`PRINT` = string tokens	Output required (`test`, `ranks`): `test` produces the relevant test statistics, `ranks` produces a vector of ranks for each sample relative to the whole data set; default `test`
`GROUPS` = factor	Defines the sample membership if only one variate is specified by `DATA`
`STATISTIC` = scalar	Scalar to save the Kruskal-Wallis test statistic
`MEANRANKS` = variate	Variate to save the mean ranks of the samples
`DF` = scalar	Scalar to save the degrees of freedom for the statistic
`PROBABILITY` = scalar	Scalar to save the probability for the statistic

Parameters

`DATA` = variates	List of variates containing the data for each sample, or a single variate containing the data from all the samples (the `GROUPS` option must then be set to indicate the sample to which each unit belongs)
`RANKS` = variates	Variates to save the ranks of each sample (relative to the combined data)

Description

KRUSKAL carries out a Kruskal-Wallis one-way analysis of variance on the ranks (relative to the whole data set) of a set of samples. The samples can be stored in different variates and supplied as a list using the DATA parameter. Alternatively, they can all be placed in a single variate, with the GROUPS option set to a factor indicating the sample to which each unit belongs. Output from the procedure is controlled by the PRINT option: test (the default setting) prints the relevant test statistics, and ranks prints the vector of ranks for each sample.

The test statistic, vector of mean ranks, degrees of freedom and test probability can be saved using the STATISTIC, MEANRANKS, DF and PROBABILITY options, respectively. The RANKS parameter can be set to a variate, or variates, to store the ranks of the data relative to the whole data set.

Options: PRINT, GROUPS, STATISTIC, MEANRANKS, DF, PROBABILITY.

Parameters: DATA, RANKS.

Method

The Kruskal-Wallis One-Way Analysis of Variance is used to test the hypothesis that several (K) samples come from the same distribution; it is chiefly sensitive to differences in location between the samples. The test statistic, H, is formed by ranking the combined data set and then summing the ranks within each sample:

H = [ 12 / (N×(N+1)) ] × ∑_{j=1…K} ( R_j² / n_j ) – 3×(N+1)

where R_j is the sum of ranks for the jth sample,

n_j is the size of the jth sample, and

N is the size of the combined data set.
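The calculation of H can be sketched outside Genstat as follows. This is a minimal Python illustration, not the code used by KRUSKAL: the function names `midranks` and `kruskal_wallis_h` are hypothetical, and tied values are assumed to receive the mean of the ranks they span. The data are the three samples from the Example section.

```python
def midranks(values):
    """Rank values from 1 to N; tied values share the mean of their ranks (assumption)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(samples):
    """Return (H, mean ranks per sample), with H not adjusted for ties."""
    combined = [x for sample in samples for x in sample]
    ranks = midranks(combined)
    n_total = len(combined)
    weighted_sum = 0.0
    mean_ranks = []
    start = 0
    for sample in samples:
        n_j = len(sample)
        r_j = sum(ranks[start:start + n_j])  # sum of ranks for this sample
        weighted_sum += r_j * r_j / n_j
        mean_ranks.append(r_j / n_j)
        start += n_j
    h = 12.0 / (n_total * (n_total + 1)) * weighted_sum - 3 * (n_total + 1)
    return h, mean_ranks


to = [96, 128, 83, 61, 101]
ao = [82, 124, 132, 135, 109]
admin = [115, 149, 166, 147]
h, mean_ranks = kruskal_wallis_h([to, ao, admin])
print(round(h, 4), mean_ranks)
```

For these data the rank sums are 22, 37 and 46, so H is about 6.41 and the mean ranks are 4.4, 7.4 and 11.5.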

If ties are present in the data, then an adjustment to the statistic H is required:

adjusted H = H /( 1 – ∑_k { t_k³–t_k }/(N³–N) )

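where t_k is the number of observations sharing the kth distinct value (and hence the same rank); untied observations have t_k = 1 and contribute nothing to the sum. (See for example Siegel 1956, pages 184-193.)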
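The tie adjustment can be illustrated with a short Python sketch. The helper names `tie_correction` and `adjusted_h` are hypothetical, and the sketch assumes that t_k counts the observations sharing each distinct data value.

```python
from collections import Counter


def tie_correction(values):
    """Return 1 - sum(t^3 - t) / (N^3 - N) over groups of equal values."""
    n = len(values)
    tie_sizes = Counter(values).values()  # t_k for each distinct value
    return 1.0 - sum(t ** 3 - t for t in tie_sizes) / (n ** 3 - n)


def adjusted_h(h, values):
    """Divide an unadjusted H by the tie correction for the combined data."""
    return h / tie_correction(values)


# Six observations with one pair and one triple of ties:
# sum(t^3 - t) = 0 + 6 + 24 = 30 and N^3 - N = 210, so the divisor is 6/7.
print(tie_correction([1, 2, 2, 3, 3, 3]))
```

When there are no ties the divisor is exactly 1, so the adjusted and unadjusted statistics coincide; with ties the divisor is below 1 and the adjusted H is larger.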

When there are at least five cases in each of the samples, H has approximately a Chi-square distribution on K-1 degrees of freedom. When this condition is not satisfied, and there are three samples, KRUSKAL uses a table of calculated values of the distribution of the statistic.
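The chi-square approximation can be sketched in Python using only the standard library. The function name `chi2_upper_tail` is hypothetical, and the sketch covers only the large-sample approximation; it does not reproduce the tabulated distribution that KRUSKAL uses for three small samples.

```python
import math


def chi2_upper_tail(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1.

    Starts from the closed forms for df = 1 (erfc) and df = 2 (exponential),
    then applies Q(a + 1, z) = Q(a, z) + z^a * exp(-z) / Gamma(a + 1).
    """
    if x <= 0:
        return 1.0
    z = x / 2.0
    if df % 2:
        a, q = 0.5, math.erfc(math.sqrt(z))  # df = 1
    else:
        a, q = 1.0, math.exp(-z)             # df = 2
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1)
        a += 1.0
    return q


# K = 3 samples give K - 1 = 2 degrees of freedom.
print(chi2_upper_tail(6.405714, 3 - 1))
```

For the data in the Example section, one sample has only four cases, so the condition for the approximation is not met and the probability reported by KRUSKAL may differ from this value.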

Action with `RESTRICT`

The variates in DATA can be restricted, and in different ways. KRUSKAL uses only those units of each variate that are not excluded by their respective restrictions.

Reference

Siegel, S. (1956). Nonparametric Statistics for the Behavioral Sciences. McGraw-Hill, New York.

Example

CAPTION 'KRUSKAL example',\
        !t('Data from Siegel (1956), Nonparametric Statistics,',\
        'p. 187. Three sets of individuals from different groups are',\
        'selected and receive scores.'); STYLE=meta,plain
VARIATE [VALUES=96,128, 83, 61,101]  To
&       [VALUES=82,124,132,135,109]  Ao
&       [VALUES=115,149,166,147]  Admin
PRINT   To,Ao,Admin; DECIMALS=0; FIELD=7
CAPTION !T('A Kruskal-Wallis Analysis of Variance is performed to test',\
        'for differences in scores between the groups.')
KRUSKAL [STATISTIC=H; MEANRANKS=Meanranks] To,Ao,Admin; RANKS=RTo,RAo,RAdmin
PRINT   H
&       RTo,RAo,RAdmin,Meanranks; DECIMALS=1; FIELD=8

Updated on March 27, 2024
