MANNWHITNEY procedure

Performs a Mann-Whitney U test (S.J. Welham, N.M. Maclaren & H.R. Simpson).

Options

`PRINT` = string tokens	Output required (`test`, `ranks`, `hodgeslehmann`, `confidence`); default `test`
`METHOD` = string token	Type of test required (`twosided`, `greaterthan`, `lessthan`); default `twosided`
`GROUPS` = factor	Defines the samples for a two-sample test if the `Y2` parameter is not set
`CIPROBABILITY` = scalar	Probability for the confidence interval for the median difference between the samples; default 0.95
`CONTROL` = scalar or text	Identifies the control group against which to make comparisons if `GROUPS` is set; default uses the reference level of `GROUPS`

Parameters

`Y1` = variates	Identifier of the variate holding the first sample if `Y2` is set, or both samples if `Y2` is unset (the `GROUPS` option must then also be set)
`Y2` = variates	Identifier of the variate holding the second sample
`R1` = variates	Saves the ranks of the first sample if `Y2` is set, or both samples if `Y2` is unset
`R2` = variates	Saves the ranks of the second sample if `Y2` is set
`STATISTIC` = scalars or tables	Saves the test statistics U
`PROBABILITY` = scalars or tables	Probability values for the test statistics
`SIGN` = scalars or tables	Saves indicators: 1 if the first sample scores the highest ranks on average, 0 otherwise
`HODGESLEHMANN` = scalars or tables	Saves the Hodges-Lehmann estimates for the differences in location of the two samples (i.e. the median differences between the samples)
`LOWER` = scalars or tables	Saves lower confidence values for the Hodges-Lehmann estimates
`UPPER` = scalars or tables	Saves upper confidence values for the Hodges-Lehmann estimates

Description

The Mann-Whitney U test is a test for differences in location between two samples. The data for the samples can be stored in two separate variates, and supplied by the parameters Y1 and Y2. Alternatively, they can be stored in a single variate, supplied by Y1, with the GROUPS option set to a factor to identify which units belong to each sample. The GROUPS option is ignored when the Y2 parameter is set. If GROUPS has more than 2 levels, each group is compared against a control group. You can define which level (or label) of GROUPS represents the control by setting the CONTROL option to a scalar or text. If CONTROL is not set, the reference level of GROUPS is used.

MANNWHITNEY calculates the test statistic U, along with its associated probability value. An exact probability is calculated (using procedure PRMANNWHITNEYU) if the size of either sample is less than 51 and the statistic U is less than 10000; otherwise a Normal approximation is used. The statistic and the probability can be saved using the STATISTIC and PROBABILITY parameters respectively. Parameter SIGN holds an indicator which takes the value 1 if the ranks in the first sample are higher on average than those in the second sample, and takes the value 0 otherwise. Usually STATISTIC, PROBABILITY and SIGN will save scalars, but they will save tables classified by the GROUPS factor when GROUPS is set to a factor with more than two levels. The ranks (with respect to the combined data set) for each sample can be saved using the R1 and R2 parameters.
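
As an illustration of what an exact probability means here, the following Python sketch enumerates every way of assigning ranks to the first sample and counts how often the statistic U₁ = n₁ × n₂ + n₁ × (n₁+1) / 2 – R₁ is no larger than an observed value. It assumes there are no ties and gives only a one-sided lower-tail probability for U₁; it is not the algorithm used by PRMANNWHITNEYU, whose details are not described here, and the function name `exact_lower_tail` is hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def exact_lower_tail(u_obs, n1, n2):
    """P(U1 <= u_obs) under the null hypothesis, assuming no ties.

    Every choice of n1 ranks out of 1..n1+n2 for the first sample is
    equally likely under the null hypothesis, so the probability is a
    simple proportion of rank assignments.
    """
    n = n1 + n2
    hits = 0
    total = 0
    for ranks1 in combinations(range(1, n + 1), n1):
        u1 = n1 * n2 + n1 * (n1 + 1) // 2 - sum(ranks1)
        total += 1
        if u1 <= u_obs:
            hits += 1
    return Fraction(hits, total)


# Sample sizes 5 and 4, observed U1 = 9 (the values arising in the Example).
p = exact_lower_tail(9, 5, 4)
print(p, round(float(p), 3))  # 19/42 0.452
```

Full enumeration is only practical for small samples, which is one reason a Normal approximation is preferred when the samples are large.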

Printed output is controlled by the PRINT option, with settings

`test`	test statistic and probability,
`ranks`	ranks (with respect to the whole data set) for each sample,
`hodgeslehmann`	Hodges-Lehmann estimate of the difference in the locations of the samples, with confidence limits, and
`confidence`	synonym of `hodgeslehmann`.

The probability for the confidence limits is specified by the CIPROBABILITY option; the default, of 0.95, gives a 95% interval. The calculation of the interval may be slow when there are ties amongst the values, as essentially MANNWHITNEY then has to invert the probability function. The Hodges-Lehmann estimates can be saved by the HODGESLEHMANN parameter. The lower and upper confidence values can be saved by the LOWER and UPPER parameters, respectively.

By default a two-sided test is done (to assess whether the samples differ), but the METHOD option can be set to greaterthan to test that the first sample is greater than the second, or lessthan to test that it is smaller.

Options: PRINT, METHOD, GROUPS, CIPROBABILITY, CONTROL.
Parameters: Y1, Y2, R1, R2, STATISTIC, PROBABILITY, SIGN, HODGESLEHMANN, LOWER, UPPER.

Method

The Mann-Whitney (or Wilcoxon) U-test is a two-sample test of location difference: i.e. a test of the null hypothesis that the two samples arise from distributions with the same location vs. the alternative that the locations of the distributions differ.

The test statistic U is formed using ranks found from the combined data set, and is taken to be the smaller of U₁ and U₂, where

U_k = n₁ × n₂ + n_k × (n_k+1) / 2 – R_k ; k=1,2

and n_k is the size of sample k, and R_k is the sum of ranks for sample k. This score U_k can be interpreted as the number of times a score in sample k precedes a score in the other sample in the ranking. So the sample with the lower score U_k has, on average, the larger ranks.
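
The calculation of U₁, U₂ and U can be sketched in Python as follows, using the data from the Example section. This is a minimal illustration of the formula above rather than the code inside MANNWHITNEY; the helper names `average_ranks` and `mann_whitney_u` are hypothetical, and giving tied values the mean of their ranks is an assumption.

```python
def average_ranks(values):
    """Rank values from 1 to N; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(sample1, sample2):
    """Return (U1, U2, U) with U_k = n1*n2 + n_k*(n_k+1)/2 - R_k."""
    n1, n2 = len(sample1), len(sample2)
    ranks = average_ranks(list(sample1) + list(sample2))
    r1 = sum(ranks[:n1])  # rank sum of the first sample
    r2 = sum(ranks[n1:])  # rank sum of the second sample
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
    return u1, u2, min(u1, u2)


e_rats = [78, 64, 75, 45, 82]
c_rats = [110, 70, 53, 51]
print(mann_whitney_u(e_rats, c_rats))  # (9.0, 11.0, 9.0)
```

Note that U₁ + U₂ = n₁ × n₂, so only one rank sum is needed in practice.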

The PRMANNWHITNEYU procedure is used to calculate exact values of the probability for the test statistic when the size of either sample is less than 51 and the statistic U is less than 10000; otherwise a Normal approximation is used:

Normal = ( n₁ × n₂ / 2 – U ) / √{ n₁ × n₂ × ( n₁+n₂+1 ) / 12 }

If ties are present, the standard error of the Normal approximation (i.e. the denominator) must be calculated by:

√{ n₁ × n₂ / (N × (N-1)) × ( (N³–N) / 12 – ∑_k T_k ) }

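where N = n₁+n₂ is the total number of observations, T_k = ( t_k³–t_k )/12, and t_k is the number of observations tied at the kth distinct value (here k indexes the groups of tied observations, not the samples). (See for example Siegel 1956, pages 116-127.)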
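
The Normal approximation and its tie-corrected standard error can be sketched in Python as below. The function names are hypothetical and the sketch only reproduces the two formulas above; it makes no claim about how MANNWHITNEY implements them. When there are no ties every tie term is zero, and the standard error reduces to √{ n₁ × n₂ × ( n₁+n₂+1 ) / 12 }.

```python
import math
from collections import Counter


def tie_corrected_se(sample1, sample2):
    """Standard error of U, corrected for ties in the combined data."""
    n1, n2 = len(sample1), len(sample2)
    n = n1 + n2
    # t = number of observations sharing each distinct value;
    # untied values have t = 1 and contribute (1 - 1) / 12 = 0.
    tie_sizes = Counter(list(sample1) + list(sample2)).values()
    tie_term = sum((t**3 - t) / 12 for t in tie_sizes)
    return math.sqrt(n1 * n2 / (n * (n - 1)) * ((n**3 - n) / 12 - tie_term))


def normal_deviate(u, sample1, sample2):
    """Normal deviate (n1*n2/2 - U) / standard error."""
    n1, n2 = len(sample1), len(sample2)
    return (n1 * n2 / 2 - u) / tie_corrected_se(sample1, sample2)


# Arithmetic check only: samples this small would use the exact test.
e_rats = [78, 64, 75, 45, 82]
c_rats = [110, 70, 53, 51]
print(round(normal_deviate(9, e_rats, c_rats), 4))  # 0.2449
```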

The Hodges-Lehmann estimate is calculated as the median of all the differences between pairs of units (with one unit from each sample).
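
A minimal Python sketch of this estimate, applied to the data from the Example section, is shown below. The function name `hodges_lehmann` is hypothetical, and taking each difference as first sample minus second sample is an assumption about the sign convention.

```python
from statistics import median


def hodges_lehmann(sample1, sample2):
    """Median of all pairwise differences x - y, x in sample1, y in sample2."""
    differences = [x - y for x in sample1 for y in sample2]
    return median(differences)


e_rats = [78, 64, 75, 45, 82]
c_rats = [110, 70, 53, 51]
print(hodges_lehmann(e_rats, c_rats))  # 6.5
```

With 5 and 4 units there are 20 differences, so the median is the mean of the 10th and 11th ordered differences (5 and 8).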

Action with `RESTRICT`

The variates Y1 and Y2 can be restricted, and in different ways. MANNWHITNEY uses only those units of each variate that are not excluded by their respective restrictions. Restrictions are also obeyed on Y1 and GROUPS, allowing RESTRICT to be used for example to limit the data to only two groups when the GROUPS factor has more than two levels.

Reference

Siegel, S. (1956). Nonparametric Statistics for the Behavioral Sciences. McGraw-Hill, New York.

Example

CAPTION      'MANNWHITNEY example',\ 
            !t('Data from Siegel (1956), Nonparametric Statistics,',\
            'p. 119. The behaviour of two groups of rats with different',\
            'backgrounds is studied and a score is assigned to each',\
            'individual.'); STYLE=meta,plain
VARIATE     [VALUES=78,64,75,45,82] E_Rats
&           [VALUES=110,70,53,51]   C_Rats
PRINT       E_Rats,C_Rats; DECIMALS=0; FIELD=7
CAPTION     !T('A Mann-Whitney U-test is performed to test for a',\
            'difference in scores between the two groups.')
MANNWHITNEY [PRINT=test,ranks] Y1=E_Rats; Y2=C_Rats; R1=RE; R2=RC;\ 
            STATISTIC=U; PROBABILITY=Probability; SIGN=Sign
PRINT       U,Probability,Sign
&           RE,RC; DECIMALS=1; FIELD=7

Updated on January 12, 2022
