
MANNWHITNEY procedure

Performs a Mann-Whitney U test (S.J. Welham, N.M. Maclaren & H.R. Simpson).

Options

PRINT = string tokens Output required (test, ranks, hodgeslehmann, confidence); default test
METHOD = string token Type of test required (twosided, greaterthan, lessthan); default twosided
GROUPS = factor Defines the samples for a two-sample test if the Y2 parameter is not set
CIPROBABILITY = scalar Probability for the confidence interval for the median difference between the samples; default 0.95
CONTROL = scalar or text Identifies the control group against which to make comparisons if GROUPS is set; default uses the reference level of GROUPS

Parameters

Y1 = variates Identifier of the variate holding the first sample if Y2 is set, or both samples if Y2 is unset (the GROUPS option must then also be set)
Y2 = variates Identifier of the variate holding the second sample
R1 = variates Saves the ranks of the first sample if Y2 is set, or both samples if Y2 is unset
R2 = variates Saves the ranks of the second sample if Y2 is set
STATISTIC = scalars or tables Saves the test statistics U
PROBABILITY = scalars or tables Probability values for the test statistics
SIGN = scalars or tables Saves indicators: 1 if the first sample has higher ranks on average than the second, 0 otherwise
HODGESLEHMANN = scalars or tables Saves the Hodges-Lehmann estimates for the differences in location of the two samples (i.e. the median differences between the samples)
LOWER = scalars or tables Saves lower confidence values for the Hodges-Lehmann estimates
UPPER = scalars or tables Saves upper confidence values for the Hodges-Lehmann estimates

Description

The Mann-Whitney U test is a test for differences in location between two samples. The data for the samples can be stored in two separate variates, supplied by the parameters Y1 and Y2. Alternatively, they can be stored in a single variate, supplied by Y1, with the GROUPS option set to a factor that identifies the sample to which each unit belongs. The GROUPS option is ignored when the Y2 parameter is set. If GROUPS has more than two levels, each group is compared against a control group. You can define which level (or label) of GROUPS represents the control by setting the CONTROL option to a scalar or text. If CONTROL is not set, the reference level of GROUPS is used.
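The following Python sketch illustrates the single-variate layout: the values in Y1 are split into samples using a parallel list of group labels, and each non-control group is paired with the control. The helper `split_by_groups` is hypothetical and not part of Genstat. Using the smallest label in sorted order when no control is given is only an assumed stand-in for the reference level of a factor, and the order of the two samples within each comparison is also an assumption.

```python
def split_by_groups(y, groups, control=None):
    """Hypothetical helper (not part of Genstat): split a single variate into
    samples using a parallel list of group labels, then pair every
    non-control group with the control group.

    If control is None, the smallest label in sorted order is used as an
    assumed stand-in for the reference level of the GROUPS factor.
    """
    if len(y) != len(groups):
        raise ValueError("y and groups must have the same number of units")
    samples = {}
    for value, level in zip(y, groups):
        samples.setdefault(level, []).append(value)
    if len(samples) < 2:
        raise ValueError("at least two groups are needed")
    if control is None:
        control = sorted(samples)[0]
    if control not in samples:
        raise ValueError(f"control level {control!r} not found")
    # One comparison per non-control group: (control sample, other sample)
    return {level: (samples[control], values)
            for level, values in samples.items() if level != control}
```

With three groups, two comparisons are produced, each against the same control sample.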

MANNWHITNEY calculates the test statistic U, along with its associated probability value. An exact probability is calculated (using procedure PRMANNWHITNEYU) if the size of either sample is less than 51 and the statistic U is less than 10000; otherwise a Normal approximation is used. The statistic and the probability can be saved using the STATISTIC and PROBABILITY parameters, respectively. Parameter SIGN holds an indicator which takes the value 1 if the ranks in the first sample are higher on average than those in the second sample, and the value 0 otherwise. Usually STATISTIC, PROBABILITY and SIGN will save scalars, but they will save tables classified by the GROUPS factor when GROUPS is set to a factor with more than two levels. The ranks (with respect to the combined data set) for each sample can be saved using the R1 and R2 parameters.
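As an illustration of the R1, R2 and SIGN outputs, the Python sketch below ranks the combined data and compares the mean ranks of the two samples. It assumes that tied values receive the average of the ranks they span (mid-ranks), the usual convention for this test; `midranks` and `ranks_and_sign` are hypothetical names, not Genstat routines.

```python
def midranks(values):
    """Rank values from 1 upwards, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at sorted position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i..j, converted to 1-based ranks
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def ranks_and_sign(y1, y2):
    """Ranks of each sample within the combined data, and a SIGN-style indicator."""
    ranks = midranks(list(y1) + list(y2))
    r1, r2 = ranks[:len(y1)], ranks[len(y1):]
    # 1 if the first sample has the higher mean rank, 0 otherwise
    sign = 1 if sum(r1) / len(r1) > sum(r2) / len(r2) else 0
    return r1, r2, sign
```

For the rat data in the Example, E_Rats receives ranks 7, 4, 6, 1, 8 and C_Rats receives 9, 5, 3, 2, so the indicator is 1.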

Printed output is controlled by the PRINT option, with settings

    test            test statistic and probability,
    ranks           ranks (with respect to the whole data set) for each sample,
    hodgeslehmann   Hodges-Lehmann estimate of the difference in the locations of the samples, with confidence limits, and
    confidence      synonym of hodgeslehmann.

The probability for the confidence limits is specified by the CIPROBABILITY option; the default, of 0.95, gives a 95% interval. The calculation of the interval may be slow when there are ties amongst the values, as essentially MANNWHITNEY then has to invert the probability function. The Hodges-Lehmann estimates can be saved by the HODGESLEHMANN parameter. The lower and upper confidence values can be saved by the LOWER and UPPER parameters, respectively.
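The Python sketch below shows one standard way to obtain a distribution-free confidence interval for the Hodges-Lehmann estimate when there are no ties: sort the n1 × n2 pairwise differences and take the kth smallest and kth largest, where k is the largest integer with P(U ≤ k − 1) ≤ (1 − CIPROBABILITY)/2 under the null distribution of U. The coverage is then 1 − 2P(U ≤ k − 1), which is at least CIPROBABILITY. It is a sketch under stated assumptions (differences taken as first sample minus second, no ties), not the algorithm MANNWHITNEY uses; in particular it does not handle ties, and the function names are hypothetical.

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def n_arrangements(n1, n2, u):
    """Orderings of n1 sample-1 and n2 sample-2 values (no ties) with exactly
    u pairs in which the sample-1 value is the larger."""
    if u < 0:
        return 0
    if n1 == 0 or n2 == 0:
        return 1 if u == 0 else 0
    # The largest value comes from sample 1 (beating all n2) or sample 2 (beating none)
    return n_arrangements(n1 - 1, n2, u - n2) + n_arrangements(n1, n2 - 1, u)

def hodges_lehmann_interval(y1, y2, ciprobability=0.95):
    """Confidence limits from order statistics of the pairwise differences (no ties assumed)."""
    if not 0 < ciprobability < 1:
        raise ValueError("ciprobability must lie strictly between 0 and 1")
    n1, n2 = len(y1), len(y2)
    diffs = sorted(a - b for a in y1 for b in y2)
    total = comb(n1 + n2, n1)
    alpha_half = (1 - ciprobability) / 2
    # Find the largest k with P(U <= k - 1) <= alpha_half; cum holds count(U <= k - 1)
    k, cum = 0, 0
    while True:
        cum_next = cum + n_arrangements(n1, n2, k)  # count(U <= k)
        if cum_next / total > alpha_half:
            break
        cum = cum_next
        k += 1
    if k == 0:
        return None  # samples too small for this confidence level
    return diffs[k - 1], diffs[len(diffs) - k]
```

For the rat data in the Example, with a probability of 0.95, this gives k = 2 and limits of −46 and 29, with exact coverage 1 − 4/126 ≈ 0.968.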

By default a two-sided test is done (to assess whether the samples differ in location), but the METHOD option can be set to greaterthan to test whether the first sample is greater than the second, or to lessthan to test whether it is smaller.
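To illustrate how the METHOD settings select the tail of the distribution, the Python sketch below computes Normal-approximation probabilities from U1, the statistic of the first sample defined in the Method section. It applies no continuity or tie correction, and it does not show how MANNWHITNEY itself forms one-sided probabilities (for example, from exact values in small samples); `normal_p_value` is a hypothetical name.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard Normal distribution function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def normal_p_value(u1, n1, n2, method="twosided"):
    """Normal-approximation probability based on U1 of the first sample.

    U1 is small when the first sample tends to be larger, so z is then positive.
    """
    z = (n1 * n2 / 2 - u1) / sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    if method == "greaterthan":  # alternative: first sample greater than the second
        return 1 - normal_cdf(z)
    if method == "lessthan":     # alternative: first sample smaller than the second
        return normal_cdf(z)
    if method == "twosided":
        return 2 * (1 - normal_cdf(abs(z)))
    raise ValueError(f"unknown method {method!r}")
```

The two one-sided probabilities sum to 1, and the two-sided value is twice the smaller of them.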

Options: PRINT, METHOD, GROUPS, CIPROBABILITY, CONTROL.
Parameters: Y1, Y2, R1, R2, STATISTIC, PROBABILITY, SIGN, HODGESLEHMANN, LOWER, UPPER.

Method

The Mann-Whitney (or Wilcoxon rank-sum) U test is a two-sample test of location difference: that is, a test of the null hypothesis that the two samples arise from distributions with the same location against the alternative that their locations differ.

The test statistic U is formed using ranks found from the combined data set, and is taken to be the smaller of U1 and U2, where

$$U_k = n_1 n_2 + \frac{n_k (n_k + 1)}{2} - R_k, \qquad k = 1, 2$$

and nk is the size of sample k and Rk is the sum of the ranks for sample k. Note that U1 + U2 = n1 × n2. In the absence of ties, Uk can be interpreted as the number of times a score in sample k precedes a score in the other sample in the ranking. So the sample with the lower value of Uk has, on average, the higher ranks.
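The Python sketch below applies the formula above to the rat data from the Example, whose combined ranks are 7, 4, 6, 1, 8 (E_Rats) and 9, 5, 3, 2 (C_Rats). It also counts the pairs directly, to check the interpretation of U1 for untied data. The function names are hypothetical.

```python
def u_statistics(r1, r2):
    """U1 and U2 from the rank sums, plus the test statistic U = min(U1, U2)."""
    n1, n2 = len(r1), len(r2)
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - sum(r1)
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - sum(r2)
    return u1, u2, min(u1, u2)

def count_precedes(y1, y2):
    """Pairs (a from y1, b from y2) with a < b: a sample-1 score ranked before a sample-2 score."""
    return sum(1 for a in y1 for b in y2 if a < b)

# Rat scores from the Example and their ranks in the combined data set
e_rats, c_rats = [78, 64, 75, 45, 82], [110, 70, 53, 51]
r_e, r_c = [7, 4, 6, 1, 8], [9, 5, 3, 2]
u1, u2, u = u_statistics(r_e, r_c)
```

Here U1 = 20 + 15 − 26 = 9 and U2 = 20 + 10 − 19 = 11, so U = 9, and exactly 9 of the 20 (E, C) pairs have the E score first.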

The PRMANNWHITNEYU procedure is used to calculate exact values of the probability for the test statistic when the size of either sample is less than 51 and the statistic U is less than 10000; otherwise a Normal approximation is used:
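The algorithm used by PRMANNWHITNEYU is not described here. As a sketch of what an exact calculation involves, the Python code below assumes no ties and counts, by a standard recursion on the largest observation, how many of the C(n1 + n2, n1) equally likely orderings give each value of the statistic; the lower-tail probability is then a ratio of counts. Doubling the lower tail to obtain a two-sided probability for U = min(U1, U2) is one common convention and an assumption here; the function names are hypothetical.

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def n_arrangements(n1, n2, u):
    """Orderings of n1 sample-1 and n2 sample-2 values (no ties) with exactly
    u pairs in which the sample-1 value is the larger. The distribution is
    symmetric, so it is also the null distribution of U1 and U2."""
    if u < 0:
        return 0
    if n1 == 0 or n2 == 0:
        return 1 if u == 0 else 0
    # The largest value comes from sample 1 (beating all n2) or sample 2 (beating none)
    return n_arrangements(n1 - 1, n2, u - n2) + n_arrangements(n1, n2 - 1, u)

def exact_lower_tail(u, n1, n2):
    """P(U <= u) under the null hypothesis, all orderings being equally likely."""
    total = comb(n1 + n2, n1)
    return sum(n_arrangements(n1, n2, k) for k in range(int(u) + 1)) / total

def exact_two_sided(u, n1, n2):
    """Two-sided probability for U = min(U1, U2), by doubling the lower tail."""
    return min(1.0, 2 * exact_lower_tail(u, n1, n2))
```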

$$\text{Normal} = \frac{n_1 n_2 / 2 - U}{\sqrt{n_1 n_2 (n_1 + n_2 + 1) / 12}}$$
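As a worked example of this formula, with the rat data from the Example (n1 = 5, n2 = 4, U = 9) the deviate is (10 − 9) / √(200/12) ≈ 0.245. Samples this small would in fact use the exact calculation; the Python sketch below only illustrates the arithmetic, applies no tie or continuity correction, and uses a hypothetical function name.

```python
from math import sqrt

def normal_statistic(u, n1, n2):
    """Normal deviate for U, following the formula above (no tie correction)."""
    return (n1 * n2 / 2 - u) / sqrt(n1 * n2 * (n1 + n2 + 1) / 12)

# Rat example: n1 = 5, n2 = 4, U = 9
z = normal_statistic(9, 5, 4)
```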

If ties are present, the standard error of the Normal approximation (i.e. the denominator) is instead calculated as:

$$\sqrt{\frac{n_1 n_2}{N (N - 1)} \left( \frac{N^3 - N}{12} - \sum_k T_k \right)}$$

where $N = n_1 + n_2$, $T_k = (t_k^3 - t_k)/12$, and $t_k$ is the number of observations in the kth group of tied values. (See, for example, Siegel 1956, pages 116-127.)
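The tie-corrected standard error can be sketched in Python as below. It assumes that each distinct value in the combined data forms one tie group (an untied value is a group of size 1 and contributes zero to the sum), so with no ties it reduces to the denominator of the uncorrected Normal formula, because N³ − N = N(N − 1)(N + 1). The function name is hypothetical.

```python
from collections import Counter
from math import sqrt

def tie_corrected_sd(y1, y2):
    """Standard error of U with the tie correction, one tie group per distinct value."""
    n1, n2 = len(y1), len(y2)
    N = n1 + n2
    # Sizes t of the tie groups in the combined data (1 for an untied value)
    tie_sizes = Counter(list(y1) + list(y2)).values()
    sum_t = sum((t ** 3 - t) / 12 for t in tie_sizes)
    return sqrt(n1 * n2 / (N * (N - 1)) * ((N ** 3 - N) / 12 - sum_t))
```

For example, with samples 1, 2, 2 and 2, 3 there is one group of three tied values, so the sum of the Tk is 2 and the standard error is √2.4 ≈ 1.55, compared with √3 ≈ 1.73 without the correction.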

The Hodges-Lehmann estimate is calculated as the median of all the differences between pairs of units (with one unit from each sample).
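A direct Python sketch of this definition follows. The sign convention (first sample minus second) is an assumption; reversing the samples changes the sign of the estimate. For the rat data in the Example, the 20 pairwise differences have median 6.5. The function name is hypothetical.

```python
from statistics import median

def hodges_lehmann(y1, y2):
    """Median of all n1 * n2 differences a - b, with a from y1 and b from y2."""
    return median([a - b for a in y1 for b in y2])
```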

Action with RESTRICT

The variates Y1 and Y2 can be restricted, and in different ways. MANNWHITNEY uses only those units of each variate that are not excluded by their respective restrictions. Restrictions are also obeyed on Y1 and GROUPS, allowing RESTRICT to be used for example to limit the data to only two groups when the GROUPS factor has more than two levels.

Reference

Siegel, S. (1956). Nonparametric Statistics for the Behavioural Sciences. McGraw-Hill, New York.

See also

Procedures: PRMANNWHITNEYU, SMANNWHITNEY, SIGNTEST, TTEST, WILCOXON.

Commands for: Basic and nonparametric statistics.

Example

CAPTION      'MANNWHITNEY example',\ 
            !t('Data from Siegel (1956), Nonparametric Statistics,',\
            'p. 119. The behaviour of two groups of rats with different',\
            'backgrounds is studied and a score is assigned to each',\
            'individual.'); STYLE=meta,plain
VARIATE     [VALUES=78,64,75,45,82] E_Rats
&           [VALUES=110,70,53,51]   C_Rats
PRINT       E_Rats,C_Rats; DECIMALS=0; FIELD=7
CAPTION     !T('A Mann-Whitney U-test is performed to test for a',\
            'difference in scores between the two groups.')
MANNWHITNEY [PRINT=test,ranks] Y1=E_Rats; Y2=C_Rats; R1=RE; R2=RC;\ 
            STATISTIC=U; PROBABILITY=Probability; SIGN=Sign
PRINT       U,Probability,Sign
&           RE,RC; DECIMALS=1; FIELD=7
Updated on January 12, 2022
