Performs a Mann-Whitney U test (S.J. Welham, N.M. Maclaren & H.R. Simpson).

### Options

| Option | Description |
|---|---|
| `PRINT` = string tokens | Output required (`test`, `ranks`, `hodgeslehmann`, `confidence`); default `test` |
| `METHOD` = string token | Type of test required (`twosided`, `greaterthan`, `lessthan`); default `twosided` |
| `GROUPS` = factor | Defines the samples for a two-sample test if the `Y2` parameter is not set |
| `CIPROBABILITY` = scalar | Probability for the confidence interval for the median difference between the samples; default 0.95 |
| `CONTROL` = scalar or text | Identifies the control group against which to make comparisons if `GROUPS` is set; default uses the reference level of `GROUPS` |

### Parameters

| Parameter | Description |
|---|---|
| `Y1` = variates | Identifier of the variate holding the first sample if `Y2` is set, or both samples if `Y2` is unset (the `GROUPS` option must then also be set) |
| `Y2` = variates | Identifier of the variate holding the second sample |
| `R1` = variates | Saves the ranks of the first sample if `Y2` is set, or both samples if `Y2` is unset |
| `R2` = variates | Saves the ranks of the second sample if `Y2` is set |
| `STATISTIC` = scalars or tables | Saves the test statistics *U* |
| `PROBABILITY` = scalars or tables | Saves the probability values for the test statistics |
| `SIGN` = scalars or tables | Saves indicators: 1 if the first sample scores the highest ranks on average, 0 otherwise |
| `HODGESLEHMANN` = scalars or tables | Saves the Hodges-Lehmann estimates for the differences in location of the two samples (i.e. the median differences between the samples) |
| `LOWER` = scalars or tables | Saves lower confidence values for the Hodges-Lehmann estimates |
| `UPPER` = scalars or tables | Saves upper confidence values for the Hodges-Lehmann estimates |

### Description

The Mann-Whitney *U* test is a test for differences in location between two samples. The data for the samples can be stored in two separate variates, and supplied by the parameters `Y1` and `Y2`. Alternatively, they can be stored in a single variate, supplied by `Y1`, with the `GROUPS` option set to a factor to identify which unit belongs to each sample. The `GROUPS` option is ignored when the `Y2` parameter is set. If `GROUPS` has more than two levels, each group is compared against a control group. You can define which level (or label) of `GROUPS` represents the control by setting the `CONTROL` option to a scalar or text. If `CONTROL` is not set, the reference level of `GROUPS` is used.
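The two data layouts carry the same information. As a rough illustration outside Genstat, the Python sketch below splits a single list of values by group label and pairs each non-control group with the control. The function names `split_by_group` and `comparisons_with_control` are hypothetical, and the sketch makes no claim about which member of each pair `MANNWHITNEY` treats as the first sample.

```python
def split_by_group(values, groups):
    """Collect the values of a single variate into one list per group label."""
    samples = {}
    for value, label in zip(values, groups):
        samples.setdefault(label, []).append(value)
    return samples


def comparisons_with_control(values, groups, control):
    """Pair every non-control group with the control group (hypothetical helper)."""
    samples = split_by_group(values, groups)
    return [(label, sample, samples[control])
            for label, sample in samples.items() if label != control]


# Scores from the example at the end of this page, stored as one variate
# plus a grouping factor instead of two separate variates.
scores = [78, 64, 75, 45, 82, 110, 70, 53, 51]
labels = ["E"] * 5 + ["C"] * 4
print(split_by_group(scores, labels))
# {'E': [78, 64, 75, 45, 82], 'C': [110, 70, 53, 51]}
```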

`MANNWHITNEY` calculates the test statistic *U*, along with its associated probability value. An exact probability is calculated (using procedure `PRMANNWHITNEYU`) if the size of either sample is less than 51 and the statistic *U* is less than 10000; otherwise a Normal approximation is used. The statistic and the probability can be saved using the `STATISTIC` and `PROBABILITY` parameters respectively. Parameter `SIGN` holds an indicator which takes the value 1 if the ranks in the first sample are higher on average than those in the second sample, and takes the value 0 otherwise. Usually `STATISTIC`, `PROBABILITY` and `SIGN` will save scalars, but they will save tables classified by the `GROUPS` factor when `GROUPS` is set to a factor with more than two levels. The ranks (with respect to the combined data set) for each sample can be saved using the `R1` and `R2` parameters.
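To give a feel for what an exact probability involves, here is a minimal Python sketch that counts rankings with a standard recurrence, assuming no ties. It is not the algorithm used by `PRMANNWHITNEYU`, which is not described here. The names `count_u` and `exact_lower_tail` are hypothetical, and doubling the lower tail to obtain a two-sided value is an assumption of the sketch.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_u(u, m, n):
    """Number of rankings of m + n untied values giving a count statistic of u."""
    if u < 0:
        return 0
    if m == 0 or n == 0:
        return 1 if u == 0 else 0
    # The largest value belongs either to the first sample (adding n to u)
    # or to the second sample (adding nothing).
    return count_u(u - n, m - 1, n) + count_u(u, m, n - 1)


def exact_lower_tail(u, m, n):
    """P(U <= u) when every ranking is equally likely (no ties)."""
    favourable = sum(count_u(k, m, n) for k in range(int(u) + 1))
    return favourable / comb(m + n, m)


# Samples of sizes 5 and 4 with U = 9, as in the example on this page.
p_one_sided = exact_lower_tail(9, 5, 4)
print(round(p_one_sided, 4))                # 0.4524
print(round(min(1.0, 2 * p_one_sided), 4))  # 0.9048 (assumed two-sided value)
```

The number of rankings grows quickly with the sample sizes, which is one plausible reason for switching to an approximation for larger samples.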

Printed output is controlled by the `PRINT` option, with the following settings:

| Setting | Output |
|---|---|
| `test` | test statistic and probability |
| `ranks` | ranks (with respect to the whole data set) for each sample |
| `hodgeslehmann` | Hodges-Lehmann estimate of the difference in the locations of the samples, with confidence limits |
| `confidence` | synonym of `hodgeslehmann` |

The probability for the confidence limits is specified by the `CIPROBABILITY` option; the default, of 0.95, gives a 95% interval. The calculation of the interval may be slow when there are ties amongst the values, as essentially `MANNWHITNEY` then has to invert the probability function. The Hodges-Lehmann estimates can be saved by the `HODGESLEHMANN` parameter. The lower and upper confidence values can be saved by the `LOWER` and `UPPER` parameters, respectively.

By default a two-sided test is done (to assess whether the samples differ), but the `METHOD` option can be set to `greaterthan` to test whether the first sample is greater than the second, or `lessthan` to test whether it is smaller.

Options: `PRINT`, `METHOD`, `GROUPS`, `CIPROBABILITY`, `CONTROL`.

Parameters: `Y1`, `Y2`, `R1`, `R2`, `STATISTIC`, `PROBABILITY`, `SIGN`, `HODGESLEHMANN`, `LOWER`, `UPPER`.

### Method

The Mann-Whitney (or Wilcoxon) U-test is a two-sample test of location difference: i.e. a test of the null hypothesis that the two samples arise from distributions with the same mean vs. the alternative that the distribution means differ.

The test statistic $U$ is formed using ranks found from the combined data set, and is taken to be the smaller of $U_1$ and $U_2$, where

$$U_k = n_1 n_2 + \frac{n_k (n_k + 1)}{2} - R_k, \qquad k = 1, 2,$$

$n_k$ is the size of sample $k$, and $R_k$ is the sum of the ranks for sample $k$. With this definition, the score $U_k$ can be interpreted as the number of times a score in sample $k$ precedes a score in the other sample in the ranking. So the sample with the lower score has, on average, the larger ranks.
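As a rough illustration of the formula for *U*, the Python sketch below (not Genstat code) ranks the combined data and evaluates both scores for the rat data used in the example on this page. The helper names `midranks` and `u_statistics` are hypothetical, and giving tied values their average rank is an assumption of the sketch.

```python
def midranks(values):
    """Rank values from 1 (smallest), giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # positions i..j share ranks i+1..j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = average
        i = j + 1
    return ranks


def u_statistics(sample1, sample2):
    """Return (U1, U2, U) using U_k = n1*n2 + n_k*(n_k+1)/2 - R_k."""
    n1, n2 = len(sample1), len(sample2)
    ranks = midranks(list(sample1) + list(sample2))
    r1, r2 = sum(ranks[:n1]), sum(ranks[n1:])
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
    return u1, u2, min(u1, u2)


e_rats = [78, 64, 75, 45, 82]
c_rats = [110, 70, 53, 51]
print(u_statistics(e_rats, c_rats))  # (9.0, 11.0, 9.0)
```

Here the rank sums are 26 and 19, and the two scores add up to 5 × 4 = 20, the number of pairs formed by taking one unit from each sample.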

The `PRMANNWHITNEYU`

procedure is used to calculate exact values of the probability for the test statistic when the size of either sample is less than 51 and the statistic *U* is less than 10000; otherwise a Normal approximation is used:

$$\text{Normal} = \frac{n_1 n_2 / 2 - U}{\sqrt{n_1 n_2 (n_1 + n_2 + 1) / 12}}$$

If ties are present, the standard error of the Normal approximation (i.e. the denominator) is calculated as:

$$\sqrt{\frac{n_1 n_2}{N (N - 1)} \left( \frac{N^3 - N}{12} - \sum_k T_k \right)}$$

where $N = n_1 + n_2$, $T_k = (t_k^3 - t_k) / 12$ and $t_k$ is the number of observations tied at rank $k$. (See for example Siegel 1956, pages 116-127.)
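The Normal approximation and its tie-corrected standard error can be sketched in Python as follows. This is an illustration of the formulas in this section, not Genstat's implementation, and the function name `normal_deviate` is hypothetical. Without ties the corrected standard error reduces to the simpler denominator, so a single function covers both cases.

```python
from collections import Counter
from math import sqrt


def normal_deviate(u, sample1, sample2):
    """Normal deviate for U, using the tie-corrected standard error."""
    n1, n2 = len(sample1), len(sample2)
    big_n = n1 + n2
    # T_k = (t_k^3 - t_k) / 12 for each group of t_k tied observations;
    # untied values have t_k = 1 and contribute zero.
    tie_counts = Counter(list(sample1) + list(sample2)).values()
    tie_sum = sum((t**3 - t) / 12 for t in tie_counts)
    variance = n1 * n2 / (big_n * (big_n - 1)) * ((big_n**3 - big_n) / 12 - tie_sum)
    return (n1 * n2 / 2 - u) / sqrt(variance)


# Rat data from the example on this page (no ties, U = 9).
z = normal_deviate(9, [78, 64, 75, 45, 82], [110, 70, 53, 51])
print(round(z, 4))  # 0.2449
```

Samples this small fall under the exact calculation described above, so the value is shown only to make the arithmetic concrete.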

The Hodges-Lehmann estimate is calculated as the median of all the differences between pairs of units (with one unit from each sample).
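A minimal Python sketch of this estimate is given below for the rat data used in the example on this page. The function name `hodges_lehmann` is hypothetical, and taking each difference as first sample minus second sample is an assumption about the sign convention.

```python
from statistics import median


def hodges_lehmann(sample1, sample2):
    """Median of all differences x - y, with x from sample1 and y from sample2."""
    return median(x - y for x in sample1 for y in sample2)


# 5 x 4 = 20 pairwise differences; the median averages the two middle values.
print(hodges_lehmann([78, 64, 75, 45, 82], [110, 70, 53, 51]))  # 6.5
```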

### Action with `RESTRICT`

The variates `Y1` and `Y2` can be restricted, and in different ways. `MANNWHITNEY` uses only those units of each variate that are not excluded by their respective restrictions. Restrictions are also obeyed on `Y1` and `GROUPS`, allowing `RESTRICT` to be used, for example, to limit the data to only two groups when the `GROUPS` factor has more than two levels.

### Reference

Siegel, S. (1956). *Nonparametric Statistics for the Behavioral Sciences*. McGraw-Hill, New York.

### See also

Procedures: `PRMANNWHITNEYU`, `SMANNWHITNEY`, `SIGNTEST`, `TTEST`, `WILCOXON`.

Commands for: Basic and nonparametric statistics.

### Example

    CAPTION 'MANNWHITNEY example',\
            !t('Data from Siegel (1956), Nonparametric Statistics,',\
            'p. 119. The behaviour of two groups of rats with different',\
            'backgrounds is studied and a score is assigned to each',\
            'individual.'); STYLE=meta,plain
    VARIATE [VALUES=78,64,75,45,82] E_Rats
    &       [VALUES=110,70,53,51] C_Rats
    PRINT   E_Rats,C_Rats; DECIMALS=0; FIELD=7
    CAPTION !T('A Mann-Whitney U-test is performed to test for a',\
            'difference in scores between the two groups.')
    MANNWHITNEY [PRINT=test,ranks] Y1=E_Rats; Y2=C_Rats; R1=RE; R2=RC;\
            STATISTIC=U; PROBABILITY=Probability; SIGN=Sign
    PRINT   U,Probability,Sign
    &       RE,RC; DECIMALS=1; FIELD=7