Performs a Mann-Whitney U test (S.J. Welham, N.M. Maclaren & H.R. Simpson).
Options
| Option | Description |
|---|---|
| PRINT = string tokens | Output required (test, ranks, hodgeslehmann, confidence); default test |
| METHOD = string token | Type of test required (twosided, greaterthan, lessthan); default twosided |
| GROUPS = factor | Defines the samples for a two-sample test if the Y2 parameter is not set |
| CIPROBABILITY = scalar | Probability for the confidence interval for the median difference between the samples; default 0.95 |
| CONTROL = scalar or text | Identifies the control group against which to make comparisons if GROUPS is set; default uses the reference level of GROUPS |
Parameters
| Parameter | Description |
|---|---|
| Y1 = variates | Identifier of the variate holding the first sample if Y2 is set, or both samples if Y2 is unset (the GROUPS option must then also be set) |
| Y2 = variates | Identifier of the variate holding the second sample |
| R1 = variates | Saves the ranks of the first sample if Y2 is set, or of both samples if Y2 is unset |
| R2 = variates | Saves the ranks of the second sample if Y2 is set |
| STATISTIC = scalars or tables | Saves the test statistics U |
| PROBABILITY = scalars or tables | Saves the probability values for the test statistics |
| SIGN = scalars or tables | Saves indicators: 1 if the first sample has the higher ranks on average, 0 otherwise |
| HODGESLEHMANN = scalars or tables | Saves the Hodges-Lehmann estimates for the differences in location of the two samples (i.e. the median differences between the samples) |
| LOWER = scalars or tables | Saves lower confidence values for the Hodges-Lehmann estimates |
| UPPER = scalars or tables | Saves upper confidence values for the Hodges-Lehmann estimates |
Description
The Mann-Whitney U test is a test for differences in location between two samples. The data for the samples can be stored in two separate variates and supplied by the parameters Y1 and Y2. Alternatively, they can be stored in a single variate, supplied by Y1, with the GROUPS option set to a factor to identify which unit belongs to each sample. The GROUPS option is ignored when the Y2 parameter is set. If GROUPS has more than 2 levels, each group is compared against a control group. You can define which level (or label) of GROUPS represents the control by setting the CONTROL option to a scalar or text. If CONTROL is not set, the reference level of GROUPS is used.
MANNWHITNEY calculates the test statistic U, along with its associated probability value. An exact probability is calculated (using procedure PRMANNWHITNEYU) if the size of either sample is less than 51 and the statistic U is less than 10000; otherwise a Normal approximation is used. The statistic and the probability can be saved using the STATISTIC and PROBABILITY parameters, respectively. Parameter SIGN holds an indicator which takes the value 1 if the ranks in the first sample are higher on average than those in the second sample, and the value 0 otherwise. Usually STATISTIC, PROBABILITY and SIGN will save scalars, but they will save tables classified by the GROUPS factor when GROUPS is set to a factor with more than two levels. The ranks (with respect to the combined data set) for each sample can be saved using the R1 and R2 parameters.
Printed output is controlled by the PRINT option, with settings:

| Setting | Output |
|---|---|
| test | test statistic and probability |
| ranks | ranks (with respect to the whole data set) for each sample |
| hodgeslehmann | Hodges-Lehmann estimate of the difference in the locations of the samples, with confidence limits |
| confidence | synonym of hodgeslehmann |
The probability for the confidence limits is specified by the CIPROBABILITY option; the default, of 0.95, gives a 95% interval. The calculation of the interval may be slow when there are ties amongst the values, because MANNWHITNEY then essentially has to invert the probability function. The Hodges-Lehmann estimates can be saved by the HODGESLEHMANN parameter, and the lower and upper confidence values by the LOWER and UPPER parameters, respectively.
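For intuition about how such limits relate to the pairwise differences, here is a Python sketch of a common large-sample shortcut that reads the limits off the sorted pairwise differences: take the k-th smallest and k-th largest, with k roughly n1 × n2 / 2 − z × sd. This technique is swapped in for illustration and is not claimed to be the calculation MANNWHITNEY performs; it ignores ties, and the function name `hl_interval_normal` and the choice to round k down (which widens the interval, erring on the conservative side) are assumptions of this sketch.

```python
import math
from statistics import NormalDist


def hl_interval_normal(y1, y2, ciprobability=0.95):
    """Approximate confidence limits for the shift (first sample minus second).

    Sketch only: large-sample rule k = n1*n2/2 - z*sd, no tie handling,
    k rounded down so the interval is never narrower than the rule suggests.
    """
    n1, n2 = len(y1), len(y2)
    diffs = sorted(x - y for x in y1 for y in y2)  # all n1*n2 pairwise differences
    z = NormalDist().inv_cdf(0.5 + ciprobability / 2)
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    k = max(1, math.floor(n1 * n2 / 2 - z * sd))  # 1-based order statistic
    return diffs[k - 1], diffs[len(diffs) - k]


e_rats = [78, 64, 75, 45, 82]
c_rats = [110, 70, 53, 51]
print(hl_interval_normal(e_rats, c_rats))  # (-65, 31)
```

With only 20 differences in the rat example, rounding down makes the 95% limits the extreme differences; an exact calculation can give a narrower interval for samples this small.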
By default a two-sided test is done (to assess whether the samples differ in location), but the METHOD option can be set to greaterthan to test whether the first sample is greater than the second, or to lessthan to test whether it is smaller.
Options: PRINT, METHOD, GROUPS, CIPROBABILITY, CONTROL.
Parameters: Y1, Y2, R1, R2, STATISTIC, PROBABILITY, SIGN, HODGESLEHMANN, LOWER, UPPER.
Method
The Mann-Whitney (or Wilcoxon) U-test is a two-sample test of location difference: i.e. a test of the null hypothesis that the two samples arise from the same distribution vs. the alternative that one distribution is shifted in location relative to the other.
The test statistic U is formed using ranks found from the combined data set, and is taken to be the smaller of $U_1$ and $U_2$, where

$$U_k = n_1 n_2 + \frac{n_k (n_k + 1)}{2} - R_k, \qquad k = 1, 2$$

and $n_k$ is the size of sample k and $R_k$ is the sum of ranks for sample k. The score $U_k$ can be interpreted as the number of times a score in sample k precedes a score in the other sample in the ranking, so that $U_1 + U_2 = n_1 n_2$. The sample with the larger $U_k$ therefore has, on average, smaller rank scores.
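The formula and its counting interpretation can be checked numerically with a short Python sketch (not Genstat code; the function names are hypothetical). It assumes mid-ranks for ties, under which a tied pair counts one half in the pair count.

```python
def rank_sum(sample, combined):
    """Sum of mid-ranks of `sample` within `combined`: rank(x) = #below + (#equal + 1) / 2."""
    return sum(
        sum(v < x for v in combined) + (sum(v == x for v in combined) + 1) / 2
        for x in sample
    )


def u_statistics(y1, y2):
    """Return (U1, U2, U) with Uk = n1*n2 + nk*(nk+1)/2 - Rk and U = min(U1, U2)."""
    n1, n2 = len(y1), len(y2)
    combined = list(y1) + list(y2)
    r1, r2 = rank_sum(y1, combined), rank_sum(y2, combined)
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
    return u1, u2, min(u1, u2)


def precedes(a, b):
    """Number of pairs (x from a, y from b) with x < y, counting ties as one half."""
    return sum(1.0 if x < y else 0.5 if x == y else 0.0 for x in a for y in b)


e_rats = [78, 64, 75, 45, 82]
c_rats = [110, 70, 53, 51]
print(u_statistics(e_rats, c_rats), precedes(e_rats, c_rats))
# (9.0, 11.0, 9.0) 9.0
```

Here $U_1 = 20 + 15 - 26 = 9$ agrees with the 9 pairs in which an experimental score precedes a control score.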
The PRMANNWHITNEYU procedure is used to calculate exact values of the probability for the test statistic when the size of either sample is less than 51 and the statistic U is less than 10000; otherwise a Normal approximation is used:

$$z = \frac{n_1 n_2 / 2 - U}{\sqrt{n_1 n_2 (n_1 + n_2 + 1) / 12}}$$

If ties are present, the standard error of the Normal approximation (i.e. the denominator) is calculated instead as

$$\sqrt{\frac{n_1 n_2}{N (N - 1)} \left( \frac{N^3 - N}{12} - \sum_k T_k \right)}$$

where $N = n_1 + n_2$, $T_k = (t_k^3 - t_k)/12$, and $t_k$ is the number of observations tied at rank k. (See for example Siegel 1956, pages 116-127.)
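The Normal approximation with the tie correction can be sketched in Python as below. This is illustrative rather than Genstat's implementation: the function names are hypothetical, no continuity correction is applied (an assumption, since the text does not say), and only the two-sided p-value is shown. The rat data are far too small for MANNWHITNEY to use this approximation, so they serve only to exercise the formula.

```python
import math
from collections import Counter


def u_standard_error(n1, n2, combined):
    """Tie-corrected denominator of the Normal approximation.

    Each group of t tied values contributes T = (t**3 - t) / 12; untied values
    (t = 1) contribute 0, so with no ties this equals sqrt(n1*n2*(N+1)/12).
    """
    n = n1 + n2
    t_sum = sum((t ** 3 - t) / 12 for t in Counter(combined).values())
    return math.sqrt(n1 * n2 / (n * (n - 1)) * ((n ** 3 - n) / 12 - t_sum))


def normal_deviate(u, n1, n2, combined):
    """z = (n1*n2/2 - U) / SE, with a two-sided p-value (no continuity correction, assumed)."""
    z = (n1 * n2 / 2 - u) / u_standard_error(n1, n2, combined)
    return z, math.erfc(abs(z) / math.sqrt(2))


e_rats = [78, 64, 75, 45, 82]
c_rats = [110, 70, 53, 51]
z, p = normal_deviate(9, 5, 4, e_rats + c_rats)
print(round(z, 4), round(p, 4))  # 0.2449 0.8065
```

Because untied values contribute nothing to the sum, the corrected denominator reduces algebraically to the uncorrected one when there are no ties.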
The Hodges-Lehmann estimate is calculated as the median of all the $n_1 \times n_2$ differences between pairs of units, with one unit from each sample.
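A minimal Python sketch of the estimate; the function name is hypothetical, and it assumes the differences are taken as first-sample value minus second-sample value, a sign convention the text does not state.

```python
from statistics import median


def hodges_lehmann(y1, y2):
    """Median of all n1*n2 pairwise differences (first-sample value minus second, assumed)."""
    return median([x - y for x in y1 for y in y2])


e_rats = [78, 64, 75, 45, 82]
c_rats = [110, 70, 53, 51]
print(hodges_lehmann(e_rats, c_rats))  # 6.5
```

With 20 differences, the median is the average of the 10th and 11th smallest (5 and 8).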
Action with RESTRICT
The variates Y1 and Y2 can be restricted, and in different ways. MANNWHITNEY uses only those units of each variate that are not excluded by their respective restrictions. Restrictions on Y1 and GROUPS are also obeyed, so RESTRICT can be used, for example, to limit the data to only two groups when the GROUPS factor has more than two levels.
Reference
Siegel, S. (1956). Nonparametric Statistics for the Behavioral Sciences. McGraw-Hill, New York.
See also
Procedures: PRMANNWHITNEYU, SMANNWHITNEY, SIGNTEST, TTEST, WILCOXON.
Commands for: Basic and nonparametric statistics.
Example
```
CAPTION 'MANNWHITNEY example',\
  !t('Data from Siegel (1956), Nonparametric Statistics,',\
  'p. 119. The behaviour of two groups of rats with different',\
  'backgrounds is studied and a score is assigned to each',\
  'individual.'); STYLE=meta,plain
VARIATE [VALUES=78,64,75,45,82] E_Rats
&       [VALUES=110,70,53,51] C_Rats
PRINT   E_Rats,C_Rats; DECIMALS=0; FIELD=7
CAPTION !T('A Mann-Whitney U-test is performed to test for a',\
  'difference in scores between the two groups.')
MANNWHITNEY [PRINT=test,ranks] Y1=E_Rats; Y2=C_Rats; R1=RE; R2=RC;\
  STATISTIC=U; PROBABILITY=Probability; SIGN=Sign
PRINT   U,Probability,Sign
&       RE,RC; DECIMALS=1; FIELD=7
```