SMANNWHITNEY procedure

Calculates the sample sizes for the Mann-Whitney test (R.W. Payne).

Options

`PRINT` = string token	What to print (`replication`, `power`); default `repl`, `powe`
`PROBABILITY` = scalar	Significance level at which the test is to be made; default 0.05
`POWER` = scalar	The required power (i.e. probability of detection) of the test; default 0.9
`TMETHOD` = string token	Whether a one- or two-sided test is to be made (`onesided`, `twosided`); default `ones`
`RATIOREPLICATION` = scalar	Ratio of replication sample2:sample1 (i.e. the size of sample 2 should be `RATIOREPLICATION` times the size of sample 1); default 1
`REPLICATION` = variate	Sample sizes for which to calculate and print or save the power; default `*` takes 11 replication values centred around the required number of replicates

Parameters

`NULLPROBABILITIES` = variates	Probabilities under null hypothesis
`ODDSRATIO` = scalars	Odds ratio for test group vs. control
`NREPLICATES` = scalars	Saves the required sample size
`VREPLICATION` = variates	Sample sizes for which powers have been calculated
`VPOWER` = variates	Power (i.e. probability of detection) for the various numbers of replicates

Description

The Mann-Whitney U test is a nonparametric test for differences in location between two samples (see procedure MANNWHITNEY). This procedure, SMANNWHITNEY, allows you to calculate the sample sizes required for the test, provided you can supply some information about the probability distributions from which the samples are likely to be generated. For simplicity, the data are assumed to be classified into ordered categories. These may be natural categories (such as “very good”, “good”, “moderate” and “poor”) or they may be formed by splitting a continuous scale into intervals (e.g. “under 18”, “18-25”, “25-40”, “40-60” and “over 60”). You then use the NULLPROBABILITIES parameter to specify a variate containing the probability value for each category. This is the probability distribution that you feel would generate the data of both samples under the null hypothesis. The accuracy of the subsequent calculations will depend on how many categories you take for a continuous variate. However, Whitehead (1993) suggests that there is little to gain in taking more than five.

To assess the power of the test, you next need to indicate how small a difference between the sample distributions the test should be able to detect. The assumption now is that there will be a control sample, with probability distribution as supplied, and a test sample for which the distribution is shifted by multiplying the odds (i.e. p/(1-p)) of the cumulative distribution by a constant. (This corresponds to the proportional-odds model of McCullagh 1980.) This constant is supplied by the ODDSRATIO parameter. An example, with odds ratio 2, is shown below.

Null hypothesis			Alternative hypothesis
probability	cumulative probability	odds	probability	cumulative probability	odds
0.20	0.20	0.25	0.33	0.33	0.50
0.40	0.60	1.50	0.42	0.75	3.00
0.30	0.90	9.00	0.20	0.95	18.00
0.10	1.00	*	0.05	1.00	*

The cumulative probabilities are produced as part of the information generated by setting the PRINT option to power, so you can evaluate possible odds ratios to check that they generate plausible distributions.
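The shift described above can also be checked by hand before running the procedure. The following is a minimal Python sketch, not Genstat code and not part of SMANNWHITNEY; the function name `shift_by_odds_ratio` is hypothetical. It multiplies the cumulative odds of the null distribution by the odds ratio and converts the result back to category probabilities, using the null probabilities and odds ratio 2 from the table above.

```python
def shift_by_odds_ratio(null_probs, odds_ratio):
    """Return category probabilities after multiplying the cumulative odds by odds_ratio."""
    if abs(sum(null_probs) - 1.0) > 1e-9:
        raise ValueError("null probabilities must sum to 1")
    shifted_cum = []
    running = 0.0
    # The last category has cumulative probability 1 (infinite odds), so it is handled separately.
    for p in null_probs[:-1]:
        running += p
        odds = running / (1.0 - running) * odds_ratio
        shifted_cum.append(odds / (1.0 + odds))
    shifted_cum.append(1.0)
    # Difference the shifted cumulative probabilities to recover the category probabilities.
    probs = []
    previous = 0.0
    for c in shifted_cum:
        probs.append(c - previous)
        previous = c
    return probs


alt = shift_by_odds_ratio([0.20, 0.40, 0.30, 0.10], 2.0)
print([round(p, 2) for p in alt])  # prints [0.33, 0.42, 0.2, 0.05]
```

Trying a few candidate odds ratios in this way gives a quick view of whether the implied distribution of the test sample looks plausible.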

By default the calculations are done for a one-sided test, but you can set option TMETHOD=twosided for a two-sided test instead. The significance level for the test is specified by the PROBABILITY option (default 0.05 i.e. 5%). The required probability for detection of the change (that is, the power of the test) is specified by the POWER option (default 0.9). It is generally assumed that the sizes of the samples in the two-sample test should be equal. However, you can set the RATIOREPLICATION option to a scalar, R say, to indicate that the size of the second sample should be R times the size of the first sample. The sample size can be saved using the NREPLICATES parameter.
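To show how these settings interact, here is a rough Python sketch (not Genstat code) of the large-sample formula of Whitehead (1993). It is an approximation under stated assumptions and will not necessarily reproduce the SMANNWHITNEY output: it takes n/(n+1) as one, and for unequal allocation it assumes that the mean category probabilities are weighted by the sample sizes. The function names and arguments are hypothetical.

```python
import math
from statistics import NormalDist


def alternative_probs(null_probs, odds_ratio):
    """Shift the cumulative odds of null_probs by odds_ratio (proportional odds)."""
    probs, running, previous = [], 0.0, 0.0
    for p in null_probs[:-1]:
        running += p
        odds = running / (1.0 - running) * odds_ratio
        cum = odds / (1.0 + odds)
        probs.append(cum - previous)
        previous = cum
    probs.append(1.0 - previous)
    return probs


def approx_total_size(null_probs, odds_ratio, alpha=0.05, power=0.9,
                      two_sided=False, ratio=1.0):
    """Approximate total sample size (both samples together).

    Assumption: large-sample form with n/(n+1) taken as 1, so the answer
    may differ slightly from the value that SMANNWHITNEY reports.
    """
    alt = alternative_probs(null_probs, odds_ratio)
    # Assumption: mean probabilities weighted by allocation (sample 2 = ratio * sample 1).
    mean = [(p0 + ratio * p1) / (1.0 + ratio) for p0, p1 in zip(null_probs, alt)]
    # A two-sided test splits the significance level between the two tails.
    z_alpha = NormalDist().inv_cdf(1.0 - (alpha / 2.0 if two_sided else alpha))
    z_beta = NormalDist().inv_cdf(power)
    theta = math.log(odds_ratio)
    info = 1.0 - sum(m ** 3 for m in mean)
    return 3.0 * (ratio + 1.0) ** 2 * (z_alpha + z_beta) ** 2 / (ratio * theta ** 2 * info)


# Settings of the Example section: two-sided 5% test, power 0.9, equal allocation.
n_total = approx_total_size([0.2, 0.5, 0.2, 0.1], math.exp(0.887), two_sided=True)
n1 = math.ceil(n_total / 2.0)  # size of each sample when ratio = 1
print(round(n_total, 1), n1)
```

With these settings the approximation gives a total of about 187. Switching to a one-sided test lowers the total, while an unequal allocation such as `ratio=2.0` raises it.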

The PRINT option controls printed output, with settings:

`replication`	to print the required number of replicates in each sample (i.e. the size of each sample);
`power`	to print a table giving the power (i.e. probability of detection) provided by a range of numbers of replicates.

By default both are printed.

The replications and corresponding powers can also be saved, in variates, using the VREPLICATION and VPOWER parameters. The REPLICATION option can specify the replication values for which to calculate and print or save the power; if this is not set, the default is to take 11 replication values centred around the required number of replicates.
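As an illustration of the kind of table that is produced, the Python sketch below (not Genstat code) tabulates an approximate power for 11 sample sizes. It inverts the large-sample formula of Whitehead (1993) with n/(n+1) taken as one, so the values are indicative only and may differ from those saved in VPOWER. The function `approx_power` is hypothetical, and the probabilities are the rounded values from the odds-ratio 2 table in the Description.

```python
import math
from statistics import NormalDist


def approx_power(n1, null_probs, alt_probs, log_odds_ratio,
                 alpha=0.05, two_sided=False, ratio=1.0):
    """Approximate power when sample 1 has n1 units and sample 2 has ratio * n1."""
    total = n1 * (1.0 + ratio)
    # Assumption: mean category probabilities weighted by the two sample sizes.
    mean = [(p0 + ratio * p1) / (1.0 + ratio) for p0, p1 in zip(null_probs, alt_probs)]
    info = 1.0 - sum(m ** 3 for m in mean)
    # Large-sample variance term, with n/(n+1) taken as 1 (an assumption of this sketch).
    v = ratio * total * info / (3.0 * (ratio + 1.0) ** 2)
    z_alpha = NormalDist().inv_cdf(1.0 - (alpha / 2.0 if two_sided else alpha))
    # For a two-sided test the small contribution of the opposite tail is ignored.
    return NormalDist().cdf(abs(log_odds_ratio) * math.sqrt(v) - z_alpha)


null_probs = [0.20, 0.40, 0.30, 0.10]
alt_probs = [0.33, 0.42, 0.20, 0.05]  # rounded values for odds ratio 2

# Eleven replication values centred on 120, for a one-sided 5% test.
for n1 in range(95, 146, 5):
    print(n1, round(approx_power(n1, null_probs, alt_probs, math.log(2.0)), 3))
```

In this sketch the power rises steadily with the replication and passes 0.9 at roughly 120 units per sample.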

Options: PRINT, PROBABILITY, POWER, TMETHOD, RATIOREPLICATION, REPLICATION.

Parameters: NULLPROBABILITIES, ODDSRATIO, NREPLICATES, VREPLICATION, VPOWER.

Method

The method is based on the equations given by Whitehead (1993), except that the Genstat implementation omits the approximation of taking n/(n+1) as equal to one.
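For orientation, the large-sample form of the sample-size formula, in which n/(n+1) is taken as one, can be written as below. This is the approximate form only, not the more exact equation that SMANNWHITNEY solves. The symbols are introduced here for illustration: N is the total sample size, A the allocation ratio (RATIOREPLICATION), θ the log of the odds ratio, p̄_i the mean of the two samples' probabilities for category i, and z_α and z_β the standard Normal quantiles for the significance level (α/2 in place of α for a two-sided test) and for the power 1-β.

```latex
N = \frac{3\,(A+1)^{2}\,(z_{\alpha} + z_{\beta})^{2}}
         {A\,\theta^{2}\,\bigl(1 - \sum_{i} \bar{p}_{i}^{\,3}\bigr)}
```

With equal allocation (A = 1) the leading factor becomes 12, i.e. 6 per sample.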

References

McCullagh, P. (1980). Regression models for ordinal data. Journal of the Royal Statistical Society Series B, 42, 109-142.

Whitehead, J. (1993). Sample size calculations for ordered categorical data. Statistics in Medicine, 12, 2257-2271.

Example

CAPTION 'SMANNWHITNEY example',\
        !t('Example 2 of Whitehead (1993), but note that results below',\
        'differ slightly due to the use here of a more exact equation.');\
        STYLE=meta,plain
VARIATE [VALUES=0.2,0.5,0.2,0.1] Controlprob
SMANNWHITNEY [TMETHOD=twosided] Controlprob; ODDSRATIO=EXP(0.887)

Updated on June 18, 2019
