We will now find a 95% confidence interval for the median of a distribution, based on a random sample from it, without making any assumptions about the shape of the distribution. This will be done by inverting a test for the hypotheses

\[ \text{H}_0: \theta = \theta_0 \qquad\qquad \text{H}_A: \theta \ne \theta_0 \]

where the parameter \(\theta\) is the median of the distribution. The test can be based on the number of sample values less than \(\theta_0\). If \(\theta_0\) really is the median (and H0 is true),

\[ Y \;\;=\;\; \text{number of values below }\theta_0 \;\;\sim\;\; \BinomDistn(n, \pi=0.5) \]

The p-value for the test is the probability of \(Y\) being at least as far from \(\diagfrac{n}{2}\) as the value that was observed in the data.
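The p-value calculation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `sign_test_p_value` is hypothetical, the test is taken to be two-sided, and a sample value exactly equal to \(\theta_0\) is counted as not being below it.

```python
from math import comb

def sign_test_p_value(values, theta0):
    """Two-sided p-value for H0: median = theta0 (illustrative sketch)."""
    n = len(values)
    # Observed value of Y: the number of sample values strictly below theta0.
    y = sum(v < theta0 for v in values)
    distance = abs(y - n / 2)
    # Add up the Binomial(n, 0.5) probabilities of every count that is
    # at least as far from n/2 as the observed count.
    tail = sum(comb(n, k) for k in range(n + 1) if abs(k - n / 2) >= distance)
    return tail / 2 ** n
```

A trial median near the middle of the sample gives a p-value close to 1, whereas a trial median outside the range of the data gives the smallest p-value that a sample of that size can produce.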

A 95% confidence interval can be found (by trial-and-error) as the set of values of \(\theta_0\) for which the null hypothesis would not be rejected at the 5% significance level, i.e. those with p-values greater than 0.05.
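The trial-and-error search can be organised systematically, because a trial value \(\theta_0\) affects the test only through the count of sample values below it. The following Python sketch tries every possible count and converts the counts that are not rejected into a range of \(\theta_0\). The function names are hypothetical and the code assumes a two-sided test with no sample values tied at the interval's end-points.

```python
from math import comb, inf

def two_sided_p_value(k, n):
    """P-value when k of the n sample values lie below the trial median."""
    distance = abs(k - n / 2)
    tail = sum(comb(n, j) for j in range(n + 1) if abs(j - n / 2) >= distance)
    return tail / 2 ** n

def median_confidence_interval(values, alpha=0.05):
    """End-points of the trial medians that are not rejected at level alpha."""
    xs = sorted(values)
    n = len(xs)
    # Trying every possible count covers every possible trial median theta0.
    accepted = [k for k in range(n + 1) if two_sided_p_value(k, n) > alpha]
    k_min, k_max = accepted[0], accepted[-1]
    # A count of k arises when theta0 lies between the k-th and (k+1)-th
    # smallest sample values, so the accepted counts span this range.
    lower = xs[k_min - 1] if k_min > 0 else -inf
    upper = xs[k_max] if k_max < n else inf
    return lower, upper
```

Because \(Y\) is discrete, the interval's actual confidence level is usually somewhat higher than 95%. For very small samples no count can be rejected, in which case the sketch returns an unbounded interval.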

Question

A certain disease in dogs is characterized in the early stages by unusually high levels of a blood protein. This measurement has been proposed as a diagnostic test for infection: if the measured level is above a threshold value, the dog is diagnosed as having the disease. A ‘false positive’ occurs when a healthy dog happens to have a level above the threshold and is wrongly diagnosed as having the disease.

Measurements on a sample of 50 unaffected dogs gave the following results:

14.4
16.1
11.9
7.5
9.3
9.3
16.4
4.9
12.8
23.7
23.7
13.5
9.3
8.6
17.6
19.0
20.3
17.0
30.4
8.3
13.9
20.1
10.2
23.6
14.9
50.4
8.5
7.5
23.0
18.7
5.5
11.2
31.8
20.4
13.0
13.4
11.7
7.8
19.4
21.4
16.4
28.2
31.3
30.3
26.6

Measurements on a sample of 27 diseased dogs gave the results:

21.9
40.8
37.6
41.7
23.3
39.8
66.3
34.4
27.8
49.8
19.3
55.5
50.7
27.5
30.2
60.7
8.5
51.2
24.2
24.9
16.1
28.2
30.2
15.7
18.3
22.5
8.4

Find 95% confidence intervals for the median level of blood protein in the two groups.

(Solved in full version)