We will now find a 95% confidence interval for the median of a distribution, based on a random sample from it, without making any assumptions about the shape of the underlying distribution. This will be done by inverting a test for the hypotheses
\[ H_0: \theta = \theta_0 \qquad \text{against} \qquad H_A: \theta \ne \theta_0 \]
where the parameter \(\theta\) is the median of the distribution. The test can be based on the number of sample values less than \(\theta_0\). If \(\theta_0\) really is the median (and H0 is true),
\[ Y \;\;=\;\; \text{number of values below }\theta_0 \;\;\sim\;\; \BinomDistn(n, \pi=0.5) \]
The p-value for the test is the probability, under this binomial distribution, of \(Y\) being at least as far from \(\diagfrac{n}{2}\) as the value observed in the data.
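As a small illustration with made-up numbers (not taken from the data below), suppose that \(n = 10\) and that \(y = 2\) of the values lie below \(\theta_0\). Counting every outcome at least as far from \(n/2 = 5\) as the observed one, and using the symmetry of the binomial distribution when \(\pi = 0.5\),
\[ \text{p-value} \;=\; P(|Y - 5| \ge 3) \;=\; 2\,P(Y \le 2) \;=\; 2 \times \frac{\binom{10}{0} + \binom{10}{1} + \binom{10}{2}}{2^{10}} \;=\; \frac{112}{1024} \;\approx\; 0.109 \]
Since this is greater than 0.05, such a value of \(\theta_0\) would not be rejected at the 5% level.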
A 95% confidence interval can be found (by trial-and-error) as the set of values of \(\theta_0\) for which the null hypothesis would not be rejected at the 5% significance level, i.e. those with p-values greater than 0.05.
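The trial-and-error search can be sketched in Python. This is only a minimal illustration under stated assumptions: the function names `sign_test_p_value` and `median_confidence_interval` and the ten-value `sample` are hypothetical and do not come from the text. The sketch relies on the fact that \(Y\) only changes when \(\theta_0\) crosses an observation, so one candidate value in each gap between adjacent sorted observations is enough.

```python
from math import comb


def sign_test_p_value(values, theta0):
    """Two-sided p-value for H0: median = theta0, based on
    Y = number of values below theta0 ~ Binomial(n, 0.5) under H0."""
    n = len(values)
    y = sum(v < theta0 for v in values)
    k = min(y, n - y)  # y and n - y are equally far from n/2
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n  # P(Y <= k)
    return min(1.0, 2 * tail)  # cap at 1 when y is exactly n/2


def median_confidence_interval(values, alpha=0.05):
    """Interval of theta0 values that the test does not reject at level alpha."""
    xs = sorted(values)
    n = len(xs)
    # With very small n, even theta0 outside the data range is not rejected,
    # so no finite interval exists at this level.
    if 2 / 2 ** n > alpha:
        raise ValueError("sample too small for a finite interval at this level")
    # Try one candidate theta0 in each gap between adjacent sorted values.
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    accepted = [i for i, c in enumerate(candidates)
                if sign_test_p_value(xs, c) > alpha]
    # Gap i lies between xs[i] and xs[i + 1]; the interval spans all accepted gaps.
    return xs[accepted[0]], xs[accepted[-1] + 1]


# Hypothetical sample, for illustration only (not the dog data).
sample = [12.1, 15.3, 9.8, 20.4, 17.6, 11.0, 25.2, 14.7, 18.9, 13.5]
print(median_confidence_interval(sample))
```

For this made-up sample the interval runs from the 2nd smallest to the 2nd largest value, 11.0 to 20.4. Because \(Y\) is discrete, the confidence level of such an interval is at least 95% rather than exactly 95%.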
Question
A certain disease in dogs is characterized in the early stages by unusually high levels of a blood protein. This measurement has been proposed as a diagnostic test for infection: if the measured level is above a threshold value, the dog is diagnosed as having the disease. A ‘false positive’ occurs when a healthy dog happens to have a level above the threshold and is wrongly diagnosed as having the disease.
Measurements on a sample of 50 unaffected dogs gave the following results:
14.4 16.1 11.9 7.5 9.3
9.3 16.4 4.9 12.8 23.7
23.7 13.5 9.3 8.6 17.6
19.0 20.3 17.0 30.4 8.3
13.9 20.1 10.2 23.6 14.9
50.4 8.5 7.5 23.0 18.7
5.5 11.2 31.8 20.4 13.0
13.4 11.7 7.8 19.4 21.4
16.4 28.2 31.3 30.3 26.6
Measurements on a sample of 27 diseased dogs gave the results:
21.9 40.8 37.6
41.7 23.3 39.8
66.3 34.4 27.8
49.8 19.3 55.5
50.7 27.5 30.2
60.7 8.5 51.2
24.2 24.9 16.1
28.2 30.2
15.7 18.3
22.5 8.4
Find 95% confidence intervals for the median level of blood protein in the two groups.
(Solved in full version)