Having some knowledge about the behaviour of the sampling distribution enables us to make decisions concerning the population proportion π based on sample data.
Consider the following example. On 25 November 1992, the Herald-Sun newspaper published the article below, in which a survey carried out by the newspaper's staff raised doubts about the proportion of letters delivered on time.
Doubt has been cast over Australia Post's claim of delivering 96 per cent of standard letters on time.
A survey conducted by the Herald-Sun in Melbourne revealed that less than 90 per cent of letters were delivered according to the schedule.
Herald-Sun staff posted 59 letters before the advertised
Campbell Fuller, Herald-Sun, 25 November 1992.
Is the author justified in disputing Australia Post's claim that 96% of letters are delivered on time?
If Australia Post's claim is correct, and each letter posted independently has probability 0.96 of being delivered on time, then the number delivered on time out of 59 letters is a random quantity. From the information in the article, we can deduce that 52 out of the Herald-Sun's 59 letters arrived on time (a proportion 52/59 = 0.881).
How unlikely is it to get only 52 out of 59 letters arriving on time if Australia Post's claim that the probability of letters arriving on time is 0.96 is correct?
We will use a simulation to help answer this question.
Click Take sample a few times to observe the sample-to-sample variability of the number of letters arriving on time.
Now click Accumulate and take between 100 and 200 samples. Observe the distribution of the number of letters arriving on time. The proportion of runs of the simulation that gave 52 or fewer letters arriving on time is shown under the dot plot. Observe that this rarely happens.
We therefore conclude that the article is justified — only 52 letters being delivered on time is most unlikely if all letters independently have probability 0.96 of being delivered on time.
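The simulation described above can also be sketched in a few lines of Python. This is only an illustration, not the code behind the diagram: the function name `simulate_on_time_counts`, the fixed seed and the choice of 200 runs are assumptions made for the example.

```python
import random

def simulate_on_time_counts(n_letters=59, p_on_time=0.96, n_runs=200, seed=1):
    """Simulate n_runs samples; each returns the number of letters on time.

    Every letter is treated as an independent trial that arrives on time
    with probability p_on_time (the assumption stated in the text).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    counts = []
    for _ in range(n_runs):
        on_time = sum(rng.random() < p_on_time for _ in range(n_letters))
        counts.append(on_time)
    return counts

counts = simulate_on_time_counts()

# Proportion of simulated samples with 52 or fewer letters on time
proportion_52_or_fewer = sum(c <= 52 for c in counts) / len(counts)
print(f"Proportion of runs with 52 or fewer on time: {proportion_52_or_fewer:.3f}")
```

The printed proportion changes with the seed and the number of runs, but it should stay small, in line with what the dot plot shows.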
Display the simulation results as a histogram (choose Histogram from the pop-up menu on the bottom left of the diagram).
We will now fit a normal curve to this distribution. Select Simulation and Normal from the pop-up menu on the right of the diagram. Use the two sliders to adjust the normal parameters to match the normal distribution to the histogram.
The button Best (Simulation) sets the normal parameters to the mean and standard deviation of the distribution of simulated counts.
We can do better with a little probability theory! The button Best (Model) sets the normal mean and standard deviation to values that can be determined theoretically from the probability 0.96 of letters being delivered on time.
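As a rough sketch of where such theoretical values could come from, assume the count of letters delivered on time is binomial with n = 59 and probability 0.96. Its mean is then n × 0.96 and its standard deviation is the square root of n × 0.96 × 0.04. Whether the button uses exactly these formulas is an assumption; the variable names are chosen for the example.

```python
import math

n_letters = 59     # letters posted in the survey
p_on_time = 0.96   # Australia Post's claimed probability

# Mean and standard deviation of a binomial count (assumed model)
model_mean = n_letters * p_on_time
model_sd = math.sqrt(n_letters * p_on_time * (1 - p_on_time))

print(f"mean = {model_mean:.2f}, sd = {model_sd:.3f}")
# prints: mean = 56.64, sd = 1.505
```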
(The probability of getting 52 or fewer letters delivered on time can also be determined from this normal distribution; it is given below the sliders for the parameters. However, since the distribution is actually binomial and the normal distribution is only an approximation, this probability is not particularly accurate in the tails of the distribution.)
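The gap between the exact binomial probability and its normal approximation can be checked directly. The sketch below assumes a binomial model with n = 59 and probability 0.96, and a normal curve with the matching mean and standard deviation, evaluated at 52 without a continuity correction. The diagram may compute its value differently, and the helper name `binomial_cdf` is made up for the example.

```python
import math
from statistics import NormalDist

def binomial_cdf(k, n, p):
    """Exact probability of k or fewer successes in n independent trials."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

n_letters, p_on_time, observed = 59, 0.96, 52

# Exact lower-tail probability under the binomial model
exact_prob = binomial_cdf(observed, n_letters, p_on_time)

# Normal approximation with the same mean and standard deviation
approx = NormalDist(
    mu=n_letters * p_on_time,
    sigma=math.sqrt(n_letters * p_on_time * (1 - p_on_time)),
)
normal_prob = approx.cdf(observed)

print(f"exact binomial: {exact_prob:.4f}")
print(f"normal approx.: {normal_prob:.4f}")
```

Under these assumptions the normal value comes out several times smaller than the exact one, which illustrates the inaccuracy in the tail. Both are small enough to lead to the same conclusion about the claim.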
The diagram below gives a little more flexibility to the simulation.
Use the slider at the top left to reduce the probability of each letter arriving on time to 0.92, then repeat the simulation. Are the paper's findings (52 out of 59 arriving on time) consistent with this probability?
We will now change to a different hypothetical scenario — 188 out of 200 letters being found to arrive on time in a different city. Is this consistent with a probability 0.96 of letters arriving on time?
Use the pop-up menu to change the number of letters to 200 and reset the probability to 0.96. Now repeat the simulation. Are these results consistent with a probability 0.96 of letters arriving on time?
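As a cross-check on the two simulations just described, the corresponding tail probabilities can be computed exactly, again assuming that letters arrive on time independently so that the count is binomial. The helper `prob_at_most` and the scenario labels are hypothetical names used only for this sketch.

```python
import math

def prob_at_most(k, n, p):
    """Exact binomial probability of k or fewer successes in n trials."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# (label, observed on-time count, number of letters, assumed probability)
scenarios = [
    ("52 of 59 at probability 0.92", 52, 59, 0.92),
    ("188 of 200 at probability 0.96", 188, 200, 0.96),
]

results = {}
for label, observed, n_letters, p_on_time in scenarios:
    results[label] = prob_at_most(observed, n_letters, p_on_time)
    print(f"{label}: P(count <= {observed}) = {results[label]:.3f}")
```

Under these assumptions the two probabilities are about 0.19 and 0.11. Neither is small enough to make the observed count surprising, so both sets of results look consistent with the stated probabilities.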