How much data do I need to collect?
On the previous page, we investigated how to determine the sample size needed to estimate a population mean to a specified accuracy. A similar calculation gives the sample size required to estimate a probability.
A 95% confidence interval for a probability π is of the form
$$p \pm 2\sqrt{\frac{p(1-p)}{n}}$$
where p is the sample proportion and n is the sample size.
If we want our estimate to be within k of π with probability 0.95, then we need n to be large enough that
$$2\sqrt{\frac{p(1-p)}{n}} \le k$$
In order to use this inequality, we need a guess at the value of p — it does not need to be particularly accurate.
A small pilot survey is often conducted to obtain a preliminary estimate for the proportion.
If we can do no better, the 'worst-case' value, p = 0.5, can be used: p(1 − p) is largest at p = 0.5, so this guess gives the largest required sample size, which may be higher than needed.
The necessary sample size can be found by trial-and-error in the above inequality.
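The trial-and-error search can also be automated. The sketch below is an illustration rather than part of the original page. It assumes the '±' value is 2√(p(1 − p)/n), with 2 as the approximate 95% multiplier (an assumption, chosen because it reproduces the sample sizes quoted in the example that follows), and the function names are hypothetical. Exact fractions are used so that boundary cases are not affected by floating-point rounding.

```python
from fractions import Fraction


def margin_squared(p, n):
    """Square of the assumed '±' value: (2 * sqrt(p(1-p)/n))**2 = 4p(1-p)/n."""
    return 4 * p * (1 - p) / n


def smallest_sample_size(p_guess, k):
    """Increase n one step at a time until the '±' value is no more than k."""
    p = Fraction(p_guess)  # accepts strings such as "0.5" exactly
    k = Fraction(k)
    n = 1
    # Comparing squares avoids taking a square root, so the test stays exact.
    while margin_squared(p, n) > k * k:
        n += 1
    return n


print(smallest_sample_size("0.5", "0.04"))  # worst-case guess
print(smallest_sample_size("0.1", "0.04"))  # guess taken from a pilot survey
```

Passing the guess and the target accuracy as strings keeps the arithmetic exact; a float such as 0.04 would be converted to its nearest binary value and could shift the answer by one at the boundary.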
How many people should be phoned?
A hospital administrator wants to conduct a telephone survey to determine the proportion of people in a city who have visited a hospital in the last year, either as a patient or visitor.
What sample size is needed to be at least 95% confident that the resulting estimate will be within 0.04 of the true population proportion?
The following diagram helps with the calculations.
We have not been given a guess at the value of π, so drag the slider to 0.5 — the worst-case scenario. (Use the arrow keys on your keyboard for fine adjustment of π.)
Drag the sample size slider until the '±' value is no more than 0.04. Verify that the sample size should be 625 or higher.
If we know (perhaps from a pilot survey or from other information obtained from the local hospitals) that the proportion will be no more than 0.10, the sample size can be reduced. Use the slider to change the guess of π to 0.10 in the diagram above and verify that a sample size of 225 would be enough to estimate π to within 0.04 with 95% confidence.
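For readers without access to the interactive diagram, the same check can be approximated by tabulating the '±' value for a few sample sizes. This is a minimal sketch, assuming the '±' value is 2√(p(1 − p)/n) with 2 as the approximate 95% multiplier; the function name `plus_minus` and the grid of sample sizes are illustrative choices, not taken from the diagram.

```python
import math


def plus_minus(p_guess, n):
    """Assumed '±' value of the 95% interval: 2 * sqrt(p(1-p)/n)."""
    return 2 * math.sqrt(p_guess * (1 - p_guess) / n)


# Tabulate the '±' value for both guesses used in the example.
for p_guess in (0.5, 0.10):
    for n in (100, 225, 400, 625, 900):
        print(f"guess = {p_guess:.2f}, n = {n:4d}, +/- = {plus_minus(p_guess, n):.4f}")
```

Reading down each block of the table plays the role of dragging the sample size slider: the first row at which the '±' value reaches 0.04 gives the required sample size for that guess.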
Obtaining the sample size by solving an equation
Trial-and-error can be avoided with a little algebra. The equation
$$2\sqrt{\frac{p(1-p)}{n}} = k$$
can be rewritten in the form
$$n = \frac{4p(1-p)}{k^2}$$
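A minimal Python sketch of the algebraic route: solving 2√(p(1 − p)/n) = k for n gives n = 4p(1 − p)/k², and rounding up to a whole number gives the smallest adequate sample size. The multiplier 2 and the function name `required_sample_size` are assumptions made for illustration.

```python
import math
from fractions import Fraction


def required_sample_size(p_guess, k):
    """Smallest whole n with n >= 4p(1-p)/k**2, computed with exact fractions."""
    p = Fraction(p_guess)  # accepts strings such as "0.5" exactly
    k = Fraction(k)
    return math.ceil(4 * p * (1 - p) / k**2)


print(required_sample_size("0.5", "0.04"))  # worst-case guess
print(required_sample_size("0.1", "0.04"))  # proportion believed to be about 0.10
```

Rounding up matters whenever the formula does not give a whole number: a target accuracy of 0.03 with the worst-case guess gives 1111.1, so 1112 people would be needed.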