How much data do I need to collect?

A 95% confidence interval for a probability π is of the form

$$p \pm 2\sqrt{\frac{p(1-p)}{n}}$$

where p is the sample proportion and n is the sample size.

If we want our estimate to be within k of π with probability 0.95, then we need n to be large enough that

$$2\sqrt{\frac{p(1-p)}{n}} \le k$$

In order to use this inequality, we need a guess at the value of p — it does not need to be particularly accurate.

A small pilot survey is often conducted to obtain a preliminary estimate for the proportion.

If we can do no better, the 'worst-case' value, p = 0.5, can be used, but the resulting sample size may be larger than needed.
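
The value p = 0.5 is the worst case because the required sample size grows with the product p(1 − p), and this product is largest at p = 0.5. A short check, by completing the square:

$$p(1-p) = \frac{1}{4} - \left(p - \frac{1}{2}\right)^2 \le \frac{1}{4}$$

with equality only when p = 0.5. Any other guess at p therefore gives a smaller p(1 − p) and so a smaller required n.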

Equation for the sample size

The inequality can be re-written in the form

$$n \ge \frac{4\,p(1-p)}{k^2}$$
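
The sample-size calculation is easy to carry out in code. The following is a minimal sketch in Python, assuming the interval uses the multiplier 2 (so that n ≥ 4p(1 − p)/k²); the function name `min_sample_size` and its parameters are illustrative, not part of the original text.

```python
from fractions import Fraction
from math import ceil


def min_sample_size(k, p=0.5):
    """Smallest integer n satisfying n >= 4 p (1 - p) / k^2.

    k : desired maximum error of the estimate (half-width of the interval)
    p : rough guess at the proportion; 0.5 is the worst case
    """
    # Convert via str() so that decimals such as 0.05 are treated exactly,
    # avoiding floating-point round-off just before rounding up.
    k = Fraction(str(k))
    p = Fraction(str(p))
    if not 0 < p < 1 or k <= 0:
        raise ValueError("need 0 < p < 1 and k > 0")
    return ceil(4 * p * (1 - p) / k**2)


print(min_sample_size(0.05))         # worst case p = 0.5 -> 400
print(min_sample_size(0.05, p=0.3))  # with a guess of p = 0.3 -> 336
```

Rounding up matters here: the inequality asks for n at least as large as the right-hand side, so a fractional result must be rounded to the next whole number.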

Example

To estimate a proportion with 95% confidence of being within 0.04 of the correct value, we need

$$n \ge \frac{4\,p(1-p)}{0.04^2} = 2500\,p(1-p)$$

Without a better guess at the value of p, we can use p = 0.5, giving a sample size of 625 or more. If we had a rough idea of the likely value of p, the sample size could be reduced from this worst-case value.
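
To see how much a rough idea of p can save, the sketch below tabulates the required sample size for k = 0.04 over a few guesses at p. The guesses are hypothetical values chosen for illustration, and the calculation assumes the bound n ≥ 4p(1 − p)/k².

```python
from fractions import Fraction
from math import ceil

k = Fraction("0.04")  # required accuracy from the example

# Hypothetical guesses at p, as might come from a pilot survey.
guesses = ["0.1", "0.2", "0.3", "0.4", "0.5"]

# Exact rational arithmetic avoids round-off before rounding up.
sizes = {g: ceil(4 * Fraction(g) * (1 - Fraction(g)) / k**2) for g in guesses}

for g, n in sizes.items():
    print(f"p = {g}: n >= {n}")
# p = 0.1: n >= 225
# p = 0.2: n >= 400
# p = 0.3: n >= 525
# p = 0.4: n >= 600
# p = 0.5: n >= 625
```

Because p(1 − p) is symmetric about 0.5, a guess of 0.8 requires the same sample size as a guess of 0.2. The saving is small unless p is thought to be well away from 0.5.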