Distribution of proportion
In a random sample of n values from a categorical population with probability π of success, the number of successes, x, has a binomial distribution,
X ~ binomial (n, π)
The sample proportion, p = x / n, has a distribution with the same shape but scaled by a factor 1/n. From the properties of the binomial distribution, its distribution has mean and standard deviation
μp = π
σp = √( π(1 − π) / n )
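The two formulas above can be checked numerically. The following is a minimal sketch, not part of the original page, that builds the exact binomial probabilities and compares the mean and standard deviation of p = x / n with the formulas. The function name `proportion_mean_sd` and the values n = 20 and π = 0.3 are assumptions chosen only for illustration.

```python
from math import comb, sqrt


def proportion_mean_sd(n, pi):
    """Exact mean and standard deviation of p = x / n when x ~ binomial(n, pi)."""
    # Binomial probability of each possible count x = 0, 1, ..., n
    probs = [comb(n, x) * pi**x * (1 - pi) ** (n - x) for x in range(n + 1)]
    mean = sum((x / n) * pr for x, pr in enumerate(probs))
    var = sum((x / n - mean) ** 2 * pr for x, pr in enumerate(probs))
    return mean, sqrt(var)


n, pi = 20, 0.3  # illustrative values, not taken from the text
mean_p, sd_p = proportion_mean_sd(n, pi)
formula_sd = sqrt(pi * (1 - pi) / n)

print(f"mean of p  = {mean_p:.4f} (formula: {pi})")
print(f"sd of p    = {sd_p:.4f} (formula: {formula_sd:.4f})")
```

The exact calculation and the formulas should agree up to floating-point rounding for any n and π.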
Distribution of estimation error
When the proportion p is used to estimate π, the estimation error is p - π. The error distribution therefore has the same shape as that of p, but is shifted to have mean zero. The bias and standard error of the sample proportion are therefore
bias = μerror = 0
standard error = σerror = √( π(1 − π) / n )
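A small simulation can illustrate the zero bias and the standard error. This is only a sketch under assumed values (n = 50, π = 0.3, 5,000 repeated samples, and a fixed random seed); none of these numbers come from the text.

```python
import random
from math import sqrt
from statistics import mean, pstdev

rng = random.Random(1)  # fixed seed so the run is repeatable
n, pi = 50, 0.3         # illustrative sample size and success probability
reps = 5000             # number of simulated samples

errors = []
for _ in range(reps):
    # Count successes in one random sample of n values
    x = sum(rng.random() < pi for _ in range(n))
    errors.append(x / n - pi)  # estimation error, p - pi

sim_bias = mean(errors)    # should be close to 0
sim_se = pstdev(errors)    # should be close to the formula below
theory_se = sqrt(pi * (1 - pi) / n)

print(f"simulated bias = {sim_bias:.4f}")
print(f"simulated standard error = {sim_se:.4f}, formula = {theory_se:.4f}")
```

Because the results come from random samples, the simulated values will differ slightly from 0 and from the formula, and the agreement improves as the number of repetitions grows.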
Standard error from data
Unfortunately, the formula for the standard error of p involves π, and this is unknown in practical problems. To get a numerical value for the standard error, we therefore replace π with our best estimate of its value, p.
bias = μerror = 0
standard error = σerror = √( p(1 − p) / n )
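In code, the estimated standard error needs only the number of successes and the sample size. The helper below is a hypothetical sketch (the name `estimated_standard_error` and the example counts are assumptions, not from the text).

```python
from math import sqrt


def estimated_standard_error(successes, n):
    """Standard error of a sample proportion, with pi replaced by p = successes / n."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    return sqrt(p * (1 - p) / n)


# Illustrative data: 12 successes in a sample of 40
se_example = estimated_standard_error(12, 40)
print(f"p = {12 / 40:.2f}, estimated standard error = {se_example:.4f}")
```

Note that the estimate is 0 when p is exactly 0 or 1, so it should be read with caution in very small samples.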
Survival of fruit flies on heat-treated mangoes
The Queensland fruit fly, Bactrocera tryoni, can lay eggs in mangoes, so Australian mangoes must be treated before they can be exported to most international markets.
An experiment was conducted to determine the effectiveness of heat treatment of mangoes to kill fruit fly eggs. The table below shows the published results when mangoes containing 5,903 eggs were heat treated to a core temperature of 43 degrees Celsius.
Outcome | Number of eggs |
---|---|
Surviving adults | 637 |
Eggs killed | 5,266 |
Total eggs | 5,903 |
What is the probability that a fruit fly egg will survive?
There is some underlying probability, π, that an egg will survive the heat treatment and our best estimate is the sample proportion, p = 637/5903 = 0.1079.
How accurate is this estimate?
The number surviving should have a binomial distribution,
X ~ binomial (n = 5903, π)
The diagram below initially shows this distribution with π replaced by our best estimate, p = 0.1079.
The (approximate) distributions of the sample proportion, p, and of the estimation error can be displayed in the same way. All three distributions have the same basic shape; only the scale on the axis changes.
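The change of scale can be made concrete by listing the mean and standard deviation of each distribution for the fruit fly data. This sketch uses p in place of the unknown π, as the text does; the dictionary layout and labels are illustrative choices.

```python
from math import sqrt

n = 5903
p = 637 / n  # sample proportion, used as a stand-in for the unknown pi

# (mean, standard deviation) of each distribution, with pi replaced by p
summaries = {
    "number surviving, x": (n * p, sqrt(n * p * (1 - p))),
    "sample proportion, p": (p, sqrt(p * (1 - p) / n)),
    "estimation error, p - pi": (0.0, sqrt(p * (1 - p) / n)),
}

for name, (mu, sd) in summaries.items():
    print(f"{name:26s} mean = {mu:10.4f}   sd = {sd:8.4f}")
```

Dividing the count by n shrinks both the mean and the standard deviation by a factor of n, and subtracting π only shifts the mean to zero, which is why the shape is unchanged.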
From the error distribution (or from the standard error), it is unlikely that the estimate of survival, p = 0.1079, will be more than 0.01 in error.
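The claim can be checked with a short calculation. The sketch below computes the estimated standard error for the fruit fly data and then, assuming a normal approximation to the error distribution (an assumption added here, reasonable for a sample this large but not stated in the text), the chance of an error larger than 0.01. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

survivors, n = 637, 5903
p = survivors / n                # estimated survival probability
se = sqrt(p * (1 - p) / n)       # estimated standard error of p

# How many standard errors is an error of 0.01?
z = 0.01 / se

# Normal approximation (an assumption) to P(|error| > 0.01)
prob_large_error = 2 * (1 - NormalDist().cdf(z))

print(f"p = {p:.4f}, standard error = {se:.4f}")
print(f"0.01 is {z:.2f} standard errors from zero")
print(f"approximate P(|error| > 0.01) = {prob_large_error:.4f}")
```

An error of 0.01 is roughly 2.5 standard errors, which is why an error that large is described as unlikely.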