Notation
We now generalise the telepathy example on the previous page. Consider an infinite categorical population that contains a proportion π of some category that we will call 'success'. We call the other values in the population 'failures'.
In the telepathy example, a correct guess might be called a 'success' and a wrong guess would be a 'failure'. The probability of success is π = 0.333.
The labels 'success' and 'failure' provide terminology that can describe a wide range of data sets. For example,
Data set | 'Success' | 'Failure' |
---|---|---|
Sex of a sample of fish | female | male |
Quality of export apples | good | bruised |
Effect of insecticide on beetle | dead | alive |
When a random sample of n values is selected from such a population, we denote the number of successes by x and the proportion of successes by p = x/n.
Distribution of a proportion from a simple random sample
The number of successes, x, has a 'standard' discrete distribution called a binomial distribution, which has two parameters, n and π. In practical applications, n is a known constant, but π may be unknown. The sample proportion, p = x/n, has a distribution with the same shape, but its possible values are those of x divided by n, so they lie between 0 and 1.
With appropriate choice of the parameters n and π, the binomial distribution can describe the distribution of any proportion from a random sample.
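As a rough illustration of how the distributions of x and p are related, the sketch below computes binomial probabilities with the standard formula and attaches each one to both x and p = x/n. The function names `binomial_pmf` and `proportion_distribution` are made up for this example, and the value π = 1/3 assumes that the 0.333 quoted for the telepathy example is a rounding of one third.

```python
from math import comb


def binomial_pmf(x: int, n: int, pi: float) -> float:
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * pi**x * (1 - pi) ** (n - x)


def proportion_distribution(n: int, pi: float) -> dict:
    """Map each possible sample proportion p = x/n to its probability.

    The probabilities are the same as those for x; only the axis is rescaled.
    """
    return {x / n: binomial_pmf(x, n, pi) for x in range(n + 1)}


# Hypothetical settings: n = 5 guesses, pi assumed to be 1/3.
dist = proportion_distribution(5, 1 / 3)
for p, prob in dist.items():
    print(f"p = {p:.1f}  probability = {prob:.4f}")
```

The probabilities sum to 1 whichever axis is used, which is why a single barchart can carry both an x scale and a p scale.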
Shape of the binomial distribution
The diagram below shows some possible shapes of the binomial distribution. The barchart has dual axes and therefore shows the distributions of both x and p.
Drag the sliders to adjust the two parameters of the binomial distribution, and observe how the shape of the distribution changes.
The diagram can be used to obtain binomial probabilities by setting π and n to the appropriate values, then clicking on one of the bars in the barchart.
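Without the interactive diagram, the effect of the two parameters on the shape can be approximated with a text barchart. The following is only a sketch: `text_barchart` is a hypothetical helper, the bar width of 40 characters is arbitrary, and the two parameter settings are chosen purely for illustration.

```python
from math import comb


def binomial_pmf(x: int, n: int, pi: float) -> float:
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * pi**x * (1 - pi) ** (n - x)


def text_barchart(n: int, pi: float, width: int = 40) -> str:
    """Draw the binomial distribution as rows of '#' characters.

    Each row is labelled with both x and p = x/n, mimicking dual axes.
    Bars are scaled so that the most likely value has the full width.
    """
    probs = [binomial_pmf(x, n, pi) for x in range(n + 1)]
    tallest = max(probs)
    rows = []
    for x, prob in enumerate(probs):
        bar = "#" * round(width * prob / tallest)
        rows.append(f"x={x:2d}  p={x / n:.2f}  {bar}")
    return "\n".join(rows)


print(text_barchart(10, 0.5))  # symmetric about x = 5
print()
print(text_barchart(10, 0.1))  # bars pile up near x = 0
```

With π = 0.5 the bars are symmetric; moving π towards 0 or 1 pushes the bulk of the distribution towards that end of the axis.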
Telepathy experiment
For example, to find the probability of a subject correctly guessing 4 out of 5 cards in the telepathy example, set π = 0.33 and n = 5, then click on the bar for x = 4. The probability is shown under the barchart.
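The same probability can be checked by direct calculation. The sketch below uses the standard binomial formula; `binomial_pmf` is a name chosen for this example. It evaluates the probability both at the rounded slider value π = 0.33 and at π = 1/3, the latter on the assumption that 0.33 stands for one third.

```python
from math import comb


def binomial_pmf(x: int, n: int, pi: float) -> float:
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * pi**x * (1 - pi) ** (n - x)


# Probability of 4 correct guesses out of 5.
prob_rounded = binomial_pmf(4, 5, 0.33)   # pi as set on the slider
prob_third = binomial_pmf(4, 5, 1 / 3)    # pi assumed to be exactly 1/3

print(f"{prob_rounded:.4f}")  # 0.0397
print(f"{prob_third:.4f}")    # 0.0412
```

The small difference between the two results comes only from rounding π.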
The diagram below demonstrates that a binomial distribution does indeed describe sample-to-sample variability. The pink barchart at the bottom of the diagram shows the binomial distribution with parameters n = 20 and π = 0.333, which describes the number (and hence the proportion) of correct guesses in a sample of 20 guesses.
Click Accumulate and take several samples. Observe that the distribution of x matches the theoretical binomial distribution. Repeat the exercise with different sample sizes.
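The accumulation exercise can also be imitated in code. The sketch below is not the diagram's own implementation: it assumes each guess is an independent success with probability π = 1/3, and the seed, the number of samples and the helper name `simulate_counts` are arbitrary choices made for illustration.

```python
import random
from collections import Counter
from math import comb


def binomial_pmf(x: int, n: int, pi: float) -> float:
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * pi**x * (1 - pi) ** (n - x)


def simulate_counts(n: int, pi: float, num_samples: int, rng: random.Random) -> list:
    """Take num_samples random samples of size n; return x for each sample."""
    return [sum(rng.random() < pi for _ in range(n)) for _ in range(num_samples)]


rng = random.Random(1)  # fixed seed so that reruns give the same samples
n, pi, num_samples = 20, 1 / 3, 10_000
counts = Counter(simulate_counts(n, pi, num_samples, rng))

# Compare the observed proportion of samples at each x with the theory.
for x in range(n + 1):
    observed = counts[x] / num_samples
    print(f"x={x:2d}  observed={observed:.4f}  binomial={binomial_pmf(x, n, pi):.4f}")
```

As more samples are accumulated, the observed column should settle closer to the binomial column; changing `n` repeats the exercise for a different sample size.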