A sample proportion has a distribution
If a categorical data set is modelled as a random sample from a categorical population, the sample proportions in the various categories must be treated as random quantities — they vary from sample to sample.
The population proportion in any category of a categorical population is called the category's probability, and the Greek letter π is often used to denote the probability of a particular category of interest. The corresponding sample proportion is usually denoted by p.
| | Sample Statistic | Population Parameter |
|---|---|---|
| Mean | x̄ | µ |
| Standard deviation | s | σ |
| Proportion/probability | p | π |
Note carefully that...
In statistics, the symbol π is used to represent a probability that may take any value between 0 and 1, depending on context. Do not confuse it with the mathematical constant π.
It is important that you understand the distinction between a sample proportion and the underlying population probability.
Sex of babies
Consider the sex of a newborn baby at a maternity unit. We can model the baby's sex as a categorical value (male or female) from a hypothetical infinite population of 51.2% male and 48.8% female values. (These population proportions are obtained from historical records of births.)
The sexes of 10 babies born in one day at the maternity unit would be modelled as a random sample of n = 10 values from this population.
Repeated samples from this model differ from one another. In particular, the sample proportion of male babies varies from sample to sample.
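The sample-to-sample variability can be imitated with a short simulation. The sketch below is illustrative only: it uses the values π = 0.512 and n = 10 from the text, while the function name `sample_proportion_male`, the seed, and the choice of eight repeated samples are assumptions made for the example.

```python
import random

PI_MALE = 0.512   # population probability of a male birth (from the text)
N = 10            # babies born in one day (from the text)


def sample_proportion_male(n, pi, rng):
    """Draw n births from the model and return the proportion that are male."""
    males = sum(1 for _ in range(n) if rng.random() < pi)
    return males / n


# A fixed seed (an arbitrary choice) makes the run repeatable.
rng = random.Random(1)

# Eight simulated days, each with its own sample proportion of males.
proportions = [sample_proportion_male(N, PI_MALE, rng) for _ in range(8)]
print(proportions)
```

Each printed value is a sample proportion p from a different simulated day. The values differ from one another even though the population probability π is the same for every sample.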
Unknown probabilities
In some applications we know the population probabilities for the categories of interest, but in practice these parameters are usually unknown constants. The corresponding sample proportions are estimates of the probabilities, but it is important to recognise that the underlying probabilities themselves remain unknown.
Effect of insecticide on beetles
Fifty beetles were sprayed with a weak concentration of insecticide. The symbol π denotes the probability of a beetle dying. The diagram below shows the result of the experiment.
The unknown parameter π is of greatest interest, but we know only the sample proportion that died, p = 0.72 (36 of the 50 beetles), which gives some information about the likely value of π.
Understanding the sample-to-sample variability of a proportion allows us to assess the proportion that is observed in a single data set.
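As a rough illustration of this idea, the sketch below simulates many experiments with 50 beetles and looks at how much the sample proportion varies. The true π is unknown, so the value `ASSUMED_PI = 0.7` is purely hypothetical, as are the seed, the number of simulated experiments, and the function name `simulate_proportion`.

```python
import random

N_BEETLES = 50       # sample size in the experiment (from the text)
ASSUMED_PI = 0.7     # hypothetical probability of dying; the true value is unknown
N_SAMPLES = 10_000   # number of simulated experiments (arbitrary)


def simulate_proportion(n, pi, rng):
    """Simulate one experiment and return the proportion of beetles dying."""
    deaths = sum(1 for _ in range(n) if rng.random() < pi)
    return deaths / n


rng = random.Random(2)  # arbitrary fixed seed for repeatability

simulated = [simulate_proportion(N_BEETLES, ASSUMED_PI, rng)
             for _ in range(N_SAMPLES)]

# Summarise the spread of the simulated sample proportions.
mean_p = sum(simulated) / N_SAMPLES
share_close = sum(1 for p in simulated if abs(p - ASSUMED_PI) <= 0.05) / N_SAMPLES

print(f"mean of simulated proportions: {mean_p:.3f}")
print(f"smallest and largest: {min(simulated):.2f}, {max(simulated):.2f}")
print(f"share within 0.05 of the assumed pi: {share_close:.2f}")
```

Repeating the run with other assumed values of π shows which values would make an observed proportion such as 0.72 look typical and which would make it look unusual.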