Sampling mechanism

The mechanism of sampling from a population explains randomness in data.

In practice, however, we have only a single sample and must use it to draw conclusions about the population. The population is the focus of our attention: we are rarely interested in the specific individuals in our sample, and the underlying population is a generalisation of this type of 'individual'.

Parameters and statistics

Instead of trying to fully estimate the population distribution, we usually focus attention on a small number of numerical characteristics — often only one. Such population characteristics are called parameters. The corresponding values from a sample are called sample statistics and provide estimates of the unknown parameters.

The population mean is often of particular interest and the sample mean provides an estimate of it.
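The distinction between a parameter and a statistic can be sketched in a few lines of Python. The ten population values, the sample size of 4 and the seed below are invented purely for illustration; they do not come from any data set discussed here.

```python
import random
import statistics

# Hypothetical population of 10 values (illustrative only, not real data).
population = [4.1, 5.3, 3.8, 6.0, 4.7, 5.5, 4.9, 3.6, 5.1, 4.4]

# Parameter: a numerical characteristic of the whole population.
population_mean = statistics.mean(population)

# Statistic: the same characteristic computed from one random sample.
rng = random.Random(1)              # fixed seed so the sketch is repeatable
sample = rng.sample(population, 4)  # 4 values chosen without replacement
sample_mean = statistics.mean(sample)

print(f"population mean (parameter): {population_mean:.2f}")
print(f"sample mean (statistic):     {sample_mean:.2f}")
```

In a real study only the sample would be available, so the first printed value would be unknown and the second would serve as its estimate.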

Variability of sample statistics

The variability in random samples also implies sample-to-sample variability in sample statistics.

In order to assess how well a sample statistic estimates an unknown population parameter, it is important to understand its sample-to-sample variability.
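One way to get a feel for sample-to-sample variability is to simulate it. The sketch below assumes a made-up population of 1000 normally distributed values and draws many samples of size 25; the helper name `sample_means` and every number in it are hypothetical choices, not part of the original material.

```python
import random
import statistics

def sample_means(population, n, repetitions, seed=0):
    """Return the means of `repetitions` random samples of size n,
    each drawn without replacement from the population."""
    rng = random.Random(seed)
    return [statistics.mean(rng.sample(population, n))
            for _ in range(repetitions)]

# Hypothetical population (assumed values, for illustration only).
pop_rng = random.Random(42)
population = [pop_rng.gauss(50, 10) for _ in range(1000)]

means = sample_means(population, n=25, repetitions=2000)

print(f"population mean:      {statistics.mean(population):.2f}")
print(f"mean of sample means: {statistics.mean(means):.2f}")
print(f"sd of population:     {statistics.pstdev(population):.2f}")
print(f"sd of sample means:   {statistics.stdev(means):.2f}")
```

Under these assumptions the sample means should centre close to the population mean while varying much less than the individual values do; the size of that spread is what determines how far a single sample mean is likely to be from the parameter.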

The remainder of this section investigates the variability in sample means.


Wheat yields

The top half of the following diagram shows wheat yields (tonnes per hectare) from all 60 farms in one region of the United States in 2014. These 60 values are the population of interest and their mean and standard deviation are population parameters.

To save the cost of measuring the wheat yields on all 60 farms, a researcher decides to randomly select 12 of them (without replacement). Click the button Take sample to select a random sample. The sample mean could provide an estimate of the mean wheat yield in the whole region (if the population were unknown, as it would be in practice).

Observe that the sample mean and standard deviation are similar to those of the population but they are not identical. Select a few more samples and note the variability in the sample statistics.
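The effect of repeatedly taking a sample can be imitated in code. The following is only a rough sketch: the 60 yields are randomly generated stand-ins rather than the actual farm data, the function name `take_sample` is hypothetical, and the choice of divisor for the population standard deviation (`pstdev`) is an assumption.

```python
import random
import statistics

rng = random.Random(2014)  # arbitrary seed so the output is repeatable

# Stand-in population: 60 made-up yields in tonnes per hectare.
# These are NOT the real farm values, only placeholders.
yields = [round(rng.gauss(3.0, 0.5), 2) for _ in range(60)]

print(f"population: mean={statistics.mean(yields):.2f}, "
      f"sd={statistics.pstdev(yields):.2f}")

def take_sample(population, n=12):
    """Select n farms at random without replacement."""
    return rng.sample(population, n)

# Take five samples and compare their statistics with the parameters.
for i in range(1, 6):
    s = take_sample(yields)
    print(f"sample {i}:   mean={statistics.mean(s):.2f}, "
          f"sd={statistics.stdev(s):.2f}")
```

Each printed line plays the role of one sample: the sample mean and standard deviation should land near the population values but change from one sample to the next.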

Any single sample mean provides a reasonable estimate of the population mean, but sample-to-sample variability limits how accurate that estimate can be.