Sampling mechanism
The mechanism of sampling from a population explains randomness in data.
In practice, however, we have only a single sample and must use it to learn about the population. The population is the focus of our attention: we are rarely interested in the specific individuals in the sample, and the underlying population is a generalisation of this type of 'individual'.
Parameters and statistics
Instead of trying to estimate the whole population distribution, we usually focus on a small number of its numerical characteristics, often only one. Such population characteristics are called parameters. The corresponding values calculated from a sample are called sample statistics and provide estimates of the unknown parameters.
The population mean is often of particular interest and the sample mean provides an estimate of it.
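In symbols, the distinction between the two means can be written as follows. The notation is an assumption introduced here for illustration and is not taken from the original text: the population has N values x_1, ..., x_N, and the sample consists of n values y_1, ..., y_n selected from them.

```latex
% Population mean: a parameter, fixed but usually unknown
\mu = \frac{1}{N} \sum_{i=1}^{N} x_i

% Sample mean: a statistic, computed from the n sampled values
\bar{y} = \frac{1}{n} \sum_{j=1}^{n} y_j
```

The parameter \mu is a single fixed number, whereas \bar{y} depends on which n values happen to be selected.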
Variability of sample statistics
Because random samples vary, the statistics calculated from them also vary from sample to sample.
In order to assess how well a sample statistic estimates an unknown population parameter, it is important to understand its sample-to-sample variability.
The remainder of this section investigates the variability in sample means.
Survival time of businesses
A researcher wishes to investigate the characteristics of the 120 businesses that closed down over the previous year in a small city. The top half of the following diagram shows the times (in years) that these businesses had been in operation. These 120 values are the population of interest, and their mean and standard deviation are also shown in the diagram; they are population parameters.
To save the cost of finding information from all 120 businesses, the researcher decides to randomly select 20 of them (without replacement). Click the Take sample button to select a random sample. The sample mean could provide an estimate of the mean years in business of all of the failed businesses (if the population values were unknown, as they would be in practice).
Observe that the sample mean and standard deviation are similar to those of the population but they are not identical. Select a few more samples and note the variability in the sample statistics.
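The same experiment can be sketched in Python for readers without access to the diagram. The 120 survival times below are hypothetical values generated from an exponential distribution with an assumed mean of 5 years; they are not the data shown in the diagram. The function names make_population and take_sample are also illustrative only.

```python
import random
import statistics


def make_population(size=120, mean_years=5.0, seed=1):
    """Hypothetical survival times in years (NOT the data from the diagram)."""
    rng = random.Random(seed)
    return [round(rng.expovariate(1.0 / mean_years), 1) for _ in range(size)]


def take_sample(population, n=20, rng=None):
    """Simple random sample of n values, selected without replacement."""
    rng = rng or random
    return rng.sample(population, n)


population = make_population()
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)  # population SD (divides by N)
print(f"population: mean = {mu:.2f}, sd = {sigma:.2f}")

rng = random.Random(42)
for i in range(1, 4):
    sample = take_sample(population, 20, rng)
    xbar = statistics.fmean(sample)
    s = statistics.stdev(sample)  # sample SD (divides by n - 1)
    print(f"sample {i}: mean = {xbar:.2f}, sd = {s:.2f}")
```

Each pass through the loop plays the role of one click of the Take sample button: the printed sample statistics should be close to the population values but will differ from them and from each other.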
Any single sample mean provides a reasonable estimate of the population mean, but sample-to-sample variability limits how accurate that estimate is likely to be.
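One way to see this variability is to repeat the sampling many times and summarise the resulting sample means. The following is a minimal sketch under stated assumptions: the population is again a set of 120 hypothetical survival times (exponential with an assumed mean of 5 years, not the data in the diagram), and the number of repetitions (1000) is arbitrary.

```python
import random
import statistics

rng = random.Random(7)

# Hypothetical population of 120 survival times (years); not the diagram's data.
population = [rng.expovariate(1 / 5.0) for _ in range(120)]
mu = statistics.fmean(population)


def sample_means(population, n, repeats, rng):
    """Means of `repeats` random samples of size n, each drawn without replacement."""
    return [statistics.fmean(rng.sample(population, n)) for _ in range(repeats)]


means = sample_means(population, 20, 1000, rng)

print(f"population mean:      {mu:.2f}")
print(f"mean of sample means: {statistics.fmean(means):.2f}")
print(f"sd of sample means:   {statistics.stdev(means):.2f}")
print(f"range of sample means: {min(means):.2f} to {max(means):.2f}")
```

The sample means cluster around the population mean, and their standard deviation gives a rough idea of how far a single sample mean is likely to be from the parameter it estimates.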