Census
We often want to find information about a particular group of individuals (people, fields, trees, bottles of beer or some other collection of items). This target group is called the population.
When measurements are made from every item in the target population, the collected data are called a census.
Sampling from the population
A census is often not feasible, typically because measuring every unit in the population would cost too much or take too long.
Fortunately, we can often obtain sufficiently accurate information by only measuring a selection of units from the population.
Data from a subset of the population are called a sample.
Simple random sample
The simplest way to select a representative sample from a population is called a simple random sample. In a simple random sample, each unit has the same chance of being selected, and some random mechanism (e.g. tossing a coin, rolling a die or a computer-based method) is used to determine whether any particular unit is included in the sample.
Although there is some inaccuracy when a sample is used instead of the whole population, the savings in cost and time often outweigh this.
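As a rough illustration of the ideas above, the following Python sketch draws a simple random sample with a computer-based method and compares the sample mean with the mean from a full census. The population of 1000 values is synthetic and purely hypothetical, and the names `population`, `sample`, `census_mean` and `sample_mean` are chosen only for this example.

```python
import random
from statistics import mean

# Fixed seed so the sketch is reproducible; any seed would do.
rng = random.Random(1)

# Hypothetical population: 1000 synthetic measurements (illustration only).
population = [rng.gauss(50, 10) for _ in range(1000)]

# Simple random sample: 50 units drawn without replacement,
# each unit having the same chance of being selected.
sample = rng.sample(population, 50)

# A census uses every unit; the sample uses only the selected ones.
census_mean = mean(population)
sample_mean = mean(sample)

print(f"census mean: {census_mean:.2f}")
print(f"sample mean: {sample_mean:.2f}")
```

The two means will usually differ a little, which is the inaccuracy mentioned above, but the sample needs only 50 measurements instead of 1000.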
Sampling from a population of values
When only a single measurement is made from each individual, it is convenient to define the population and sample to be sets of values (rather than people or other items). This abstraction — a population of values and a corresponding sample of values — can be applied to a wide range of applications.
In the remainder of this chapter, we examine the consequences of sampling from populations of numerical and categorical values.
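The abstraction described above can be sketched in Python as follows. The records, the variable names `height_cm` and `employed`, and the helper `values_of` are all hypothetical and exist only to show how a population of units becomes a population of values once a single measurement is chosen.

```python
import random

# Hypothetical units, each with more than one recorded characteristic.
units = [
    {"id": 1, "height_cm": 172, "employed": "Yes"},
    {"id": 2, "height_cm": 165, "employed": "No"},
    {"id": 3, "height_cm": 180, "employed": "Yes"},
    {"id": 4, "height_cm": 158, "employed": "Yes"},
    {"id": 5, "height_cm": 169, "employed": "No"},
    {"id": 6, "height_cm": 175, "employed": "Yes"},
]

def values_of(selected_units, variable):
    """Keep only one measurement per unit, giving a list of values."""
    return [unit[variable] for unit in selected_units]

rng = random.Random(0)
sampled_units = rng.sample(units, 3)  # sample of units

# Numerical population and sample of values.
population_heights = values_of(units, "height_cm")
sample_heights = values_of(sampled_units, "height_cm")

# Categorical population and sample of values.
population_employed = values_of(units, "employed")
sample_employed = values_of(sampled_units, "employed")

print(sample_heights, sample_employed)
```

The same sampled units yield either a numerical or a categorical sample, depending on which measurement is kept.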
Sampling people
The diagram below illustrates the sampling process with a population of 56 people.
Click the button Take sample to randomly select 15 of these people. Repeat a few times to observe the variability in the units sampled.
Although there are many differences between the individuals, we are often interested in only one characteristic. Click the checkbox Only show Gender to concentrate on this aspect of the individuals; the population is a set of categorical values (Male or Female) and the sample is similarly categorical.
Similar categorical populations and samples would arise if we were interested in whether the people were married, intended to vote for a particular candidate or were unemployed.
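The repeated sampling described above can be imitated with a short Python sketch. The population size (56) and sample size (15) come from the text, but the split of 30 females and 26 males is an assumption made only for illustration, as is the helper name `sample_proportion`.

```python
import random

rng = random.Random(2024)

# 56 people, reduced to their gender. The 30/26 split is hypothetical.
population = ["Female"] * 30 + ["Male"] * 26

def sample_proportion(population, n, category, rng):
    """Draw a simple random sample of size n and return the
    proportion of sampled values equal to `category`."""
    sample = rng.sample(population, n)
    return sample.count(category) / n

# Take five separate samples of 15 people to see how the result varies.
proportions = [
    sample_proportion(population, 15, "Female", rng) for _ in range(5)
]

for p in proportions:
    print(f"proportion female in sample: {p:.2f}")
```

Each run of the loop corresponds to one click of Take sample; the proportions generally differ from sample to sample even though the population is unchanged.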
Sampling apples
The diagram below represents the 120 apples that one fruit-picker collected from an orchard in an hour. The apples have a variety of shapes and sizes (represented by different colours in the diagram), and some are bruised or blemished (marked with a cross).
Click the button Take sample to randomly select 17 apples.
Click Only show Apple condition to concentrate on whether the apples are damaged. As with the previous example, this reduces the problem to random sampling from a population of categorical values (Bruised or OK).
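A minimal Python sketch of the apple example is given below. The 120 apples and the sample of 17 follow the text; the number of bruised apples (24) is not given in the text and is assumed here purely for illustration.

```python
import random
from collections import Counter

rng = random.Random(7)

# 120 apples, reduced to their condition. The count of 24 bruised
# apples is a hypothetical value, not taken from the text.
population = ["Bruised"] * 24 + ["OK"] * 96

# Randomly select 17 apples without replacement.
sample = rng.sample(population, 17)

# Tally the categorical values in the sample.
counts = Counter(sample)

print("Bruised:", counts["Bruised"])
print("OK:", counts["OK"])
```

The tally is all that remains of the sample once only the apple condition is of interest.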