Variability

The mechanism of sampling from a population results in sample-to-sample variability in the information that we obtain from the samples.

Sample information about the population

However, in practice we have only a single sample from which to learn about the population. Sampling gives incomplete information about the population because some of its members are not observed.

What information does a sample provide about the underlying population?

Effect of sample size

In later chapters, we will describe in much more detail how to use sample information to make inferences about an underlying population. At this point, we simply note that we must take account of sample-to-sample variability when interpreting sample data, and that the larger the sample size, the more information we have about the population.

Bigger samples mean more stable and reliable information about the underlying population.
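The effect of sample size can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the population is 2,000 made-up values drawn from a normal distribution, and the function name spread_of_sample_means is hypothetical. It repeatedly draws samples of a given size without replacement and reports how much the sample mean varies from sample to sample.

```python
import random
import statistics


def spread_of_sample_means(population, sample_size, n_samples=500, seed=0):
    """Standard deviation of the sample mean over repeated random samples."""
    rng = random.Random(seed)
    means = [
        statistics.mean(rng.sample(population, sample_size))  # without replacement
        for _ in range(n_samples)
    ]
    return statistics.stdev(means)


# Assumption: an artificial population, not real data.
population_rng = random.Random(1)
population = [population_rng.gauss(50, 10) for _ in range(2000)]

spreads = {n: spread_of_sample_means(population, n) for n in (10, 40, 160)}
for n, spread in spreads.items():
    print(f"sample size {n:3d}: sd of sample means = {spread:.2f}")
```

The printed standard deviations should shrink as the sample size grows, which is the sense in which bigger samples give more stable information.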

School rolls

Consider a survey that records the number of students at each school in a sample of New Zealand high schools. We will use the known rolls of all 363 New Zealand high schools in July 1994 to illustrate the kind of variability that is likely to be observed.

The diagram above shows a stacked dot plot, histogram and box plot for the rolls of a random sample of 10 schools from this population. Click on any cross on the dot plot to display the name and exact roll of that school.

Click Take sample a few times to observe the sample-to-sample variability in the three displays. With a sample size as low as 10, the sample distributions vary considerably. In some samples, there even appear to be outliers or clusters.

From a single small sample, there is a lot of uncertainty about the population distribution.

Use the pop-up menu to change the sample size to 40, then take a few more samples. Observe that the graphical displays now become less variable. Repeat with a sample size of 150 and observe that the overall features of the sample distribution change even less from sample to sample.

The bigger the sample size, the more consistently the sample distribution reflects the distribution in the underlying population.
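For readers without access to the interactive diagram, the same experiment can be roughly imitated in code. The sketch below does not use the real 1994 rolls: it assumes a synthetic, right-skewed population of 363 values generated from a lognormal distribution, and the names make_synthetic_rolls, five_number_summary and median_variability are hypothetical helpers. It takes repeated samples of 10, 40 and 150 schools and measures how much the sample median (the centre line of the box plot) moves between samples.

```python
import random
import statistics

POPULATION_SIZE = 363  # number of schools in the population


def make_synthetic_rolls(seed=1994):
    """Assumption: made-up right-skewed rolls, NOT the real school data."""
    rng = random.Random(seed)
    return [round(rng.lognormvariate(6.3, 0.6)) for _ in range(POPULATION_SIZE)]


def five_number_summary(values):
    """Minimum, quartiles and maximum: the values a box plot displays."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, median, q3, max(values)


def median_variability(population, sample_size, n_samples=200, seed=0):
    """Standard deviation of the sample median over repeated samples."""
    rng = random.Random(seed)
    medians = [
        statistics.median(rng.sample(population, sample_size))
        for _ in range(n_samples)
    ]
    return statistics.stdev(medians)


rolls = make_synthetic_rolls()
print("population summary:", five_number_summary(rolls))

variability = {n: median_variability(rolls, n) for n in (10, 40, 150)}
for n, sd in variability.items():
    print(f"sample size {n:3d}: sd of sample medians = {sd:.1f}")
```

With these assumptions, the sample median should vary most for samples of 10 and least for samples of 150, mirroring what the three displays show when samples are taken repeatedly.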

Finally, use the pop-up menu to display the rolls of all 363 schools. This is the population distribution in which we are really interested.

Any of the samples of 150 schools gives a close approximation to the population distribution of school sizes.

Even the samples of 40 schools mostly give a reasonable impression of the shape of the population distribution.