Data that are not sampled from a finite population
Sometimes data are actually sampled from a real finite population. For example, a public opinion poll may select individuals from the population of all residents in a city. The previous section showed that:
Random sampling of values from a finite population can explain the sample-to-sample variability of some data.
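As a reminder of how this works, the following minimal sketch simulates repeated polls of a finite population. The population size (10,000 residents), the 40% support rate, the poll size of 500 and the helper name run_poll are all hypothetical choices for illustration. Each poll selects residents without replacement, and different polls give different sample proportions.

```python
import random

# HYPOTHETICAL finite population: 10,000 residents, 4,000 of whom (40%)
# support some proposal. 1 = supports, 0 = does not. Numbers are made up.
POPULATION = [1] * 4000 + [0] * 6000

def run_poll(rng, n=500):
    """Select n distinct residents at random (sampling without replacement)
    and return the proportion of the sample who support the proposal."""
    sample = rng.sample(POPULATION, n)
    return sum(sample) / n

rng = random.Random(42)  # seeded so the run is reproducible
for poll in range(1, 4):
    print(f"poll {poll}: sample proportion = {run_poll(rng):.3f}")
```

Polling every resident (n = 10,000) would remove the variability entirely and return the population proportion, 0.4.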
However, most data sets have no real finite population from which their values can be treated as sampled. The randomness in such data must be explained in a different way.
Estimating the speed of light
A scientist, Simon Newcomb, made a series of measurements of the speed of light between July and September 1882. He measured the time in nanoseconds (1/1,000,000,000 of a second) that a light signal took to pass from his laboratory on the Potomac River to a mirror at the base of the Washington Monument and back, a total distance of 7442 metres. Since all his measurements (24828, 24826, ...) were close to 24800, they have been coded in the table below by subtracting 24800 (24828 - 24800 = 28, 24826 - 24800 = 26, ...).
28 26 33 24 34 -44 27 16 40 -2 29
22 24 21 25 30 23 29 31 19 24 20
36 32 36 28 25 21 28 29 37 25 28
26 30 32 36 26 30 22 36 23 27 27
28 27 31 27 26 33 26 32 32 24 39
28 24 25 32 25 29 27 28 29 16 23
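The coding is easy to undo. The sketch below adds 24800 back to each coded value to recover the original times in nanoseconds, then converts a single time into the speed it implies, using the 7442-metre round trip quoted above. The names decode and implied_speed are hypothetical helpers introduced only for this illustration.

```python
# Newcomb's 66 coded values from the table: (measured time in ns) - 24800.
CODED = [
    28, 26, 33, 24, 34, -44, 27, 16, 40, -2, 29,
    22, 24, 21, 25, 30, 23, 29, 31, 19, 24, 20,
    36, 32, 36, 28, 25, 21, 28, 29, 37, 25, 28,
    26, 30, 32, 36, 26, 30, 22, 36, 23, 27, 27,
    28, 27, 31, 27, 26, 33, 26, 32, 32, 24, 39,
    28, 24, 25, 32, 25, 29, 27, 28, 29, 16, 23,
]

OFFSET_NS = 24800   # constant subtracted when the data were coded
DISTANCE_M = 7442   # round-trip distance quoted in the text

def decode(coded):
    """Undo the coding: add the offset back to recover times in nanoseconds."""
    return [c + OFFSET_NS for c in coded]

def implied_speed(time_ns):
    """Speed in metres per second implied by one round-trip time in ns."""
    return DISTANCE_M / (time_ns * 1e-9)

times = decode(CODED)
print(len(times), times[:3])
print(f"{implied_speed(times[0]) / 1000:,.0f} km/s from the first measurement")
```

Each individual time gives its own estimate of the speed, so the variability in the table translates directly into variability in these estimates.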
Newcomb's measurements cannot be considered to be sampled from any real finite population. However, there is variability within this data set that reflects inaccuracies in his experimental procedure. Repeating his experiment would have resulted in a different set of measurements.
Sampling from an abstract population
Random sampling from a population is such an intuitive way to explain sample-to-sample variability that we also use it to explain variability even when there is no real population from which the data were sampled.
We replace the real population that usually underlies survey data with an abstract population of all values that might have been obtained if the data collection had been repeated. We can then treat the observed data as a random sample from this abstract population.
The variation in the underlying abstract population gives us information about the variation in similar data in general.
Defining such an underlying population therefore not only explains sample-to-sample variability but also gives us a focus for generalising from our specific data.
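A minimal sketch of this idea treats each "repetition of the experiment" as a fresh set of independent draws from an abstract population. The shape and parameters of that population are unknown in practice, so the normal distribution with mean 27 and standard deviation 5 (in coded units) used here is purely a hypothetical assumption, as is the helper name repeat_experiment. Unlike the finite-population poll, draws are made independently, as if from an infinite population.

```python
import random
import statistics

# HYPOTHETICAL abstract population of coded measurements (time in ns - 24800).
# The normal shape, mean and SD are illustrative assumptions, not estimates.
POP_MEAN = 27.0
POP_SD = 5.0
N = 66  # size of each simulated repetition, matching Newcomb's data set

def repeat_experiment(rng, n=N):
    """One hypothetical repetition of the whole experiment: n independent draws."""
    return [rng.gauss(POP_MEAN, POP_SD) for _ in range(n)]

rng = random.Random(1882)  # seeded so the run is reproducible
for rep in range(1, 4):
    sample = repeat_experiment(rng)
    print(f"repetition {rep}: mean = {statistics.mean(sample):.2f}, "
          f"sd = {statistics.stdev(sample):.2f}")
```

Each simulated repetition gives a different sample mean and standard deviation, but all of them vary around the characteristics of the same underlying population.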
Estimating the speed of light
Newcomb's data can be treated as a sample from the population of all possible measurements that could have been made by repeating the experiment an infinite number of times.
The variability in this abstract population reflects the variability in Newcomb's experimental technique. The desire to generalise from Newcomb's 66 specific measurements can therefore be translated into estimating characteristics of the underlying population, such as its mean (and hence, if the experiment has no systematic bias, the true speed of light).
Newcomb's data can be treated as a random sample from this population and they provide information about the distribution of values in it.
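Treating the 66 values as a random sample, standard sample summaries estimate the corresponding population characteristics. The sketch below computes them with Python's statistics module and converts the mean time into a speed using the 7442-metre distance from the text. This is one simple illustration under that random-sampling assumption, not Newcomb's own analysis.

```python
import math
import statistics

# Newcomb's 66 coded values from the table: (measured time in ns) - 24800.
CODED = [
    28, 26, 33, 24, 34, -44, 27, 16, 40, -2, 29,
    22, 24, 21, 25, 30, 23, 29, 31, 19, 24, 20,
    36, 32, 36, 28, 25, 21, 28, 29, 37, 25, 28,
    26, 30, 32, 36, 26, 30, 22, 36, 23, 27, 27,
    28, 27, 31, 27, 26, 33, 26, 32, 32, 24, 39,
    28, 24, 25, 32, 25, 29, 27, 28, 29, 16, 23,
]
OFFSET_NS = 24800
DISTANCE_M = 7442

n = len(CODED)
mean_coded = statistics.mean(CODED)      # estimates the population mean
sd_coded = statistics.stdev(CODED)       # estimates the population SD (n - 1 divisor)
se_mean = sd_coded / math.sqrt(n)        # standard error of the sample mean
median_coded = statistics.median(CODED)  # less affected by extreme values

mean_time_ns = mean_coded + OFFSET_NS
speed_m_per_s = DISTANCE_M / (mean_time_ns * 1e-9)

print(f"n = {n}")
print(f"mean = {mean_coded:.2f}, sd = {sd_coded:.2f}, se = {se_mean:.2f} (coded ns)")
print(f"median = {median_coded}")
print(f"speed implied by the mean time = {speed_m_per_s / 1000:,.0f} km/s")
```

The two unusually low values, -44 and -2, pull the sample mean (about 26.2) below the median (27), which shows how strongly a few extreme measurements can affect estimates of the population mean.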