Data that are not sampled from a finite population
Sometimes data are actually sampled from a real finite population. For example, a public opinion poll may select individuals from the population of all residents in a city. The previous section showed that:
Random sampling of values from a finite population can explain the sample-to-sample variability of some data.
However, for most data sets there is no real finite population from which the values can be treated as being sampled. The randomness in such data must be explained in a different way.
Silkworm poisoning
In an investigation of the speed of the toxic action of arsenic on silkworm larvae, 80 fourth-instar silkworm larvae weighing between 0.41 and 0.45 grams were given 0.10 mg of sodium arsenate per gram of body weight. Their survival times in seconds are given below.
270 254 293 244 293 261 285 330 284 274
307 235 215 292 309 267 275 298 241 254
256 275 226 287 280 339 294 298 283 366
300 310 280 240 291 286 230 285 218 279
280 286 345 289 210 282 260 228 243 259
285 275 280 296 283 248 314 258 215 299
240 241 236 255 267 271 253 271 233 260
273 233 271 267 258 319 310 302 260 251
There is no real finite population from which the survival times can be considered to be sampled. However, there is variability within this data set, and repeating the experiment would have resulted in a different set of survival times.
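The variability within the data set can be summarised numerically. The following is a minimal sketch using only the Python standard library; the variable names are illustrative, and the values are those listed in the table above.

```python
import statistics

# Survival times (seconds) of the 80 silkworm larvae, copied from the table above.
survival_times = [
    270, 254, 293, 244, 293, 261, 285, 330, 284, 274,
    307, 235, 215, 292, 309, 267, 275, 298, 241, 254,
    256, 275, 226, 287, 280, 339, 294, 298, 283, 366,
    300, 310, 280, 240, 291, 286, 230, 285, 218, 279,
    280, 286, 345, 289, 210, 282, 260, 228, 243, 259,
    285, 275, 280, 296, 283, 248, 314, 258, 215, 299,
    240, 241, 236, 255, 267, 271, 253, 271, 233, 260,
    273, 233, 271, 267, 258, 319, 310, 302, 260, 251,
]

n = len(survival_times)
mean_time = statistics.mean(survival_times)
sd_time = statistics.stdev(survival_times)  # sample standard deviation (n - 1 divisor)

print(f"n = {n}")
print(f"mean = {mean_time:.1f} s, standard deviation = {sd_time:.1f} s")
print(f"shortest = {min(survival_times)} s, longest = {max(survival_times)} s")
```

The spread between the shortest and longest survival times shows how much larvae that were treated identically can still differ.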
Sampling from an abstract population
Random sampling from a population is such an intuitive way to explain sample-to-sample variability that we also use it to explain variability even when there is no real population from which the data were sampled.
We replace the real population that usually underlies survey data with an abstract population of all values that might have been obtained if the data collection had been repeated. We can then treat the observed data as a random sample from this abstract population.
The variation in the underlying abstract population gives us information about the variation in similar data in general.
Defining such an underlying population therefore not only explains sample-to-sample variability but also gives us a focus for generalising from our specific data.
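The idea of repeating the data collection can be sketched as a small simulation. The code below assumes, purely for illustration, that the abstract population is a normal distribution; the mean, standard deviation, and the helper name `run_experiment` are hypothetical choices, not values or methods taken from any real study.

```python
import random
import statistics

# Hypothetical abstract population: values modelled as normally distributed.
# These parameters are illustrative assumptions only.
POP_MEAN = 270.0
POP_SD = 30.0
SAMPLE_SIZE = 80


def run_experiment(rng):
    """Simulate one repetition of the data collection:
    SAMPLE_SIZE independent random draws from the abstract population."""
    return [rng.gauss(POP_MEAN, POP_SD) for _ in range(SAMPLE_SIZE)]


rng = random.Random(1)  # fixed seed so the sketch is reproducible

# Repeat the "experiment" five times and record the mean of each sample.
sample_means = [statistics.mean(run_experiment(rng)) for _ in range(5)]

for i, m in enumerate(sample_means, start=1):
    print(f"repetition {i}: sample mean = {m:.1f}")
```

Each repetition gives a different sample mean even though the underlying population never changes, which is the sample-to-sample variability that the abstract population is meant to explain.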
Silkworm poisoning
It is convenient to model these data as a random sample from the infinite population of all possible measurements that could have been made from similar silkworm larvae. The variability in this hypothetical population reflects the variability in the survival times.
The distribution of survival times in the sample of poisoned silkworms provides information about the distribution of this underlying population: the distribution of survival times, in general, for silkworm larvae given 0.10 mg of sodium arsenate per gram of body weight.
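As a rough sketch of how the sample can inform us about the underlying population, the code below uses the 80 observed survival times to estimate the population mean, with a standard error, and the proportion of larvae surviving beyond a threshold. The 300-second threshold is a hypothetical cut-off chosen only for illustration, and the standard error formula assumes the data behave like a random sample from the abstract population.

```python
import math
import statistics

# Survival times (seconds) of the 80 silkworm larvae.
survival_times = [
    270, 254, 293, 244, 293, 261, 285, 330, 284, 274,
    307, 235, 215, 292, 309, 267, 275, 298, 241, 254,
    256, 275, 226, 287, 280, 339, 294, 298, 283, 366,
    300, 310, 280, 240, 291, 286, 230, 285, 218, 279,
    280, 286, 345, 289, 210, 282, 260, 228, 243, 259,
    285, 275, 280, 296, 283, 248, 314, 258, 215, 299,
    240, 241, 236, 255, 267, 271, 253, 271, 233, 260,
    273, 233, 271, 267, 258, 319, 310, 302, 260, 251,
]

n = len(survival_times)

# The sample mean estimates the mean of the abstract population.
mean_time = statistics.mean(survival_times)

# Standard error of the mean: sample standard deviation divided by sqrt(n).
std_error = statistics.stdev(survival_times) / math.sqrt(n)

# The sample proportion above the threshold estimates the corresponding
# proportion in the abstract population.
THRESHOLD = 300  # hypothetical cut-off, for illustration only
prop_long = sum(t > THRESHOLD for t in survival_times) / n

print(f"estimated population mean: {mean_time:.1f} s (standard error {std_error:.1f} s)")
print(f"estimated proportion surviving more than {THRESHOLD} s: {prop_long:.3f}")
```

These estimates describe the abstract population rather than only the 80 larvae in the experiment, which is what allows the results to be generalised.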