Bivariate data: population or sample?

Some bivariate data sets are complete populations: there is no larger underlying population that the data represent. The 'individuals' in such data sets commonly have names or other labels that are an inherent part of the data.

More often, we have no interest in the specific individuals from which the data are collected. The individuals are 'representative' of a larger population or process, and our main interest is in this underlying population.

Pollution from jet aircraft

The scatterplot below shows the weights of selected airplanes and their NOx emissions during a take-off/landing cycle.

There is a tendency for heavier airplanes to have higher NOx emissions. However, our main interest is in which specific airplanes have high or low emissions, so the names of the planes are an essential part of the data.

Zinc levels in lake sediment and plants

The next data set was collected by biologists from 15 lakes in central Ontario to assess how zinc concentration in the aquatic plant Eriocaulon septangulare (micrograms per gram dry weight) is related to zinc concentration in the lake sediment (micrograms per gram).

The biologists want to learn about the relationship between zinc concentrations in sediment and in plants. The names of the lakes are available, but we are not concerned with these specific lakes (or with the specific samples taken from them). Instead, the biologists want to generalise from the data and describe the relationship in a way that could be used to predict plant zinc from sediment samples in other, similar lakes.
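One common way to describe such a relationship for prediction is a least-squares straight line. The sketch below assumes that a straight line describes the relationship adequately (the real data might instead call for a transformation, such as taking logarithms). It fits the line to (sediment zinc, plant zinc) pairs and predicts plant zinc for a new lake. The function names `fit_line` and `predict` are hypothetical, and no lake data are included here.

```python
from statistics import fmean


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line predicting y from x.

    Hypothetical helper: x would hold sediment zinc and y plant zinc,
    one pair per sampled lake.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar, y_bar = fmean(x), fmean(y)
    # Sum of squared deviations of x; zero means every x is identical.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    # Sum of cross-products of deviations.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def predict(intercept, slope, x_new):
    """Predicted y (plant zinc) for a new x (sediment zinc) value."""
    return intercept + slope * x_new
```

Because the 15 lakes are treated as a sample, the fitted intercept and slope are estimates of an underlying relationship for similar lakes. They are not properties of these particular lakes alone.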