Interest in comparing two groups
We often want to compare individuals (or other units) from two groups. If a numerical value is recorded from each individual, the resulting data consist of two batches of numbers — one from each group. Differences between the distributions of values in the two groups are often of interest.
'Individuals' | Measurement | Groups | Question |
---|---|---|---|
Plots in a field | Yield of wheat per square metre | Varieties A and B | Does either variety of wheat give higher yields? By how much? |
Dairy cows | Milk production in one week | Food supplement or standard feed | Do cows getting the supplement produce more milk on average? |
Wild ducks that have been shot in mid-summer | Weight | 2013 and 2014 | Is there any difference between duck weights in the two years? |
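To make the first row of the table concrete, the sketch below stores two batches of numbers, one per wheat variety, and summarises each. The yield values, the unit (kg per square metre) and the helper name `summarise` are assumptions invented for illustration; they are not data from any real study.

```python
from statistics import mean, stdev

# Hypothetical yields (kg per square metre), invented for illustration only
yields = {
    "A": [0.52, 0.48, 0.61, 0.55, 0.50],
    "B": [0.58, 0.63, 0.57, 0.66, 0.60],
}

def summarise(batch):
    """Return (count, mean, sample standard deviation) for one batch."""
    return len(batch), mean(batch), stdev(batch)

# One summary per group, keyed by variety name
summaries = {group: summarise(values) for group, values in yields.items()}
for group, (n, m, s) in summaries.items():
    print(f"Variety {group}: n={n}, mean={m:.3f}, sd={s:.3f}")

# Difference between the two group means (B minus A)
diff = summaries["B"][1] - summaries["A"][1]
print(f"Difference in mean yield (B - A): {diff:.3f}")
```

The difference printed here describes only these ten made-up plots. It is a starting point for the questions in the table, not an answer to them.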
Questions are often about underlying populations
The questions in the above scenarios are not about the specific plots in the field, the specific cows whose milk production was measured, and so on. They ask about the difference between the two wheat varieties in general, between cows given the supplement and cows given standard feed in general, and so on.
We are therefore usually interested in the characteristics of a population or process that we assume underlies the data that are collected. The data provide information about the likely characteristics of the population.
Examples
The diagram below shows a few data sets in which values are in two groups.
Note that the questions posed for each data set do not refer to the specific individuals in the study, but ask about differences between the groups 'in general'. We would like to use the answers to predict what will happen to other individuals.