Interest in comparing two groups
We often want to compare individuals (or other units) from two groups. If a numerical value is recorded from each individual, the resulting data consist of two batches of numbers — one from each group. Differences between the distributions of values in the two groups are often of interest.
'Individuals' | Measurement | Groups | Question |
---|---|---|---|
Children aged 10 | Mark in a Maths test | Boys and girls | Are boys' marks higher on average than girls' marks? |
Plots in a field | Yield of wheat per square metre | Varieties A and B | Does either variety of wheat give higher yields? By how much? |
Cars leaving production line | CO emissions from exhaust | Production lines 1 and 2 | Are emissions the same in cars from both production lines? |
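As a rough illustration of what "two batches of numbers" looks like in practice, the sketch below stores one batch per group and summarises each. The yield values are invented purely for illustration and do not come from any real trial; the names `yields` and `summarise` are likewise hypothetical.

```python
from statistics import mean, stdev

# Hypothetical wheat yields (kg per square metre), invented for illustration only.
# Each group contributes one batch of numbers.
yields = {
    "A": [0.61, 0.58, 0.66, 0.63, 0.59],
    "B": [0.70, 0.65, 0.72, 0.68, 0.66],
}

def summarise(values):
    """Return the size, mean and sample standard deviation of one batch."""
    return {"n": len(values), "mean": mean(values), "sd": stdev(values)}

summaries = {group: summarise(values) for group, values in yields.items()}

# Difference between the two sample means (variety B minus variety A)
difference = summaries["B"]["mean"] - summaries["A"]["mean"]

for group, s in summaries.items():
    print(f"Variety {group}: n={s['n']}, mean={s['mean']:.3f}, sd={s['sd']:.3f}")
print(f"Difference in sample means (B - A): {difference:.3f}")
```

The printed difference describes only the plots that were measured. It is a starting point for the questions in the table, not an answer to them.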
Questions are often about underlying populations
The questions in the above scenarios are not about the specific children who took the Maths test, the specific plots in the field, etc. They ask about the differences between 10-year-old boys and girls in general, the differences between the two wheat varieties in general, etc.
We are therefore usually interested in the characteristics of a population or process that we assume underlies the data that are collected. The data provide information about the likely characteristics of the population.
Examples
The diagram below shows a few data sets in which values are in two groups.
Note that the red questions do not refer to the specific individuals in the study, but ask about differences between the groups 'in general' — we would like to use the answers to predict what will happen to other individuals.