Paired data

The statistical methods for analysis of data depend strongly on the structure of the data and how the data were collected. We now consider how to analyse a different type of data.

When two types of measurements, X and Y, are made on each individual (or other unit), the data are called bivariate. In many bivariate data sets, the two variables describe quantities on different scales, such as height and weight, but sometimes the two measurements are of more closely related quantities. The two measurements may even describe the same quantity at different times.

When the sum or difference of X and Y is a meaningful quantity, the data are called paired data.

With paired data, we could investigate the relationship between the variables, but we are often more interested in whether the means of the two variables are the same.

Pre-test, post-test data

To evaluate the effectiveness of a training exercise, it is common for individuals to sit similar tests before and after the training. There is usually considerable variation in the abilities of the participants, so the pre-test and post-test scores will be related. However, it is of more interest to ask:

Has the mean score improved?

Data with a similar structure arise when measurements are made from individuals before and after any type of change (experimental or otherwise). In a study of the effect of the pill on blood pressure, the blood pressures of 15 college-aged women were recorded before they started taking the pill and after using it for 6 months.

            Blood pressure
Subject   Before pill   After pill
1         70            68
2         80            72
3         72            62
4         76            70
5         76            58
6         76            66
7         72            68
8         78            52
9         82            64
10        64            72
11        74            74
12        92            60
13        74            74
14        68            72
15        84            74

Has the mean blood pressure changed?
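Because the two measurements come from the same woman, a natural first step is to look at the change for each subject rather than at the two columns separately. The following is a minimal Python sketch of that calculation using the values in the table above. The variable names are illustrative only, and the choice of "after minus before" as the direction of the difference is an assumption made here.

```python
from statistics import mean

# Blood pressures of the 15 subjects, copied from the table above.
before = [70, 80, 72, 76, 76, 76, 72, 78, 82, 64, 74, 92, 74, 68, 84]
after = [68, 72, 62, 70, 58, 66, 68, 52, 64, 72, 74, 60, 74, 72, 74]

# One difference per subject (after minus before).
# A negative value means that subject's blood pressure was lower after.
differences = [a - b for b, a in zip(before, after)]

mean_before = mean(before)
mean_after = mean(after)
mean_difference = mean(differences)

print(f"mean before: {mean_before:.2f}")
print(f"mean after:  {mean_after:.2f}")
print(f"mean difference (after - before): {mean_difference:.2f}")
```

The mean of the differences is always equal to the difference between the two means, so the question about the two means can be studied through the single list of differences. The sketch only summarises the sample. It does not say whether the change is larger than could be explained by chance.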

Twin studies

Many characteristics of individuals are determined by genetics, but many others are affected by their environment. There are therefore many studies of monozygous twins (genetically identical) who have been raised apart.

The table below shows the IQs of ten pairs of twins who were raised apart. In each pair, one twin had been raised in a 'good' environment and the other in a 'poor' environment.

                      IQ
Family   Poor environment   Good environment
1        100                125
2        65                 95
3        60                 100
4        125                120
5        85                 120
6        145                185
7        55                 80
8        180                210
9        60                 105
10       135                175

The genetic influence on IQ is evident — when one twin has high IQ, the other often does too. However we can also ask...

Do the twins raised in a 'good' environment have a different mean IQ from those raised in a 'poor' environment?
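The two questions about the twin data (are the IQs within a pair related, and do the means differ?) correspond to two different summaries. The sketch below, written in Python with illustrative names, computes the sample correlation coefficient between the two columns and the mean of the within-pair differences. The helper `pearson_r` is defined here for the example and is not part of the original text.

```python
from math import sqrt
from statistics import mean

# IQs of the ten twin pairs, copied from the table above.
poor = [100, 65, 60, 125, 85, 145, 55, 180, 60, 135]
good = [125, 95, 100, 120, 120, 185, 80, 210, 105, 175]


def pearson_r(xs, ys):
    """Sample correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Strength of the relationship between the twins in a pair.
r = pearson_r(poor, good)

# Within-pair differences (good minus poor) and their mean.
differences = [g - p for p, g in zip(poor, good)]
mean_difference = mean(differences)

print(f"correlation between twins: {r:.3f}")
print(f"mean difference (good - poor): {mean_difference:.1f}")
```

A correlation close to 1 reflects the strong relationship within pairs, while the mean difference addresses the separate question of whether one environment is associated with higher IQ on average.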

Many experiments on animals also use identical twins. In each pair, one twin gets experimental treatment A and the other gets B.

Other paired data

The measurements may be paired by other mechanisms. A biologist studied a land snail (Cepaea nemoralis) whose shell occurs in various shades of yellow, pink and brown. Since the brown shells are more common in cooler areas, an experiment was conducted in which pairs of shells, one yellow and one brown, were exposed to sunlight side-by-side at the same time and in the same orientation. The temperature of each shell was recorded in degrees Celsius.

       Temperature (°C)
Pair   Yellow   Brown
1      25.6     25.5
2      27.8     27.5
3      26.3     27.3
4      25.9     27.3
5      28.0     29.2
6      25.4     25.3
7      24.6     26.4
8      28.9     28.5
9      27.2     28.1
10     26.0     26.4

The temperatures of the two shells in each pair seem to be related — some pairs were exposed to more sunlight than others. Of more interest is the question...

Is the mean temperature higher for the brown shells than for the yellow shells?
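Since each pair of shells shared the same exposure, comparing the two shells within a pair removes most of the pair-to-pair variation in sunlight. As a rough illustration, the Python sketch below computes the difference for each pair, counts the pairs in which the brown shell was warmer, and finds the mean difference. The names and the "brown minus yellow" direction are choices made for this example.

```python
from statistics import mean

# Shell temperatures in degrees Celsius, copied from the table above.
yellow = [25.6, 27.8, 26.3, 25.9, 28.0, 25.4, 24.6, 28.9, 27.2, 26.0]
brown = [25.5, 27.5, 27.3, 27.3, 29.2, 25.3, 26.4, 28.5, 28.1, 26.4]

# Within-pair differences (brown minus yellow).
# A positive value means the brown shell was warmer in that pair.
differences = [b - y for y, b in zip(yellow, brown)]

# Number of pairs in which the brown shell was the warmer one.
brown_warmer = sum(1 for d in differences if d > 0)

mean_difference = mean(differences)

print("differences:", [round(d, 1) for d in differences])
print(f"pairs with brown warmer: {brown_warmer} of {len(differences)}")
print(f"mean difference (brown - yellow): {mean_difference:.2f}")
```

The differences are rounded only for display, because subtracting decimal values in floating point can leave tiny rounding errors.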

Many other data sets contain measurements that are paired in similar ways.

Hypotheses of interest

For paired data, the most interesting hypotheses relate to the means of the two variables, X and Y, and often we want to test whether they are equal.

H0 :   μX = μY
HA :   μX ≠ μY

Sometimes a one-tailed test is required, such as

H0 :   μX = μY
HA :   μX > μY

The null hypotheses in the examples above would be

Blood pressure and pill
H0 :   μbefore = μafter
IQ and environment
H0 :   μgood = μpoor
Temperature of snail shells
H0 :   μyellow = μbrown
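Because the difference between X and Y is meaningful for paired data, each of these hypotheses can also be written in terms of the mean of the differences. The block below is a short restatement under that assumption. The symbol D and its mean are notation introduced here for illustration and do not appear in the text above.

```latex
% D is the within-pair difference; its mean is the difference of the two means.
\[
  D = X - Y, \qquad \mu_D = \mu_X - \mu_Y
\]
% Two-tailed test of equal means, rewritten in terms of D.
\[
  H_0 : \mu_X = \mu_Y \iff \mu_D = 0,
  \qquad
  H_A : \mu_X \ne \mu_Y \iff \mu_D \ne 0
\]
% One-tailed alternative, rewritten in terms of D.
\[
  H_A : \mu_X > \mu_Y \iff \mu_D > 0
\]
```

For the blood pressure example, for instance, D would be the change in a subject's blood pressure, and the null hypothesis states that the mean change is zero.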