Paired data

The statistical methods for analysis of data depend strongly on the structure of the data and how the data were collected. We now consider how to analyse a different type of data.

When two types of measurement, X and Y, are made on each individual (or other unit), the data are called bivariate. In many bivariate data sets, the two variables describe quantities on different scales, such as height and weight, but sometimes the two measurements are of more closely related quantities. The two measurements may even describe the same quantity at different times.

When the sum or difference of X and Y is a meaningful quantity, the data are called paired data.

With paired data, we could investigate the relationship between the variables, but we are often more interested in whether the means of the two variables are the same.

Pre-test, post-test data

To evaluate the effectiveness of a training exercise, it is common for individuals to sit similar tests before and after the training. There is usually considerable variation in the abilities of the participants, so the pre-test and post-test scores will be related. However, it is of more interest to ask:

Has the mean score improved?

Data with a similar structure arise when measurements are made on individuals before and after any type of change (experimental or otherwise). In a warehouse, the employees have asked management to play music to relieve the boredom of the job. The manager wants to know whether efficiency is affected by the change. The table below gives the efficiency ratings of 15 employees, recorded before and after the music system was installed.

            Efficiency rating
Employee    Before    After
1           21        32
2           35        35
3           40        38
4           38        57
5           23        37
6           27        30
7           28        39
8           39        28
9           22        40
10          35        48
11          28        33
12          20        33
13          39        39
14          28        41
15          34        40

Has efficiency changed?
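Because the before and after ratings come from the same employees, the natural summary is each employee's change. The Python sketch below (variable names are illustrative, not from the source) computes the after − before differences from the table and compares their mean with the two column means.

```python
from statistics import mean

# Efficiency ratings for employees 1-15, in the order of the table above.
before = [21, 35, 40, 38, 23, 27, 28, 39, 22, 35, 28, 20, 39, 28, 34]
after = [32, 35, 38, 57, 37, 30, 39, 28, 40, 48, 33, 33, 39, 41, 40]

# Each difference compares an employee with themselves, so differences
# in ability between employees do not enter the differences.
diffs = [a - b for a, b in zip(after, before)]

print("Differences (after - before):", diffs)
print(f"Mean before: {mean(before):.2f}")
print(f"Mean after:  {mean(after):.2f}")
print(f"Mean difference: {mean(diffs):.2f}")
```

The mean difference (113/15, about 7.53) equals the difference between the two column means.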

Twin studies

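Many characteristics of individuals are strongly influenced by genetics, but many others are affected by their environment. There are therefore many studies of monozygotic (genetically identical) twins who have been raised apart: because such twins share the same genes, differences between them cannot be explained by genetic differences.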

The table below shows the IQs of ten pairs of twins who were raised apart. In each pair, one twin had been raised in a 'good' environment and the other in a 'poor' environment.

         IQ
Family   Poor environment   Good environment
1        100                125
2        65                 95
3        60                 100
4        125                120
5        85                 120
6        145                185
7        55                 80
8        180                210
9        60                 105
10       135                175

The genetic influence on IQ is evident: when one twin has a high IQ, the other often does too. However, we can also ask:

Do the twins raised in a 'good' environment have a different mean IQ from those raised in a 'poor' environment?
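The claim that the twins' IQs are related can be checked numerically. This Python sketch computes the Pearson correlation between the two columns using a small helper (`pearson_r`, a hypothetical name written with the standard library only) and then the within-pair differences that the question actually concerns.

```python
from math import sqrt
from statistics import mean

# IQs for families 1-10, in the order of the table above.
poor = [100, 65, 60, 125, 85, 145, 55, 180, 60, 135]
good = [125, 95, 100, 120, 120, 185, 80, 210, 105, 175]

def pearson_r(xs, ys):
    """Sample (Pearson) correlation coefficient of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# A strong positive correlation: the twins' IQs are related.
r = pearson_r(poor, good)

# The question of interest concerns the within-pair differences instead.
diffs = [g - p for p, g in zip(poor, good)]

print(f"Correlation between twins: {r:.3f}")
print("Differences (good - poor):", diffs)
print(f"Mean difference: {mean(diffs):.1f}")
```

For these data the correlation is about 0.95 and the mean difference (good − poor) is 30.5 IQ points; whether a difference this large could arise by chance is what a hypothesis test must decide.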

Although twin studies are uncommon in business research, the 'individuals' under investigation (employees, retail outlets, creditors, ...) are often grouped into pairs that are as similar as possible before an experiment is conducted. For example, a chain of fast-food outlets is researching which of two new types of hamburger will be more popular. Pairs of outlets are selected that have similar sizes and turnovers and are in areas with similar socio-economic status. Within each pair, one outlet would trial one of the new hamburgers and the other outlet would trial the other.

Other paired data

The measurements may be paired by other mechanisms. An insurance company is concerned that garage A is charging too much for repairing damage to cars. Ten damaged cars were taken to both garage A and a second garage, garage B, for estimates. The table below shows the repair estimates for each car (in dollars).

      Repair estimate ($)
Car   Garage A   Garage B
1     420        380
2     900        760
3     1260       1180
4     630        560
5     240        260
6     1080       1000
7     1460       1300
8     1900       1720
9     2020       1800
10    1520       1440

The estimates from the two garages are clearly related, since some cars are more badly damaged than others. Of more interest is the question:

Is the mean estimate higher for garage A than for garage B?
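The same calculation applies to the repair estimates. As a Python sketch (names are illustrative), this computes each car's A − B difference and counts how many cars garage A quoted more for.

```python
from statistics import mean

# Repair estimates (dollars) for cars 1-10, in the order of the table above.
garage_a = [420, 900, 1260, 630, 240, 1080, 1460, 1900, 2020, 1520]
garage_b = [380, 760, 1180, 560, 260, 1000, 1300, 1720, 1800, 1440]

# Differences within each car remove the effect of how badly it was damaged.
diffs = [a - b for a, b in zip(garage_a, garage_b)]

# Count the cars for which garage A gave the higher estimate.
n_higher = sum(d > 0 for d in diffs)

print("Differences (A - B):", diffs)
print(f"Garage A higher for {n_higher} of {len(diffs)} cars")
print(f"Mean difference: ${mean(diffs):.2f}")
```

Garage A's estimate is higher for 9 of the 10 cars, with a mean difference of $103.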

Many other data sets contain measurements that are paired in similar ways.

Hypotheses of interest

For paired data, the most interesting hypotheses relate to the means of the two variables, X and Y, and often we want to test whether they are equal.

H0 :   μX = μY
HA :   μX ≠ μY

Sometimes a one-tailed test is required, such as

H0 :   μX = μY
HA :   μX > μY
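Since every unit provides one difference D = X − Y, and expectation is linear, hypotheses about the two means are equivalent to hypotheses about the single mean μD of the differences. This identity is standard rather than stated in the text; written out:

```latex
\begin{aligned}
\mu_D &= E(X - Y) = E(X) - E(Y) = \mu_X - \mu_Y,\\
H_0:\ \mu_X = \mu_Y &\iff \mu_D = 0,\\
H_A:\ \mu_X \neq \mu_Y &\iff \mu_D \neq 0,\\
H_A:\ \mu_X > \mu_Y &\iff \mu_D > 0.
\end{aligned}
```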

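The hypotheses for the examples above would be as follows. The first two questions ask whether the means differ (two-tailed), while the insurance company's question asks whether garage A's mean is higher (one-tailed).

Effect of music on efficiency
H0 :   μbefore = μafter
HA :   μbefore ≠ μafter
IQ and environment
H0 :   μgood = μpoor
HA :   μgood ≠ μpoor
Repair estimate and garage
H0 :   μgarage A = μgarage B
HA :   μgarage A > μgarage B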
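This section does not say how these hypotheses are tested. One standard approach, assumed here rather than taken from the text, is the paired t-test, which works with the differences D = X − Y and uses the statistic t = d̄ / (sd / √n) on n − 1 degrees of freedom. The Python sketch below computes that statistic for the three examples; `paired_t_statistic` is a hypothetical helper name, and only the standard library is used.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t_statistic(x, y):
    """Paired t statistic for H0: mu_X = mu_Y, i.e. H0: mu_D = 0 with D = X - Y.

    Hypothetical helper for illustration. Returns (t, degrees_of_freedom).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    d = [xi - yi for xi, yi in zip(x, y)]
    n = len(d)
    se = stdev(d) / sqrt(n)  # standard error of the mean difference
    return mean(d) / se, n - 1

# (X, Y) pairs from the three tables, with D = X - Y as labelled.
examples = {
    "Efficiency (after - before)": (
        [32, 35, 38, 57, 37, 30, 39, 28, 40, 48, 33, 33, 39, 41, 40],
        [21, 35, 40, 38, 23, 27, 28, 39, 22, 35, 28, 20, 39, 28, 34],
    ),
    "IQ (good - poor)": (
        [125, 95, 100, 120, 120, 185, 80, 210, 105, 175],
        [100, 65, 60, 125, 85, 145, 55, 180, 60, 135],
    ),
    "Repair estimate (A - B)": (
        [420, 900, 1260, 630, 240, 1080, 1460, 1900, 2020, 1520],
        [380, 760, 1180, 560, 260, 1000, 1300, 1720, 1800, 1440],
    ),
}

for name, (x, y) in examples.items():
    t, df = paired_t_statistic(x, y)
    print(f"{name}: t = {t:.2f} on {df} df")
```

For these data the statistics come out at roughly 3.50 (14 df), 6.78 (9 df) and 4.56 (9 df). Converting them to p-values needs the t distribution, which the standard library lacks; SciPy's `scipy.stats.ttest_rel` carries out the whole paired test.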