Analysis of variation

In earlier pages of this section, we have treated unexplained variation in data as 'noise' — a nuisance that cannot be avoided and that only serves to complicate the analysis of data.

This is not entirely true. Variation can be interesting in its own right, especially when we want to predict the future.

Weather

The table below shows the maximum daily temperatures (in degrees Celsius) at Kabete, Kenya during one month.

Maximum daily temperatures (°C) in June 1997

June 1   June 2   June 3   June 4   June 5   June 6   June 7   June 8   June 9   June 10
20.5     21.1     20.1     19.9     21.3     20.3     20.6     22.8     22.8     22.0

June 11  June 12  June 13  June 14  June 15  June 16  June 17  June 18  June 19  June 20
21.8     22.8     20.0     21.8     21.7     21.5     19.8     21.9     20.0     21.1

June 21  June 22  June 23  June 24  June 25  June 26  June 27  June 28  June 29  June 30
19.5     20.6     21.9     21.5     21.7     21.1     22.3     21.5     19.5     18.3

From a data set such as this, we might estimate the chance that the maximum temperature will be under 20.0 degrees on a future June day.
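
As a rough illustration, the proportion of days in the table that fell below 20.0 degrees can serve as a simple estimate of that chance. The sketch below assumes that June 1997 is representative of future Junes and that the days can be treated as interchangeable; the data alone cannot confirm either assumption. The name proportion_below is illustrative only.

```python
# Maximum daily temperatures (degrees Celsius) at Kabete, Kenya,
# for June 1 to June 30, 1997, copied from the table above.
june_max_temps = [
    20.5, 21.1, 20.1, 19.9, 21.3, 20.3, 20.6, 22.8, 22.8, 22.0,
    21.8, 22.8, 20.0, 21.8, 21.7, 21.5, 19.8, 21.9, 20.0, 21.1,
    19.5, 20.6, 21.9, 21.5, 21.7, 21.1, 22.3, 21.5, 19.5, 18.3,
]


def proportion_below(values, threshold):
    """Return the fraction of values strictly below the threshold."""
    return sum(1 for v in values if v < threshold) / len(values)


# Assumption: the observed proportion is used as an estimate of the
# chance for a future June day.
estimate = proportion_below(june_max_temps, 20.0)
print(f"{estimate:.3f}")  # 5 of the 30 days were below 20.0, so this prints 0.167
```

With only 30 days from a single year, an estimate like this is very uncertain; records from several years would give a more trustworthy figure.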


For countries that are susceptible to drought, the variability in past weather records might give information about:

This information can potentially help farmers decide which crops to plant and when to plant them.

Extreme values and outliers

There is often useful information in the extreme values in a data set — the highest and lowest values for each variable. If they are unusual enough, these extreme values are called outliers.

It is always worth investigating outliers further. They may be values that were wrongly recorded, but sometimes further investigation reveals something more important.

An outlier can be the most important information that you get from a data set.
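
The text does not say how unusual a value must be before it counts as an outlier. One common convention, used here purely as an assumption, flags values that lie more than 1.5 times the interquartile range beyond the quartiles. The sketch below applies that rule to the Kabete temperatures; the function name flag_outliers and the multiplier k are illustrative choices, not part of the original material.

```python
import statistics

# Maximum daily temperatures (degrees Celsius) at Kabete, Kenya,
# for June 1 to June 30, 1997.
june_max_temps = [
    20.5, 21.1, 20.1, 19.9, 21.3, 20.3, 20.6, 22.8, 22.8, 22.0,
    21.8, 22.8, 20.0, 21.8, 21.7, 21.5, 19.8, 21.9, 20.0, 21.1,
    19.5, 20.6, 21.9, 21.5, 21.7, 21.1, 22.3, 21.5, 19.5, 18.3,
]


def flag_outliers(values, k=1.5):
    """Return (position, value) pairs outside the k * IQR fences.

    Assumption: the 1.5 * IQR rule is only one convention for
    deciding that an extreme value is 'unusual enough'.
    Positions are 1-based, so here they match the day of the month.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [(i, v) for i, v in enumerate(values, start=1) if v < low or v > high]


lowest = min(june_max_temps)    # 18.3, recorded on June 30
highest = max(june_max_temps)   # 22.8, recorded on June 8, 9 and 12
outliers = flag_outliers(june_max_temps)
print(lowest, highest, outliers)  # prints: 18.3 22.8 []
```

Under this rule none of the June temperatures is flagged, even though 18.3 is the lowest value. An extreme value is not automatically an outlier, and any flagged value would still need to be investigated rather than simply discarded.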


Crop yields

Consider data that have been collected about the yields of maize in a sample of farms in a region. The farm reporting the highest yield may be worth further investigation. Is its higher yield caused by:

Some of these findings could lead to improved yields elsewhere.