Signal and noise for a variable
When considering variation in one variable, the ideas of signal and noise are better described as explained and unexplained variation.
The ideas of explained and unexplained variation are appropriate to most data sets, whether they arise from experiments or surveys.
Data sets where all variation is unexplained
In some data sets, none of the variation in the variable of interest can be explained in terms of other variables that have been recorded.
Strength measurements
In an ergonomic study involving a group of 41 male students from the University of Hong Kong, each student was asked to exert maximum upward force on a horizontal bar that was close to floor level, with his feet 400 mm away from the bar. The force, averaged over a 5-second period, is called the 'maximum voluntary isometric strength' (MVIS) and is shown below in kilograms.
MVIS (kg) | | | | |
---|---|---|---|---|
33 | 44 | 20 | 15 | 14 |
16 | 21 | 31 | 15 | 20 |
35 | 29 | 12 | 16 | 22 |
33 | 12 | 19 | 20 | 19 |
47 | 12 | 36 | 13 | 18 |
40 | 26 | 23 | 25 | 23 |
18 | 10 | 22 | 26 | 26 |
54 | 12 | 20 | 41 | 19 |
18 | | | | |
There is considerable variation in these strength measurements. Although some of this variability will undoubtedly be associated with other physical characteristics of the students, no other measurements were taken from the students. For a statistical analysis of these data, we have no choice but to treat all variation in these data as 'unexplained'.
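As a rough illustration, the total variation in these measurements can be summarised by the sum of squared deviations from the mean (and the standard deviation derived from it). The sketch below is only one possible summary; the variable names are chosen for illustration, and the use of sums of squares as the measure of variation is an assumption rather than something stated in the study.

```python
import statistics

# MVIS values (kg) for the 41 students, copied from the table above.
mvis = [
    33, 16, 35, 33, 47, 40, 18, 54, 18,
    44, 21, 29, 12, 12, 26, 10, 12,
    20, 31, 12, 19, 36, 23, 22, 20,
    15, 15, 16, 20, 13, 25, 26, 41,
    14, 20, 22, 19, 18, 23, 26, 19,
]

mean = statistics.mean(mvis)

# Total variation: sum of squared deviations from the overall mean.
# With no other variables recorded, all of it is treated as unexplained.
total_ss = sum((y - mean) ** 2 for y in mvis)

# Sample standard deviation (divisor n - 1).
sd = statistics.stdev(mvis)

print(f"n = {len(mvis)}, mean = {mean:.1f} kg, sd = {sd:.1f} kg")
print(f"total sum of squares = {total_ss:.1f} (all unexplained)")
```

Because there is no explanatory variable, the only 'model' available is the overall mean, so every deviation from it counts as unexplained.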
Data sets with explained and unexplained variation
In other data sets, some of the variation in the variable of interest can be explained in terms of other variables whose values are available, but part of the variation remains unexplained.
A statistical analysis often separates and describes these two components of the variation. Both provide useful information.
Experiment: Surface finish from lathe
A mechanical engineer is investigating the surface finish of metal parts produced on a lathe and its relationship to the speed (in RPM) of the lathe. Twenty parts were produced at different lathe speeds and using two different types of cutting tool (code numbers 302 and 416).
Surface finish | RPM | Type of cutting tool |
---|---|---|
45.44 | 225 | 302 |
42.03 | 200 | 302 |
50.10 | 250 | 302 |
48.75 | 245 | 302 |
47.92 | 235 | 302 |
47.79 | 237 | 302 |
52.26 | 265 | 302 |
50.52 | 259 | 302 |
45.58 | 221 | 302 |
44.78 | 218 | 302 |
33.50 | 224 | 416 |
31.23 | 212 | 416 |
37.52 | 248 | 416 |
37.13 | 260 | 416 |
34.70 | 243 | 416 |
33.92 | 238 | 416 |
32.13 | 224 | 416 |
35.47 | 251 | 416 |
33.49 | 232 | 416 |
32.29 | 216 | 416 |
A large part of the variability in surface finish can be explained by differences in RPM and the cutting tool that was used. However, some variability in surface finish remains that cannot be explained by these variables; this is the unexplained variation.
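One way to make this split concrete is to fit a simple model and compare sums of squares. The sketch below assumes a particular model that the text does not specify: two parallel straight lines (one per cutting tool) relating surface finish to RPM, fitted by least squares. Under that assumption, the total sum of squares separates into a part explained by the fitted values and a residual, unexplained part.

```python
# Data copied from the table above.
finish = [45.44, 42.03, 50.10, 48.75, 47.92, 47.79, 52.26, 50.52, 45.58, 44.78,
          33.50, 31.23, 37.52, 37.13, 34.70, 33.92, 32.13, 35.47, 33.49, 32.29]
rpm = [225, 200, 250, 245, 235, 237, 265, 259, 221, 218,
       224, 212, 248, 260, 243, 238, 224, 251, 232, 216]
tool = [302] * 10 + [416] * 10


def mean(values):
    return sum(values) / len(values)


# Mean finish and mean RPM within each cutting tool.
groups = sorted(set(tool))
y_bar = {g: mean([y for y, t in zip(finish, tool) if t == g]) for g in groups}
x_bar = {g: mean([x for x, t in zip(rpm, tool) if t == g]) for g in groups}

# Least-squares common slope for parallel lines: pool the within-tool
# deviations of RPM and finish from their own tool means.
sxy = sum((x - x_bar[t]) * (y - y_bar[t]) for x, y, t in zip(rpm, finish, tool))
sxx = sum((x - x_bar[t]) ** 2 for x, t in zip(rpm, tool))
slope = sxy / sxx

# Fitted value for each part: its tool's line evaluated at its RPM.
fitted = [y_bar[t] + slope * (x - x_bar[t]) for x, t in zip(rpm, tool)]

grand_mean = mean(finish)
total_ss = sum((y - grand_mean) ** 2 for y in finish)
explained_ss = sum((f - grand_mean) ** 2 for f in fitted)
unexplained_ss = sum((y - f) ** 2 for y, f in zip(finish, fitted))
r_squared = explained_ss / total_ss

print(f"total SS       = {total_ss:.2f}")
print(f"explained SS   = {explained_ss:.2f}")
print(f"unexplained SS = {unexplained_ss:.2f}")
print(f"proportion explained = {r_squared:.3f}")
```

A different model (for example, lines with different slopes for each tool) would give a somewhat different split, so the proportion explained always depends on the model chosen.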
Non-experimental data: Alcoholism and strength
The data below were obtained from 50 alcoholic men who were selected from a larger group of alcoholics to be as similar as possible in age and social characteristics. The researchers estimated the total lifetime alcohol consumption (kg per kg body weight) of each individual and measured the strength of a muscle (kg) in that individual's non-dominant arm.
 | Alcohol | Strength | | Alcohol | Strength | | Alcohol | Strength |
---|---|---|---|---|---|---|---|---|
1 | 3.5 | 22.3 | 18 | 14.0 | 25.1 | 35 | 28.3 | 19.3 |
2 | 4.0 | 20.9 | 19 | 14.8 | 15.5 | 36 | 28.6 | 15.2 |
3 | 5.2 | 20.9 | 20 | 15.7 | 20.9 | 37 | 29.2 | 13.1 |
4 | 5.2 | 28.2 | 21 | 17.4 | 20.9 | 38 | 29.8 | 23.3 |
5 | 7.4 | 29.5 | 22 | 17.5 | 25.1 | 39 | 30.8 | 15.2 |
6 | 9.4 | 28.2 | 23 | 17.7 | 19.1 | 40 | 32.3 | 21.2 |
7 | 9.7 | 23.9 | 24 | 18.2 | 12.2 | 41 | 32.5 | 14.0 |
8 | 9.8 | 22.1 | 25 | 18.3 | 22.2 | 42 | 32.9 | 24.1 |
9 | 10.8 | 25.1 | 26 | 18.9 | 21.1 | 43 | 34.5 | 15.2 |
10 | 11.1 | 24.0 | 27 | 19.1 | 17.9 | 44 | 34.5 | 16.2 |
11 | 11.7 | 20.9 | 28 | 19.1 | 28.2 | 45 | 36.2 | 10.0 |
12 | 12.5 | 20.9 | 29 | 19.7 | 22.2 | 46 | 39.5 | 10.8 |
13 | 12.6 | 26.2 | 30 | 20.0 | 21.1 | 47 | 39.7 | 10.0 |
14 | 13.2 | 15.5 | 31 | 22.6 | 26.3 | 48 | 39.7 | 15.5 |
15 | 13.5 | 28.4 | 32 | 22.8 | 18.8 | 49 | 40.3 | 16.2 |
16 | 13.7 | 20.9 | 33 | 27.7 | 18.2 | 50 | 40.8 | 18.2 |
17 | 14.0 | 21.8 | 34 | 28.3 | 16.2 | | | |
(Note that the data matrix has 50 rows and 2 columns — it has been split here to fit better on the screen. Note also that the rows of the data matrix — individuals — have been sorted into order of lifetime alcohol consumption. Any ordering of the individuals is equally valid.)
Some of the variation in strength can be explained by the different amounts of alcohol consumed by these men: there is a tendency for those with low alcohol consumption to be stronger. However, there is a lot of variation in strength that cannot be explained by differences in alcohol consumption.
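The same kind of split can be sketched for these data. The example below assumes a straight-line relationship between strength and lifetime alcohol consumption, fitted by least squares; this model is an illustrative assumption and not necessarily the analysis the researchers used. The fitted line accounts for the explained variation, and the scatter of the points about the line is the unexplained variation.

```python
# Lifetime alcohol consumption (kg per kg body weight), individuals 1 to 50.
alcohol = [
    3.5, 4.0, 5.2, 5.2, 7.4, 9.4, 9.7, 9.8, 10.8, 11.1,
    11.7, 12.5, 12.6, 13.2, 13.5, 13.7, 14.0, 14.0, 14.8, 15.7,
    17.4, 17.5, 17.7, 18.2, 18.3, 18.9, 19.1, 19.1, 19.7, 20.0,
    22.6, 22.8, 27.7, 28.3, 28.3, 28.6, 29.2, 29.8, 30.8, 32.3,
    32.5, 32.9, 34.5, 34.5, 36.2, 39.5, 39.7, 39.7, 40.3, 40.8,
]
# Muscle strength (kg) for the same individuals, in the same order.
strength = [
    22.3, 20.9, 20.9, 28.2, 29.5, 28.2, 23.9, 22.1, 25.1, 24.0,
    20.9, 20.9, 26.2, 15.5, 28.4, 20.9, 21.8, 25.1, 15.5, 20.9,
    20.9, 25.1, 19.1, 12.2, 22.2, 21.1, 17.9, 28.2, 22.2, 21.1,
    26.3, 18.8, 18.2, 16.2, 19.3, 15.2, 13.1, 23.3, 15.2, 21.2,
    14.0, 24.1, 15.2, 16.2, 10.0, 10.8, 10.0, 15.5, 16.2, 18.2,
]

n = len(alcohol)
x_bar = sum(alcohol) / n
y_bar = sum(strength) / n

# Least-squares slope and intercept for strength = intercept + slope * alcohol.
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(alcohol, strength))
sxx = sum((x - x_bar) ** 2 for x in alcohol)
slope = sxy / sxx
intercept = y_bar - slope * x_bar

fitted = [intercept + slope * x for x in alcohol]

total_ss = sum((y - y_bar) ** 2 for y in strength)
explained_ss = sum((f - y_bar) ** 2 for f in fitted)
unexplained_ss = sum((y - f) ** 2 for y, f in zip(strength, fitted))
r_squared = explained_ss / total_ss

print(f"slope = {slope:.3f} kg of strength per unit of alcohol")
print(f"proportion of variation explained   = {r_squared:.3f}")
print(f"proportion of variation unexplained = {1 - r_squared:.3f}")
```

A negative slope corresponds to the tendency described above, and the unexplained proportion indicates how much of the variation in strength is left over after allowing for alcohol consumption.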