Nonlinear transformations
There are several types of data where alternative units of measurement are not linearly related. For example,
Nonlinear transformations of the values in a data set have a more fundamental effect on the shape of the distribution, and this may be used to extract further information from the data.
A nonlinear transformation changes the shape of a distribution.
Logarithmic transformations
The most commonly used nonlinear transformation replaces each value by its logarithm,
new value = log₁₀(old value)
We use base-10 logarithms in CAST since their values are easier to interpret, but natural logarithms (base e) have a similar effect on the distribution of values.
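One way to see why the choice of base matters so little: a natural logarithm is always the base-10 logarithm multiplied by the constant ln(10), roughly 2.3026, so switching base only rescales the transformed values linearly. The short Python sketch below checks this numerically; the sample values are arbitrary and chosen purely for illustration.

```python
import math

# Arbitrary positive values, used only for illustration.
values = [0.5, 2.0, 30.0, 400.0]

log10_values = [math.log10(v) for v in values]
ln_values = [math.log(v) for v in values]

# Constant that converts a base-10 logarithm into a natural logarithm.
scale = math.log(10)

for v, l10, ln in zip(values, log10_values, ln_values):
    # The last two columns agree: ln(v) equals log10(v) * ln(10).
    print(f"{v:>7}: log10 = {l10:8.4f}   ln = {ln:8.4f}   log10 * ln(10) = {l10 * scale:8.4f}")
```

Because the two sets of logarithms differ only by a constant factor, a dot plot of either has the same shape; only the axis labels change.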
Effect on the shape of a distribution
Important properties of logarithms are
Consider the four values 1, 10, 100 and 1000. On the original scale, the first two values are much closer to each other than the last two (a gap of 9 compared with a gap of 900). However, their base-10 logarithms are 0, 1, 2 and 3, which are evenly spaced.
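The four-value example can be checked with a few lines of Python. This is only a sketch of the arithmetic: it compares the gaps between neighbouring values before and after taking base-10 logarithms.

```python
import math

values = [1, 10, 100, 1000]
logs = [math.log10(v) for v in values]

# Gaps between neighbouring values on the original scale.
raw_gaps = [b - a for a, b in zip(values, values[1:])]

# Gaps between neighbouring values on the log scale (rounded for display).
log_gaps = [round(b - a, 6) for a, b in zip(logs, logs[1:])]

print("gaps on the original scale:", raw_gaps)
print("gaps on the log scale:     ", log_gaps)
```

The gaps on the original scale grow tenfold at each step, whereas every gap on the log scale is 1.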
As a result, a logarithmic transformation selectively spreads out the low values in a distribution and compresses the high values. It is therefore useful for skewed data with a long tail towards the high values. It will spread out a dense cluster of low values and may reveal clustering or outliers that would not be visible in graphical displays of the original data.
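To make the "spread out low values, compress high values" effect concrete, the sketch below measures how much of the axis is occupied by the lower half of a data set, before and after a log transformation. The sample is hypothetical, invented only to have a long right tail, and the helper `lower_half_share` is an illustrative name rather than a standard function.

```python
import math
import statistics

# Hypothetical right-skewed sample (not real data).
sample = [0.2, 0.5, 1, 2, 3, 5, 8, 20, 150, 4000]

def lower_half_share(data):
    """Fraction of the axis range (min to max) lying below the median."""
    lo, hi = min(data), max(data)
    return (statistics.median(data) - lo) / (hi - lo)

raw_share = lower_half_share(sample)
log_share = lower_half_share([math.log10(x) for x in sample])

print(f"original scale: lower half of the values occupies {raw_share:.2%} of the axis")
print(f"log scale:      lower half of the values occupies {log_share:.2%} of the axis")
```

For this sample, half of the values are squeezed into well under 1% of the axis on the original scale, but occupy roughly 30% of the axis after the transformation, which is why detail among the small values becomes visible.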
Mammal brain weights
The dot plot below shows the average brain weights (grams) of 62 species of mammals.
Drag over the crosses to display the names of the mammals.
The data set is so highly skewed that little can be determined about the distribution for small mammals.
The diagram below also shows a jittered dot plot of the data.
A second axis is drawn under the plot labelling the values 0.1, 1, 10, 100, 1000 and 10,000. These values are not evenly spaced and the leftmost labels overlap. The axis above the dot plot shows the logarithms of these six values (-1, 0, 1, 2, 3 and 4).
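The crowding of the leftmost labels follows directly from where the six values fall along the axis. The sketch below computes each label's position as a fraction of the axis length, first on a linear axis and then on a log axis. The axis limits (0 to 10,000 on the linear axis, -1 to 4 on the log axis) are assumptions made for illustration and may not match the diagram exactly.

```python
import math

labels = [0.1, 1, 10, 100, 1000, 10000]

# Assumed axis limits (not taken from the diagram).
LINEAR_MIN, LINEAR_MAX = 0, 10000
LOG_MIN, LOG_MAX = -1, 4

# Fraction of the way along each axis at which a label is drawn.
linear_pos = [(v - LINEAR_MIN) / (LINEAR_MAX - LINEAR_MIN) for v in labels]
log_pos = [(math.log10(v) - LOG_MIN) / (LOG_MAX - LOG_MIN) for v in labels]

for v, lin, lg in zip(labels, linear_pos, log_pos):
    print(f"{v:>8}: linear axis at {lin:.5f}, log axis at {lg:.2f}")
```

Under these assumptions, the four smallest labels all sit within the first 1% of the linear axis, which is why they overlap, while on the log axis the six labels fall at equal steps of one fifth of the axis length.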
Drag the slider under the diagram towards the right to change the display into a dot plot of the logarithms of the brain weights. The transformation turns the log axis above the plot into a conventionally spaced axis.
The transformation...
From the transformed data, we might conclude that:
To help explain the transformation, the diagram below shows only five of the mammals.
Mammal | Brain weight |
---|---|
Asian elephant | 4603 g |
Chimpanzee | 440 g |
Arctic fox | 44.5 g |
Ground squirrel | 4 g |
Mouse | 0.4 g |
Going down the table, each brain weight is approximately one tenth of the one above it.
Again drag the slider to apply a logarithmic transformation. Observe that the five mammals become evenly spaced on a log scale.
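The even spacing can also be verified numerically. The following sketch takes the five brain weights from the table above, computes their base-10 logarithms, and prints the step between successive mammals; the variable names are illustrative.

```python
import math

# Brain weights in grams, copied from the table above.
brain_weights_g = {
    "Asian elephant": 4603,
    "Chimpanzee": 440,
    "Arctic fox": 44.5,
    "Ground squirrel": 4,
    "Mouse": 0.4,
}

log_weights = {name: math.log10(w) for name, w in brain_weights_g.items()}

for name, lw in log_weights.items():
    print(f"{name:<16} log10(brain weight) = {lw:6.3f}")

# Step on the log scale between each mammal and the next one down the table.
ordered = list(log_weights.values())
steps = [a - b for a, b in zip(ordered, ordered[1:])]
print("steps between successive mammals:", [round(s, 3) for s in steps])
```

Each step is within about 0.05 of 1, so a tenfold ratio between brain weights on the original scale becomes a roughly constant distance of 1 on the log scale.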