Nonlinear transformations

There are several types of data where alternative units of measurement are not linearly related. For example,

Nonlinear transformations of the values in a data set have a more fundamental effect than linear transformations: they change the shape of the distribution, and this may be used to extract further information from the data.

Nonlinear transformation changes the shape of a distribution.
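To make the contrast concrete, here is a small Python sketch (not part of CAST) that compares the relative gaps between a few hypothetical temperatures before and after two transformations. The linear Celsius-to-Fahrenheit conversion keeps every ratio of gaps, so the shape is unchanged; squaring, used here only as a convenient nonlinear example, does not.

```python
def gap_ratios(xs):
    """Gaps between successive sorted values, each divided by the first gap."""
    gaps = [b - a for a, b in zip(xs, xs[1:])]
    return [g / gaps[0] for g in gaps]

celsius = [0.0, 5.0, 20.0, 100.0]                # hypothetical temperatures
fahrenheit = [1.8 * c + 32 for c in celsius]     # linear: new = 1.8 * old + 32
squared = [c ** 2 for c in celsius]              # nonlinear: new = old ** 2

print(gap_ratios(celsius))     # [1.0, 3.0, 16.0]
print(gap_ratios(fahrenheit))  # same ratios, up to floating-point rounding
print(gap_ratios(squared))     # [1.0, 15.0, 384.0]: relative spacing has changed
```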

Logarithmic transformations

The most commonly used nonlinear transformation replaces each value by its logarithm:

new value = log10(old value)

We use base-10 logarithms in CAST since their values are easier to interpret, but natural logarithms (base e) have a similar effect on the distribution of values.
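The reason the choice of base does not matter for the shape is that ln(x) = ln(10) × log10(x) for every positive x. Switching base therefore multiplies all transformed values by the same constant, which is itself a linear transformation. A quick check with Python's standard `math` module (the test values are arbitrary; `log_ratio` is a hypothetical helper name):

```python
import math

def log_ratio(x):
    """Natural log of x divided by its base-10 log (requires x > 0 and x != 1)."""
    return math.log(x) / math.log10(x)

# The ratio is the same constant, ln(10) ≈ 2.302585, for every value,
# so changing the base only rescales the transformed values.
for x in [0.5, 2.0, 50.0, 1000.0]:
    print(x, round(log_ratio(x), 6))
```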

Effect on the shape of a distribution

Important properties of logarithms are

Consider the four values 1, 10, 100 and 1000. The first two values are much closer to each other than the last two. However, their logarithms are 0, 1, 2 and 3, which are evenly spaced.
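The same arithmetic can be checked with Python's standard `math.log10` function. Multiplying a value by 10 adds exactly 1 to its base-10 logarithm, so values that grow by a constant factor become equally spaced after the transformation.

```python
import math

values = [1, 10, 100, 1000]
logs = [math.log10(v) for v in values]

raw_gaps = [b - a for a, b in zip(values, values[1:])]
log_gaps = [b - a for a, b in zip(logs, logs[1:])]

print(raw_gaps)  # [9, 90, 900]: each gap is 10 times the previous one
print(log_gaps)  # all equal to 1 (up to rounding), since log10(10 * x) = log10(x) + 1
```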

As a result, a logarithmic transformation spreads out low values in a distribution and compresses high values. It is therefore useful for skewed data with a long tail towards the high values: it spreads out a dense cluster of low values and may reveal clusters or outliers that would not be visible in graphical displays of the original data.
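One way to quantify this effect, which the text itself does not use, is the moment coefficient of skewness: positive for a long tail towards high values, zero for a symmetric set of values. The sketch below assumes the simple population-moment form of the coefficient and applies it to the four values 1, 10, 100 and 1000 and to their logarithms.

```python
import math

def skewness(xs):
    """Moment coefficient of skewness, m3 / m2**1.5, using population moments."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

values = [1, 10, 100, 1000]
logs = [math.log10(v) for v in values]

print(skewness(values))  # clearly positive: long tail towards the high values
print(skewness(logs))    # approximately 0: the logs 0, 1, 2, 3 are symmetric
```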

World GDP

The dot plot below shows the Gross Domestic Product (GDP) of the 34 countries in the OECD (Organisation for Economic Co-operation and Development) in 2012.


The data set is so highly skewed that little can be determined about the distribution of GDP among the OECD countries with smaller economies.


The problem becomes worse when the GDP of all countries in the world is examined: a larger proportion of countries have GDPs so low that they cannot be distinguished from one another in an ordinary dot plot.

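The diagram below shows a jittered dot plot of the 2012 GDP of all countries in the world, in which each cross is given a small random vertical offset so that crosses with similar values do not overlap.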
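A jittered dot plot only needs two coordinates per value: the value itself (or its logarithm) along the axis, and a small random height. The sketch below is a minimal illustration under that assumption; the function name `jittered_points` and its `spread` and `seed` parameters are hypothetical, not part of CAST, and the random offsets come from Python's standard `random` module.

```python
import math
import random

def jittered_points(values, log_scale=False, spread=0.5, seed=0):
    """Return (x, y) plotting positions for a jittered dot plot.

    x is the value itself, or its base-10 log when log_scale is True;
    y is a random vertical offset in [-spread, spread] so that crosses
    with similar x values do not print on top of each other.
    """
    rng = random.Random(seed)  # seeded so the layout is reproducible
    points = []
    for v in values:
        x = math.log10(v) if log_scale else v
        y = rng.uniform(-spread, spread)
        points.append((x, y))
    return points
```

For example, `jittered_points([16245, 1129, 116.3], log_scale=True)` places the three crosses at x positions of about 4.21, 3.05 and 2.07, each at a random height.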

A second axis is drawn under the plot labelling the values 1, 10, 100, 1000 and 10,000. These values are not evenly spaced and the leftmost labels overlap. The axis above the dot plot shows the logarithms of these five values (0, 1, 2, 3, and 4).
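The overlap of the leftmost labels follows from where the ticks fall on each axis. The sketch below computes each tick's position as a fraction of the axis length, assuming for simplicity that the ordinary axis runs from 0 to 10,000 (the real plot must extend further to include the USA) and that the log axis runs from log10(1) = 0 to log10(10,000) = 4.

```python
import math

ticks = [1, 10, 100, 1000, 10_000]
axis_max = 10_000  # assumed upper end of the ordinary axis, for simplicity

# Position of each tick as a fraction of the axis length
linear_pos = [t / axis_max for t in ticks]                        # ordinary axis
log_pos = [math.log10(t) / math.log10(axis_max) for t in ticks]  # log axis

for t, lin, lg in zip(ticks, linear_pos, log_pos):
    print(f"{t:>6}  linear: {lin:.4f}  log: {lg:.2f}")
```

On the ordinary axis the first four ticks all lie within the leftmost tenth of the axis, whereas on the log axis consecutive ticks are a quarter of the axis apart.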

Drag the slider under the diagram towards the right to change the display into a dot plot of the logarithms of the GDP values. The transformation turns the log axis above the plot into a conventionally (evenly) spaced axis.

The transformation spreads out the dense cluster of countries with low GDP in the original plot and compresses the long tail of countries with high GDP.

From the transformed data, we might conclude that:



To help explain the transformation, the diagram below shows only six of the countries:

Country                GDP in 2012 (US$ billion)
USA                    16,245
South Korea            1,129
Angola                 116.3
Mauritius              11.45
Antigua and Barbuda    1.176
Nauru                  0.121

The GDP of each country in this list is roughly 10 times that of the next.
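The effect of the transformation on these six countries can be reproduced directly from the tabulated values. The sketch below assumes the GDP figures are in US$ billion (the units do not affect the spacing of the logs) and prints the difference between successive base-10 logarithms; a difference of 1 corresponds to a factor of exactly 10.

```python
import math

# 2012 GDP values from the table (units assumed to be US$ billion)
gdp = {
    "USA": 16245,
    "South Korea": 1129,
    "Angola": 116.3,
    "Mauritius": 11.45,
    "Antigua and Barbuda": 1.176,
    "Nauru": 0.121,
}

logs = {country: math.log10(value) for country, value in gdp.items()}
for country, lg in logs.items():
    print(f"{country:<20} {gdp[country]:>9} {lg:6.2f}")

# Differences between successive logs: each close to 1,
# i.e. each GDP is roughly 10 times the next one
log_values = list(logs.values())
steps = [a - b for a, b in zip(log_values, log_values[1:])]
print([round(s, 2) for s in steps])
```

The first step, from the USA to South Korea, is about 1.16 rather than 1, because the USA's GDP is about 14 times South Korea's; the other steps are all within about 0.02 of 1.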

Again, drag the slider to apply a logarithmic transformation. Observe that the six countries become almost evenly spaced on the log scale.