A family of nonlinear transformations
Many data sets that arise in practice involve quantities that have skew distributions. A logarithmic transformation may remove skewness, but sometimes a more flexible class of transformations is needed.
A group of transformations called power transformations is often used. A power transformation raises each value in the data set to a power p, where p is usually some constant between -2 and 2. Common examples are given in the table below.
Although these values of p are most easily interpreted, intermediate values can also be used.
(Note that any value, x, raised to the power 0 is 1.0, so it initially seems that p = 0 would not give a useful transformation. However, as p approaches 0, the shape of the transformed distribution approaches that produced by a log transformation.)
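The behaviour near p = 0 can be checked numerically. Since x^p = exp(p log x) is approximately 1 + p log x when p is small, x^p is then close to a linear function of log x, so the two transformations give distributions of the same shape. The sketch below makes this visible by rescaling the power transformation to (x^p - 1) / p. This rescaling, the function name `rescaled_power` and the sample values are assumptions made for illustration and do not come from the text.

```python
import math


def rescaled_power(x, p):
    """Return (x**p - 1) / p for positive x and nonzero p.

    The rescaling is an illustrative assumption: it is a linear change of
    x**p, so it does not alter the shape of the transformed distribution.
    """
    return (x ** p - 1.0) / p


values = [0.5, 2.0, 10.0, 1000.0]  # illustrative positive values
p = 1e-4                           # a power very close to 0

for x in values:
    print(f"x = {x}: rescaled x**p = {rescaled_power(x, p):.4f}, "
          f"log(x) = {math.log(x):.4f}")

# Largest difference between the rescaled power and the natural log
max_gap = max(abs(rescaled_power(x, p) - math.log(x)) for x in values)
```

Making p smaller shrinks `max_gap` further, which is the smooth transition into the log transformation described above.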
African populations
The dot plot below describes the populations of all countries in Africa in 2013. Drag over the crosses to investigate the outliers. We will use power transformations of the populations to spread out the countries with lower populations and reduce the visual impact of the outlier.
Drag the vertical red line on the axis towards the right. This reduces the power used in the transformation from its initial value of p = 1. After clicking on the axis, the arrow keys on your keyboard may also be used for finer adjustment of p. Note that ...
We labelled the axis with the transformed values to help explain the mechanics of power transformations. In practice, it is better to label the axis with the original measurements. Select the option Raw Values from the pop-up menu. The labels on the axis become the populations of the countries.
Adjust the power again and observe the effect on the labels. Note the smooth transition between the logarithmic transformation and the powers on either side.
For these data, a logarithmic transformation is again close to best for removing skewness. Observe that the 'outlier', Nigeria, no longer stands out from the rest of the countries; it is consistent with the distribution of populations in the rest of Africa.
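The effect of reducing the power can also be seen numerically. The following sketch computes a moment-based skewness coefficient for raw, square-root and log-transformed values. The data are made up for illustration (they are not the African populations), and the choice of skewness measure and the function name `skewness` are assumptions.

```python
import math


def skewness(values):
    """Moment coefficient of skewness: mean cubed deviation / sd cubed."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Made-up right-skewed values (NOT the population data): each is double the last
data = [2.0 ** k for k in range(8)]

skew_raw = skewness(data)                           # p = 1
skew_sqrt = skewness([math.sqrt(v) for v in data])  # p = 0.5
skew_log = skewness([math.log(v) for v in data])    # log transformation

print("p = 1  :", round(skew_raw, 3))
print("p = 0.5:", round(skew_sqrt, 3))
print("log    :", round(skew_log, 3))
```

For these values the skewness falls as the power decreases from 1, and the log-transformed values are evenly spaced, so their skewness is essentially zero.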
Precise definition
Power transformations are flexible enough to reduce or eliminate the skewness in a wide range of data sets.
The family of transformations that we use is:
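Based on the description in this section (a power of the values for p > 0, a log transformation at p = 0, and a sign change when p < 0), the family can be written as follows. The symbols x and y and the piecewise layout are assumptions.

```latex
y =
\begin{cases}
x^{p}    & \text{if } p > 0 \\
\log x   & \text{if } p = 0 \\
-\,x^{p} & \text{if } p < 0
\end{cases}
```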
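Note that if we did not change the sign of the values when p < 0, the order of the values would be swapped. For example, with p = -1 the values 1, 2 and 4 would become 1, 0.5 and 0.25, so the largest value would become the smallest.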
Changing the sign keeps the original ordering.
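A minimal Python sketch of this point, assuming the family takes the form x^p for p > 0, log x for p = 0 and -(x^p) for p < 0. The function name `power_transform` and the sample values are hypothetical.

```python
import math


def power_transform(x, p):
    """Power transformation of a positive value x with a sign change for p < 0.

    Assumed form, based on the description in the text:
    x**p for p > 0, log(x) for p = 0, and -(x**p) for p < 0.
    """
    if x <= 0:
        raise ValueError("x must be positive")
    if p > 0:
        return x ** p
    if p == 0:
        return math.log(x)
    return -(x ** p)


data = [1.0, 2.0, 4.0]  # illustrative values in increasing order

naive = [x ** -1 for x in data]                  # no sign change
signed = [power_transform(x, -1) for x in data]  # with sign change

print("without sign change:", naive)   # order is reversed
print("with sign change:   ", signed)  # order is kept
```

Because each transformation in this form is increasing in x, the ranking of the values stays the same for every p; only the spacing between them changes.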