General idea of transformations

Sometimes it is convenient to express numbers on a different scale. For example, an American would easily recognise that a human body temperature of 102 degrees Fahrenheit is unusually high, whereas in other countries temperatures are more easily 'understood' on the Celsius scale. Converting values from one scale to another in this way is called transformation.

Some transformations are performed for the convenience of the reader (such as the Fahrenheit to Celsius conversion above), but transformation can also be a useful tool that can help us understand a data set.

Linear transformations

Sometimes the values in a data set can be replaced by others holding exactly the same information. For example, a fisheries researcher might record the weights in grams of 28 trout that were caught in a particular river. If the weights had been recorded in imperial measurements (ounces), the data set would have contained equivalent information.

When the new values are found from the original data by an equation of the form

new value   =   a  +  b  ×  old value

it is called a linear transformation of the original values. A linear transformation can change the centre and spread of the data, but the shape of the distribution otherwise remains unchanged. For graphical displays, only the numbers labelling the axis change.

Linear transformation changes the centre and spread, but not the shape of a distribution.

Since the shape of the distribution is unaffected, linear transformations do not help you to understand the distribution of values in the data.
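One way to see that shape is unaffected is to standardise the values (express each as a number of standard deviations from the mean) before and after a linear transformation. The sketch below is only an illustration: the temperatures are hypothetical, and the helper names `linear_transform` and `standardise` are made up for this example. It uses the Fahrenheit to Celsius conversion, which has a = -160/9 and b = 5/9.

```python
import statistics


def linear_transform(values, a, b):
    """Apply new value = a + b * old value to every value."""
    return [a + b * x for x in values]


def standardise(values):
    """Express each value as a number of standard deviations from the mean."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return [(x - m) / s for x in values]


# Hypothetical body temperatures in degrees Fahrenheit (illustrative only)
fahrenheit = [97.9, 98.2, 98.6, 98.8, 99.1, 102.0]

# Celsius = (Fahrenheit - 32) * 5/9, i.e. a = -160/9 and b = 5/9
celsius = linear_transform(fahrenheit, a=-160 / 9, b=5 / 9)

z_f = standardise(fahrenheit)
z_c = standardise(celsius)

# The two columns agree (up to floating-point rounding), so the relative
# layout of the values is the same on both scales.
for zf, zc in zip(z_f, z_c):
    print(f"{zf:7.3f} {zc:7.3f}")
```

The standardised values match because b is positive here; a negative b would flip their signs, giving a mirror image of the same layout.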

Weights of trout

Weights in ounces and grams are related by the equation

grams   =  28.3495 ×  ounces

The dot plot below shows the trout weights that a fisheries researcher recorded. The two axes allow the weights to be read off in grams and ounces — separate dot plots are not necessary.
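The reason a single dot plot can carry both axes can be checked numerically: each weight sits at the same relative position between the smallest and largest value whichever unit is used. The following is a minimal sketch with hypothetical weights (not the researcher's 28 trout); `relative_positions` is a helper invented for this illustration, and the conversion constant is the exact definition of the avoirdupois ounce, which the text quotes to four decimal places.

```python
GRAMS_PER_OUNCE = 28.349523125  # exact definition of the avoirdupois ounce


def relative_positions(values):
    """Position of each value along the axis: 0 at the minimum, 1 at the maximum."""
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) for x in values]


# Hypothetical trout weights in ounces (illustrative only)
ounces = [9.5, 11.0, 12.2, 12.8, 14.1, 17.6]

# Linear transformation with a = 0 and b = GRAMS_PER_OUNCE
grams = [GRAMS_PER_OUNCE * w for w in ounces]

pos_oz = relative_positions(ounces)
pos_g = relative_positions(grams)

# Each cross would be drawn at the same place on either scale.
for oz, g, p, q in zip(ounces, grams, pos_oz, pos_g):
    print(f"{oz:5.1f} oz = {g:6.1f} g   position {p:.3f} vs {q:.3f}")
```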


Centre and spread

The centre and spread of linearly transformed data can be easily found from those of the original measurements. After a transformation of the form

new value   =   a  +  b  ×  old value

the mean (and other measures of centre such as the median) are similarly related

new mean   =   a  +  b  ×  old mean

The standard deviation (and other measures of spread that are expressed in the same units as the raw data, such as the inter-quartile range) are related by the equation

new sd   =  |b|  ×  old sd

Note that if the scale factor, b, is negative, we use its absolute value, |b|, since a standard deviation cannot be negative.
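These relationships can be verified on a small example. The sketch below uses hypothetical measurements and an arbitrarily chosen negative scale factor, so that the role of |b| is visible; `linear_transform` is a helper name made up for this illustration.

```python
import statistics


def linear_transform(values, a, b):
    """Apply new value = a + b * old value to every value."""
    return [a + b * x for x in values]


# Hypothetical measurements (illustrative only)
old = [12.0, 15.5, 9.0, 20.0, 13.5]

# Arbitrary constants; b is negative on purpose to exercise |b|
a, b = 3.0, -2.0

new = linear_transform(old, a, b)

# Each line prints the value computed from the transformed data,
# followed by the value predicted from the original data.
print("mean:  ", statistics.mean(new), a + b * statistics.mean(old))
print("median:", statistics.median(new), a + b * statistics.median(old))
print("sd:    ", statistics.stdev(new), abs(b) * statistics.stdev(old))
```

Using b rather than |b| in the last line would predict a negative standard deviation, which is impossible.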