General idea of transformations

Sometimes it is convenient to express numbers on a different scale. For example, an American would easily recognise that an air temperature of 90 degrees Fahrenheit is hot, whereas people in most other countries understand temperatures more easily on the Celsius scale (90°F is about 32°C). Re-expressing each value on a different scale in this way is called a transformation.

Some transformations are performed purely for the reader's convenience (such as the Fahrenheit to Celsius conversion above), but a transformation can also be a useful tool for understanding a data set.

Linear transformations

Sometimes the values in a data set can be replaced by others holding exactly the same information. For example, a taxi operator might record the distances in kilometres travelled by each of the company's 30 taxis on a particular day. If the distances had been recorded in miles, the data set would have contained equivalent information.

When the new values are found from the original data by an equation of the form

new value   =   a  +  b  ×  old value

it is called a linear transformation of the original values. A linear transformation can change the centre and spread of the data, but the shape of the distribution is unchanged (apart from being mirrored if b is negative). In graphical displays, only the numbers labelling the axis change.
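As a minimal sketch of the formula above, the Python function below applies a + b × x to every value. The name `linear_transform` is hypothetical, chosen only for illustration. The Fahrenheit to Celsius conversion from the introduction fits this form, since C = (F − 32) × 5/9 gives a = −160/9 and b = 5/9.

```python
def linear_transform(values, a, b):
    """Return a + b * x for every value x (a linear transformation)."""
    return [a + b * x for x in values]

# Fahrenheit to Celsius: C = (F - 32) * 5/9, i.e. a = -160/9 and b = 5/9.
fahrenheit = [32, 68, 90, 212]
celsius = linear_transform(fahrenheit, a=-160 / 9, b=5 / 9)
print([round(c, 1) for c in celsius])  # [0.0, 20.0, 32.2, 100.0]
```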

A linear transformation changes the centre and spread, but not the shape, of a distribution.

Because the shape of the distribution is unaffected, a linear transformation reveals nothing new about the distribution of values in the data; it only changes the units in which they are expressed.
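One way to see why the shape is preserved: every gap between two values is multiplied by the same factor b, so each value keeps its relative position within the range. The sketch below checks this on made-up, right-skewed data, assuming b > 0 (a negative b would mirror the positions). The helper `relative_positions` is a hypothetical name for illustration.

```python
def relative_positions(values):
    """Position of each value within the range, from 0 (minimum) to 1 (maximum)."""
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) for x in values]

data = [2.0, 3.0, 3.5, 4.0, 9.0]          # hypothetical, right-skewed values
transformed = [10 + 3 * x for x in data]  # a = 10, b = 3

print([round(p, 3) for p in relative_positions(data)])
print([round(p, 3) for p in relative_positions(transformed)])
# both lines: [0.0, 0.143, 0.214, 0.286, 1.0]
```

The long gap before the largest value (the skewness) survives the transformation unchanged.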

Distances travelled by taxis

A taxi operator records the distances in kilometres travelled by each of the company's 30 taxis on a particular day. Distances in kilometres and miles are related by the equation

miles   =  0.6214 ×  kilometres

This is a linear transformation with a = 0 and b = 0.6214.
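A short Python sketch of the kilometre to mile conversion, written explicitly in the a + b × old value form with a = 0. The three distances are invented for illustration; they are not the taxi company's data, and the name `km_to_miles` is hypothetical.

```python
KM_TO_MILES = 0.6214  # conversion factor quoted in the text

def km_to_miles(km):
    """Linear transformation with a = 0 and b = 0.6214."""
    return 0.0 + KM_TO_MILES * km

distances_km = [120, 185, 240]  # hypothetical values, not the company's data
print([round(km_to_miles(d), 1) for d in distances_km])  # [74.6, 115.0, 149.1]
```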

The dot plot below shows the distance travelled by each of the 30 taxis on that day. Its two axes allow each distance to be read in either kilometres or miles; because the transformation is linear, separate dot plots are not necessary.


Centre and spread

The centre and spread of linearly transformed data can be easily found from those of the original measurements. After a transformation of the form

new value   =   a  +  b  ×  old value

the mean (and other measures of centre, such as the median) is related in the same way:

new mean   =   a  +  b  ×  old mean
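The relationship can be checked numerically with Python's standard `statistics` module. The data set and the values a = 5, b = 2 below are arbitrary choices for illustration.

```python
from statistics import mean, median

old = [12.0, 15.0, 15.0, 18.0, 40.0]  # hypothetical data
a, b = 5.0, 2.0
new = [a + b * x for x in old]

print(mean(new), a + b * mean(old))      # 45.0 45.0
print(median(new), a + b * median(old))  # 35.0 35.0
```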

The standard deviation (and other measures of spread that are expressed in the same units as the raw data, such as the inter-quartile range) is related by the equation

new sd   =  |b|  ×  old sd
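A numerical check of the spread rule, again on made-up data with a = 5 and b = 2. It uses `statistics.stdev` (sample standard deviation) and `statistics.quantiles` (Python 3.8+, default "exclusive" method) to get the quartiles; any quartile method that interpolates linearly between data values should give the same scaling.

```python
import statistics

old = [12.0, 15.0, 15.0, 18.0, 40.0]  # hypothetical data
a, b = 5.0, 2.0
new = [a + b * x for x in old]

# Standard deviation: scaled by |b|, unaffected by a
sd_old = statistics.stdev(old)
sd_new = statistics.stdev(new)
print(sd_new, abs(b) * sd_old)  # equal, up to floating-point rounding

# Inter-quartile range: also scaled by |b|
q1_old, _, q3_old = statistics.quantiles(old, n=4)
q1_new, _, q3_new = statistics.quantiles(new, n=4)
print(q3_new - q1_new, abs(b) * (q3_old - q1_old))
```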

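Note that the constant a does not appear: adding the same amount to every value shifts the whole distribution without changing its spread. If the scale factor b is negative, its absolute value |b| is used, since a standard deviation cannot be negative.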
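To illustrate a negative scale factor, the sketch below converts hypothetical exam marks out of 100 into "marks lost" (100 − mark), so a = 100 and b = −1. The example is invented for illustration. The standard deviation of the new values matches |b| × old sd, while b × old sd comes out negative and so cannot be a standard deviation.

```python
import statistics

old = [62.0, 70.0, 75.0, 81.0, 94.0]  # hypothetical exam marks out of 100
a, b = 100.0, -1.0                     # marks lost = 100 - mark, so b is negative
new = [a + b * x for x in old]

sd_old = statistics.stdev(old)
sd_new = statistics.stdev(new)
print(sd_new, abs(b) * sd_old)  # the same positive value
print(b * sd_old)               # negative, so it cannot be the new standard deviation
```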