General idea of transformations
Sometimes it is convenient to express numbers on a different scale. For example, an American would easily recognise that a human body temperature of 102 degrees Fahrenheit is unusually high, whereas people in other countries more easily 'understand' temperatures on the Celsius scale. Re-expressing values on a different scale in this way is called a transformation.
Some transformations are performed for the convenience of the reader (such as the Fahrenheit to Celsius conversion above), but transformations can also be a useful tool for understanding a data set.
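For instance, the Fahrenheit to Celsius conversion uses the standard formula Celsius = (Fahrenheit - 32) × 5/9. The short Python sketch below applies it to the body temperature mentioned above. The function name fahrenheit_to_celsius is our own, chosen for illustration.

```python
def fahrenheit_to_celsius(f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9

# The body temperature of 102 degrees Fahrenheit from the example above
print(round(fahrenheit_to_celsius(102), 1))  # 38.9
```

Expanding the formula gives Celsius = -160/9 + (5/9) × Fahrenheit, so this conversion is itself an example of the linear transformations described next.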
Linear transformations
Sometimes the values in a data set can be replaced by others holding exactly the same information. For example, a fisheries researcher might record the weights in grams of 28 trout that were caught in a particular river. If the weights had been recorded in imperial measurements (ounces), the data set would have contained equivalent information.
When the new values are found from the original data by an equation of the form
new value = a + b × old value
it is called a linear transformation of the original values. A linear transformation can change the centre and spread of the data, but the shape of the distribution is unchanged. In graphical displays, only the numbers labelling the axis change.
A linear transformation changes the centre and spread of a distribution, but not its shape.
Since the shape of the distribution is unaffected, a linear transformation gives no extra insight into the distribution of values in the data.
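The claim that the shape is unchanged can be checked numerically. One way, sketched below in Python under our own assumptions, is to standardise the values (subtract the mean, then divide by the standard deviation) before and after a transformation with a positive b. If the standardised values agree, the two distributions have the same shape. The data and the helper names linear_transform and standardise are hypothetical, made up for illustration.

```python
import statistics

def linear_transform(values, a, b):
    """Apply new value = a + b * old value to every value in a list."""
    return [a + b * x for x in values]

def standardise(values):
    """Express each value as (value - mean) / standard deviation."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return [(x - m) / s for x in values]

# Hypothetical data, made up for illustration only
old = [3.0, 4.5, 5.0, 6.5, 12.0]
new = linear_transform(old, a=10.0, b=2.0)

# With b > 0 the standardised values agree (apart from rounding error),
# so the two distributions have exactly the same shape.
for z_old, z_new in zip(standardise(old), standardise(new)):
    print(f"{z_old:.4f}  {z_new:.4f}")
```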
Weights of trout
Weights in ounces and grams are related by the equation
grams = 28.3495 × ounces
to four decimal places. This is a linear transformation with a = 0 and b = 28.3495.
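A minimal Python sketch of this conversion, assuming the international definition of the ounce (exactly 28.349523125 grams). The function names and the example weight are hypothetical and are not taken from the researcher's data.

```python
# Assumption: the international avoirdupois ounce, defined as exactly 28.349523125 g
GRAMS_PER_OUNCE = 28.349523125

def ounces_to_grams(ounces):
    """Linear transformation grams = a + b * ounces, with a = 0 and b = GRAMS_PER_OUNCE."""
    a, b = 0.0, GRAMS_PER_OUNCE
    return a + b * ounces

def grams_to_ounces(grams):
    """Inverse conversion, also linear: a = 0 and b = 1 / GRAMS_PER_OUNCE."""
    return grams / GRAMS_PER_OUNCE

# Hypothetical weight of 16 ounces (one pound), not from the researcher's data
print(round(ounces_to_grams(16), 1))  # 453.6
```

Converting back from grams to ounces divides by the same constant, so the reverse conversion is also a linear transformation.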
The dot plot below shows the weights of the 28 trout recorded by the fisheries researcher. Its two axes allow the weights to be read off in either grams or ounces, so separate dot plots are not necessary.
Centre and spread
The centre and spread of linearly transformed data can be easily found from those of the original measurements. After a transformation of the form
new value = a + b × old value
the mean (and other measures of centre, such as the median) is related in the same way:
new mean = a + b × old mean
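As a quick numerical check, the Python sketch below applies a linear transformation to a small set of made-up values and compares the mean and median of the new values with a + b × old mean and a + b × old median. The values of a and b are arbitrary; b is deliberately negative to illustrate that this rule for centre holds whatever the sign of b. The two numbers printed on each line should agree, apart from floating-point rounding.

```python
import statistics

# Hypothetical values and constants, chosen only for illustration
old = [2.0, 3.0, 3.5, 5.0, 9.0, 11.5]
a, b = 4.0, -1.5
new = [a + b * x for x in old]

# Measures of centre follow the same linear rule as the individual values
print(statistics.mean(new), a + b * statistics.mean(old))
print(statistics.median(new), a + b * statistics.median(old))
```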
The standard deviation (and other measures of spread that are expressed in the same units as the raw data, such as the interquartile range) is related by the equation:
new sd = |b| × old sd
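Note that the constant a does not appear: adding the same amount to every value shifts the data without changing its spread. If the scale factor, b, is negative, we use its absolute value, since a standard deviation cannot be negative.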
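The Python sketch below, using the same kind of made-up values and a negative scale factor, compares the standard deviation and interquartile range of the transformed values with |b| times those of the original values. It relies on Python's statistics.stdev and statistics.quantiles; other software may use a slightly different rule for quartiles, so its interquartile range could differ a little from these. The helper name iqr is our own.

```python
import statistics

def iqr(values):
    """Interquartile range, using the quartiles from statistics.quantiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

# Hypothetical values and constants, chosen only for illustration
old = [2.0, 3.0, 3.5, 5.0, 9.0, 11.5]
a, b = 4.0, -1.5          # a negative scale factor
new = [a + b * x for x in old]

# Spread is multiplied by |b|; the additive constant a has no effect
print(statistics.stdev(new), abs(b) * statistics.stdev(old))
print(iqr(new), abs(b) * iqr(old))
```

With a negative b the order of the values is reversed, so the new lower quartile comes from the old upper quartile, but the difference between them is still |b| times the old interquartile range.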