Simple measures of spread

The range and inter-quartile range are summaries of the spread of values in a data set that are (relatively) easy to understand and to explain to others.

Range: the difference between the maximum and minimum values.
Inter-quartile range: the length of the interval that contains the middle half of the values.
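
As a minimal sketch of these two definitions, the Python below computes both summaries for a small made-up data set. The function names and the `weights_kg` values are illustrative assumptions, not part of the original text. Quartile conventions also differ between textbooks and software, so the inter-quartile range reported by another package may differ slightly.

```python
import statistics


def value_range(values):
    """Difference between the maximum and minimum values."""
    return max(values) - min(values)


def interquartile_range(values):
    """Upper quartile minus lower quartile.

    Uses the default ("exclusive") method of statistics.quantiles;
    other quartile conventions can give slightly different answers.
    """
    lower_quartile, _median, upper_quartile = statistics.quantiles(values, n=4)
    return upper_quartile - lower_quartile


# Made-up weights in kilograms, for illustration only.
weights_kg = [52, 55, 58, 60, 61, 63, 66, 70, 74, 81]

print("range:", value_range(weights_kg))
print("inter-quartile range:", interquartile_range(weights_kg))
```

The range uses only the two most extreme values, whereas the inter-quartile range ignores the lowest and highest quarters of the data.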

However, for several reasons (some of which are explained later in this e-book), neither value is commonly used, either as a summary of spread in reports or for further data analysis.

Standard deviation

The value that is most often used to summarise the spread of values in a data set is its standard deviation.

The standard deviation is a 'typical' distance of values from the sample mean.

The diagram below illustrates.

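Unfortunately, the exact formula for the standard deviation is relatively complex:

$$ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} $$

where $\bar{x}$ is the sample mean and $n$ is the number of values.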

The standard deviation of a data set is denoted by the letter s and will be widely used in later chapters.
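
The calculation can be sketched in a few lines of Python. This assumes the usual sample definition, in which the sum of squared differences from the mean is divided by n − 1 before taking the square root. The function name `sample_sd` and the data values are made up for illustration. The result is compared with `statistics.stdev` from the Python standard library, which uses the same definition.

```python
import math
import statistics


def sample_sd(values):
    """Sample standard deviation, assuming the divisor n - 1."""
    n = len(values)
    mean = sum(values) / n
    # Sum of squared differences between each value and the sample mean.
    sum_squared_differences = sum((x - mean) ** 2 for x in values)
    return math.sqrt(sum_squared_differences / (n - 1))


# Made-up data whose mean is 8.
data = [4, 7, 8, 9, 12]

s = sample_sd(data)
print("by hand:        ", s)
print("statistics.stdev:", statistics.stdev(data))
```

For these values the differences from the mean are −4, −1, 0, 1 and 4, so the sum of squared differences is 34 and s is the square root of 34 / 4 = 8.5.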

Brief explanation

It is easier to explain the properties of the standard deviation than to justify its precise formula. Note, however, that the term

$$ \sum_{i=1}^{n} (x_i - \bar{x})^2 $$

in the formula is the sum of the squared differences between the individual values and the sample mean. The closer the values are to the mean (corresponding to a small spread), the smaller this sum and therefore the smaller the standard deviation.

The standard deviation is small when the spread of values is small and large when the spread of values is large, so it works as a summary of spread.
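
To make this concrete, the sketch below compares two made-up data sets with the same mean but different spreads. The names `tight`, `wide` and `sum_squared_deviations` are illustrative assumptions, and `statistics.stdev` is taken to be the sample standard deviation.

```python
import statistics

# Two made-up data sets, both centred on 8.
tight = [7.8, 7.9, 8.0, 8.1, 8.2]    # values close to the mean
wide = [2.0, 5.0, 8.0, 11.0, 14.0]   # values far from the mean


def sum_squared_deviations(values):
    """Sum of squared differences between the values and their mean."""
    mean = statistics.mean(values)
    return sum((x - mean) ** 2 for x in values)


for name, values in [("tight", tight), ("wide", wide)]:
    print(
        name,
        "sum of squares =", round(sum_squared_deviations(values), 3),
        "sd =", round(statistics.stdev(values), 3),
    )
```

The data set whose values lie closer to the mean has both the smaller sum of squared differences and the smaller standard deviation.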

Variance

The square of the standard deviation, s², is called the sample variance and is sometimes used as an alternative description of spread. However, s has the same units as the original data (e.g. kilograms or dollars), so it is more easily interpreted than s², and standard deviations are usually preferred for summarising spread.
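
The point about units can be illustrated with a short sketch. The weights are made up, and the conversion from kilograms to grams is an assumption added here only to show how each summary responds to a change of units.

```python
import statistics

# Made-up weights in kilograms.
weights_kg = [61.5, 64.0, 70.2, 72.8, 80.5]

s = statistics.stdev(weights_kg)             # units: kilograms
s_squared = statistics.variance(weights_kg)  # units: kilograms squared

# The same weights expressed in grams.
weights_g = [w * 1000 for w in weights_kg]
s_grams = statistics.stdev(weights_g)
s_squared_grams = statistics.variance(weights_g)

print("sd (kg):", s, " variance (kg^2):", s_squared)
print("sd (g): ", s_grams, " variance (g^2): ", s_squared_grams)
```

Changing the units multiplies the standard deviation by 1000, just like the data, whereas the variance is multiplied by 1000 squared. This is one way to see why the variance is harder to read against the original measurements.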

The diagram below shows 20 values whose mean is exactly 8.

Click on crosses to see the difference between the values and the mean. The standard deviation is 'typical' of the magnitude of these differences.

Use the slider to adjust the spread of values and observe that the standard deviation is small when the values are all close to their mean and large when they are more variable.

(Warning: The x-values displayed in the table above are rounded to 2 decimals, but the squared differences and the standard deviation are calculated from the exact x-values, so they do not match exactly with what you would calculate by hand.)
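
For readers without access to the interactive diagram, the sketch below imitates it under stated assumptions. The `offsets` are hypothetical stand-ins for the diagram's data, chosen symmetrically so that the 20 values have mean 8, and the `spread` argument plays the role of the slider. The last column shows the effect described in the warning: a standard deviation computed from values rounded to 2 decimals can differ slightly from one computed from the exact values.

```python
import statistics

# Hypothetical distances from the mean; each is used once above and once
# below 8, giving 20 values whose mean is 8 (up to floating-point rounding).
offsets = [0.137, 0.462, 0.815, 1.243, 1.578, 1.921, 2.364, 2.709, 3.155, 3.486]


def make_values(spread):
    """Return 20 values centred on 8; `spread` stretches or shrinks them."""
    return [8 + spread * d for d in offsets] + [8 - spread * d for d in offsets]


for spread in (0.25, 1.0, 2.0):
    values = make_values(spread)
    exact_sd = statistics.stdev(values)
    rounded_sd = statistics.stdev([round(x, 2) for x in values])
    print(
        f"spread={spread}: mean={statistics.mean(values):.2f}, "
        f"sd={exact_sd:.4f}, sd from rounded values={rounded_sd:.4f}"
    )
```

The mean stays at 8 for every setting, while the standard deviation grows as the values are spread further from it.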