Simple summaries of spread
The simplest summary statistics that describe the variability in a data set are based on the quartiles and extremes of the distribution.
- Range: The range is the difference between the maximum and minimum values. All the data are within an interval of this width.
- Interquartile range: The interquartile range is the difference between the upper quartile and lower quartile. Half of the data lie between the two quartiles, so an interval of this width includes half the data.
The range of a data set depends only on the minimum and maximum values, so it is a fairly poor summary of spread. In a large set of marks, it is not uncommon for one student to obtain full marks and another to get zero, so the range does not describe the spread in marks for more typical students.
The interquartile range is unaffected by a few extreme values like these and is therefore a better summary of the spread of marks.
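The contrast can be checked numerically. The following is a minimal Python sketch using an invented set of eleven marks (not data from this page) in which one student scores zero and another scores full marks. The helper names `data_range` and `interquartile_range` are hypothetical, and the "inclusive" quartile method is only one of several conventions in use.

```python
from statistics import quantiles

def data_range(values):
    """Difference between the maximum and minimum values."""
    return max(values) - min(values)

def interquartile_range(values):
    """Upper quartile minus lower quartile.

    Uses inclusive interpolation, which is an assumed convention here;
    other quartile definitions give slightly different answers.
    """
    lower_q, _median, upper_q = quantiles(values, n=4, method="inclusive")
    return upper_q - lower_q

# Invented marks out of 100: one zero, one full mark, the rest typical.
marks = [0, 52, 55, 58, 60, 61, 63, 65, 67, 70, 100]
typical = marks[1:-1]  # the same marks without the two extreme students

print(data_range(marks), interquartile_range(marks))
print(data_range(typical), interquartile_range(typical))
```

For these invented marks, dropping the two extreme students shrinks the range from 100 to 18, whereas the interquartile range only changes from 9.5 to 7.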
The diagram below shows marks in a test that was attempted by three classes.
It is evident from the jittered dot plots that:
- Room 1 tends to have higher marks than rooms 2 or 3.
- Room 3 is more variable than the other two classes.
The table of medians and ranges concisely summarises these differences between the classes.
(The medians are also displayed as blue lines on the dot plots and the ranges are represented by the widths of the gray bands behind the dot plots.)
Click the Sample button a few times to give the three classes different tests. In most (but not all) of these different data sets, you will observe the same differences between the classes.
Interpreting the median and interquartile range
Although a single measure of centre and one of spread provide only limited information about the shape of a distribution of values, it is possible to sketch a bell-shaped histogram that matches the values. Such a 'guess' is often close to the actual distribution of values.
However, the two values do not provide any information about skewness of the distribution or other features of its shape, so such a 'guess' may not be accurate.
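One possible way to make such a guess concrete is sketched below in Python. It assumes that "bell-shaped" means a normal distribution, which the text does not state, and it uses a hypothetical helper `bell_shaped_guess` with illustrative values (a median of 60 and an interquartile range of 10). For a normal distribution the quartiles lie about 0.674 standard deviations either side of the centre, so the standard deviation is roughly the interquartile range divided by 1.35.

```python
from statistics import NormalDist

def bell_shaped_guess(median, iqr):
    """Return a normal distribution whose median and quartiles match the summaries.

    Assumes a symmetric, normal shape, so any skewness in the real data is lost.
    """
    z_upper = NormalDist().inv_cdf(0.75)  # about 0.674 for the standard normal
    sigma = iqr / (2 * z_upper)           # the quartiles are 2 * z_upper * sigma apart
    return NormalDist(mu=median, sigma=sigma)

# Illustrative summary values, not taken from the test marks above.
guess = bell_shaped_guess(median=60, iqr=10)

lower_q = guess.inv_cdf(0.25)
upper_q = guess.inv_cdf(0.75)
print(guess.mean, guess.stdev, lower_q, upper_q)
```

The guessed curve reproduces the given median and quartiles exactly, but it says nothing about whether the actual marks are skewed or have several clusters.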
In Excel
There are no built-in functions to evaluate the range or interquartile range in Excel, but they can be easily found from the minimum, maximum and quartiles of the distribution. If the marks are contained in the cells A1 to A25 of a spreadsheet, the formula "=QUARTILE(A1:A25, 3)-QUARTILE(A1:A25, 1)" will calculate the interquartile range and "=MAX(A1:A25)-MIN(A1:A25)" will find the range.