Different definitions of the quartiles

There is universal agreement that the median of a data set is the middle value if there is an odd number of observations, or half-way between the middle two values if the size of the data set is even.

However, as mentioned earlier, there are several competing definitions of the upper and lower quartiles. All such definitions split the data approximately into quarters, but there is no unique way to do this. For example, if there are n = 16 values in the data set, any value between the 4th and 5th smallest values would cut off a quarter of the data. In this situation, we have defined the lower quartile to be half-way between these two values, but other authors and computer software define the lower quartile to be nearer to the 4th value.
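As a rough illustration of the n = 16 example, the Python sketch below computes the lower quartile under two rules. The data values and function names are hypothetical, and the choice of the (n + 1)p interpolation rule to stand in for the "other" definitions is an assumption; the text does not say which rule any particular author or program uses.

```python
import math

# Hypothetical data: 16 made-up values, not taken from any real data set.
hypothetical_values = [10 * i for i in range(1, 17)]  # 10, 20, ..., 160


def quantile_midpoint_rule(values, p):
    """If n*p is a whole number k, return the midpoint of the k-th and
    (k+1)-th sorted values; otherwise return the next sorted value up."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    data = sorted(values)
    k = len(data) * p
    if k == math.floor(k):
        k = int(k)
        return (data[k - 1] + data[k]) / 2
    return data[math.ceil(k) - 1]


def quantile_n_plus_one_rule(values, p):
    """Linear interpolation at the 1-based position (n + 1) * p."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    data = sorted(values)
    n = len(data)
    position = min(max((n + 1) * p, 1), n)  # keep the position inside 1..n
    lower = math.floor(position)
    if lower == n:
        return float(data[-1])
    fraction = position - lower
    return data[lower - 1] + fraction * (data[lower] - data[lower - 1])


print(quantile_midpoint_rule(hypothetical_values, 0.25))    # 45.0
print(quantile_n_plus_one_rule(hypothetical_values, 0.25))  # 42.5
```

Both results lie between the 4th and 5th values (40 and 50), so either one cuts off a quarter of these 16 values.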

The differences are of little practical importance.

If your conclusion about the data would change with a different definition of the quartiles, you are over-interpreting the data.

Smoothing the cumulative proportions

There is even less agreement about the precise definition of other percentiles, and different computer software finds them in different ways. In the earlier pages of this section, we defined the percentiles as the values that are found from reading across and down the cumulative distribution function.

Most statistical computer software replaces the cumulative distribution function (a step function) with a smoothed version before reading off the percentiles.
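The contrast between the two approaches can be sketched in Python. The smoothing used here, which joins the points (x(i), (i - 0.5)/n) with straight lines, is only one possible choice and is an assumption; the text does not say which smoothing any particular program uses. The data values and function names are also hypothetical.

```python
import math

# Hypothetical data: 16 made-up values, not real measurements.
hypothetical_values = [310, 355, 402, 448, 470, 495, 520, 541,
                       563, 590, 612, 640, 675, 710, 760, 835]


def step_percentile(values, p):
    """Read across and down the step-function CDF: the smallest sorted
    value whose cumulative proportion is at least p."""
    data = sorted(values)
    index = max(math.ceil(len(data) * p), 1)
    return data[index - 1]


def smoothed_percentile(values, p):
    """Read from an assumed smoothed CDF: linear interpolation at the
    1-based position n*p + 0.5, clamped to the range 1..n."""
    data = sorted(values)
    n = len(data)
    position = min(max(n * p + 0.5, 1), n)
    lower = math.floor(position)
    if lower == n:
        return float(data[-1])
    fraction = position - lower
    return data[lower - 1] + fraction * (data[lower] - data[lower - 1])


for p in (0.25, 0.26, 0.30):
    print(p, step_percentile(hypothetical_values, p),
          smoothed_percentile(hypothetical_values, p))
```

With these values, the step-function percentile jumps from 448 to 470 as p moves from 0.25 to 0.26 and then stays at 470, whereas the smoothed percentile increases gradually over the same range.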

In practical terms, the difference is unimportant. If the data set is large, there is likely to be little difference between the values that the two methods give for most percentiles. If the data set is small, each percentile is likely to be more affected by the 'randomness' of the data, so its precise value is less important.

If your conclusion about the data would change with a different definition of the percentiles, you are over-interpreting the data.


Annual rainfall in Dodoma, 1998 to 2013

The diagram below shows the last 16 years of the Dodoma annual rainfall data.

Drag the red horizontal line to read off different percentiles. Observe that the percentiles do not change smoothly, due to the steps in the cumulative distribution function.

Click the checkbox Smoothed to replace the cumulative distribution function with a smoothed version. Again drag the horizontal line to read the percentiles from this graph and observe that the percentiles change without sharp jumps.

(This smoothed graph gives the definition of the percentiles that is used in many statistical computer programs.)

Annual rainfall in Dodoma, 1936 to 2013

The diagram below shows the full 78 years of Dodoma annual rainfall data.

Again drag the red horizontal line to read off different percentiles from the actual cumulative distribution function and from the smoothed version. Observe that the differences between the two definitions of the percentiles are much smaller with this larger data set.
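The effect of sample size can be sketched in Python as follows. This is an idealised illustration, not the Dodoma rainfall data: both data sets are hypothetical and consist of evenly spaced values over the same range, and the smoothing rule (linear interpolation at position n*p + 0.5) is an assumption, since the text does not specify one.

```python
import math


def step_percentile(values, p):
    """Smallest sorted value whose cumulative proportion is at least p."""
    data = sorted(values)
    index = max(math.ceil(len(data) * p), 1)
    return data[index - 1]


def smoothed_percentile(values, p):
    """Linear interpolation at the 1-based position n*p + 0.5 (assumed rule)."""
    data = sorted(values)
    n = len(data)
    position = min(max(n * p + 0.5, 1), n)
    lower = math.floor(position)
    if lower == n:
        return float(data[-1])
    fraction = position - lower
    return data[lower - 1] + fraction * (data[lower] - data[lower - 1])


def largest_gap(values):
    """Largest difference between the two definitions over p = 0.01, ..., 0.99."""
    return max(abs(step_percentile(values, k / 100) -
                   smoothed_percentile(values, k / 100))
               for k in range(1, 100))


# Hypothetical, evenly spaced values between 300 and 900.
small_sample = [300 + 40 * i for i in range(16)]        # 16 values
large_sample = [300 + 600 * i / 77 for i in range(78)]  # 78 values

print(largest_gap(small_sample))
print(largest_gap(large_sample))
```

For evenly spaced values, the two definitions never differ by more than half the spacing between neighbouring values, so the largest gap shrinks as the number of values covering the same range grows.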