Warning about over-interpreting histograms of small data sets

Adjusting the class width and the starting position for the first class can give a surprising amount of variability in histogram shape for small data sets. As a result, you must be extremely wary of over-interpreting features such as clusters or skewness in such histograms.

Indeed, it is probably better to avoid using histograms to display small data sets; stacked dot plots are far less likely to mislead you over minor features.
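The effect of class width and starting position can be checked numerically. The sketch below is only an illustration: the 20 values are invented for the example (they are not the rain-day data), and the helper name class_counts is hypothetical. It bins the same sample four ways and prints the class counts, so that the change in apparent shape can be seen directly.

```python
# Hypothetical sample of 20 values, invented purely for illustration.
values = [52, 55, 57, 58, 60, 61, 61, 63, 64, 66,
          71, 73, 74, 76, 77, 79, 80, 82, 85, 88]


def class_counts(data, start, width):
    """Count the values in each class [start + k*width, start + (k+1)*width).

    Assumes start is no larger than the smallest value in data.
    """
    n_classes = int((max(data) - start) // width) + 1
    counts = [0] * n_classes
    for x in data:
        counts[int((x - start) // width)] += 1
    return counts


# Same data, different class widths and starting positions.
for start, width in [(50, 5), (48, 5), (50, 10), (45, 10)]:
    counts = class_counts(values, start, width)
    print(f"start={start:>2} width={width:>2}  counts={counts}")
```

With these made-up values, some choices of classes suggest a dip in the middle of the distribution while others do not, even though the data never change.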

Rain days

The histogram below shows the number of rainy days in a village for each of 20 years.

Use the buttons under the histogram to adjust the class width and to shift the histogram classes to the left or right. Note that the data appear to split into clusters in some of the histograms but not in others.

Are the clusters real, or are they just an artifact of our choice of classes?

Without further supporting evidence, the clusters are not pronounced enough for us to conclude that the years fall into two meaningful groups. However, they do give an indication of clustering that a good 'data detective' would investigate further.

Because the shape of a small data set's histogram is so dependent on the choice of classes,...

Dot plots should be used in preference to histograms for small data sets.

Dot plots show the size of the data set more clearly and hence give some warning about the risk of over-interpretation.
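As a rough illustration of this point, the sketch below draws a text-only stacked dot plot. The data are again invented for the example, and the function name stacked_dot_plot is hypothetical. Each observation appears as one 'o', so the reader can see at a glance how few values lie behind any apparent feature, and no choice of classes is involved.

```python
from collections import Counter

# Hypothetical sample of 20 values, invented purely for illustration.
values = [52, 55, 57, 58, 60, 61, 61, 63, 64, 66,
          71, 73, 74, 76, 77, 79, 80, 82, 85, 88]


def stacked_dot_plot(data):
    """Return a text dot plot with one column per integer value
    and one 'o' per observation, stacked above a dashed axis."""
    counts = Counter(data)
    lo, hi = min(data), max(data)
    height = max(counts.values())
    rows = []
    # Build rows from the top of the tallest stack down to the axis.
    for level in range(height, 0, -1):
        row = "".join("o" if counts[v] >= level else " "
                      for v in range(lo, hi + 1))
        rows.append(row.rstrip())
    rows.append("-" * (hi - lo + 1))
    return "\n".join(rows)


print(stacked_dot_plot(values))
```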