Warning about over-interpreting histograms of small data sets
Adjusting the class width and the starting position for the first class can give a surprising amount of variability in histogram shape for small data sets. As a result, you must be extremely wary of over-interpreting features such as clusters or skewness in such histograms.
Indeed, it is probably better to avoid using histograms to display small data sets, since stacked dot plots are far less likely to mislead you over minor features.
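The effect of the class width and starting position can be checked numerically. The following is a minimal sketch in Python, not part of the original page. The twelve values are made up for illustration (they are not the slag data), and the function name class_counts is hypothetical. It assumes the starting position is no larger than the smallest value.

```python
import math

# Hypothetical small sample, invented for illustration (NOT the slag data).
values = [10.2, 10.4, 10.7, 11.1, 11.3, 11.6,
          12.8, 13.2, 13.4, 13.7, 14.1, 14.3]


def class_counts(data, start, width):
    """Count the values in classes [start + k*width, start + (k+1)*width).

    Assumes start <= min(data), so every value falls in some class.
    """
    n_classes = math.floor((max(data) - start) / width) + 1
    counts = [0] * n_classes
    for x in data:
        counts[math.floor((x - start) / width)] += 1
    return counts


# Same data, three different choices of starting position and class width.
for start, width in [(10.0, 1.0), (10.0, 2.0), (9.0, 2.0)]:
    print(f"start = {start}, class width = {width}")
    for k, count in enumerate(class_counts(values, start, width)):
        low = start + k * width
        print(f"  {low:4.1f} to {low + width:4.1f} | " + "#" * count)
```

With these made-up values, the first setting gives class counts 3, 3, 1, 3, 2, which hints at two clusters. The second gives 6, 4, 2 and the third gives 3, 4, 5, so the apparent skewness reverses direction just by moving the starting position.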
Steel Works Slag data
The histogram below shows the percentage FeO in slag from batches of iron ore processed by a steel works.
Use the buttons under the histogram to adjust the class width and to shift the histogram classes to the left or right. Note that the data appear to split into clusters in some of the histograms but not in others.
Are the clusters real, or are they just an artifact of our choice of classes?
Without further supporting evidence, the clusters are not pronounced enough for us to conclude that the batches of iron ore must form two meaningful groups. However, they do give an indication of clustering that a good 'data detective' would investigate further.
In this data set, further investigation showed that the clusters did correspond largely to iron ore from two different sources.
Because the shape of a small data set's histogram is so dependent on the choice of classes,...
Dot plots should be used in preference to histograms for small data sets.
Dot plots show the size of the data set more clearly and hence give some warning about the risk of over-interpretation.
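As a rough illustration of this point, a stacked dot plot can be drawn as plain text. The sketch below is an assumption-laden example rather than the method used for the plots on this page: the values are invented (not the slag data), the function name stacked_dot_plot is hypothetical, and each value is placed at the nearest multiple of 0.5 so that nearby values stack.

```python
from collections import Counter

# Hypothetical small sample, invented for illustration (NOT the slag data).
values = [10.2, 10.4, 10.7, 11.1, 11.3, 11.6,
          12.8, 13.2, 13.4, 13.7, 14.1, 14.3]


def stacked_dot_plot(data, step=0.5):
    """Return a text dot plot with one 'o' per value.

    Each value is placed at the nearest multiple of `step`;
    values sharing a position are stacked vertically.
    """
    positions = Counter(round(x / step) for x in data)
    low, high = min(positions), max(positions)
    rows = []
    # Build the plot from the top level of dots down to the bottom level.
    for level in range(max(positions.values()), 0, -1):
        row = "".join("o" if positions[p] >= level else " "
                      for p in range(low, high + 1))
        rows.append(row.rstrip())
    rows.append("-" * (high - low + 1))  # horizontal axis
    return "\n".join(rows)


plot = stacked_dot_plot(values)
print(plot)
```

Every value appears as exactly one dot, so a reader can see at a glance that there are only twelve observations behind any apparent gap or cluster.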