Stability of the shape of box plots
We saw earlier that features of dot plots, stem-and-leaf plots and histograms are relatively unstable for small data sets. If different data are collected from the same process, the displays vary considerably from sample to sample, so care must be taken not to over-interpret their shape.
The same is true of box plots, but to a lesser extent. Box plots summarise the data further, so they describe the distribution of values more stably than the displays described earlier.
As with other displays, the larger the data set, the more stable the box plots become.
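The effect of sample size can also be checked with a small simulation. The sketch below is only illustrative: it assumes a hypothetical normal population with mean 8.4 and standard deviation 0.045 (these values are not taken from any real data set), draws many samples of each size, and measures how much each of the five values behind a box plot varies from sample to sample.

```python
import random
import statistics

def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    q1, med, q3 = statistics.quantiles(values, n=4)
    return (min(values), q1, med, q3, max(values))

def summary_spread(sample_size, repeats=500, seed=1):
    """Standard deviation of each summary value across repeated samples."""
    rng = random.Random(seed)
    # Assumed population: normal, mean 8.4, sd 0.045 (hypothetical values).
    summaries = [
        five_number_summary([rng.gauss(8.4, 0.045) for _ in range(sample_size)])
        for _ in range(repeats)
    ]
    # zip(*summaries) groups all minima together, all lower quartiles together, etc.
    return [statistics.stdev(column) for column in zip(*summaries)]

for n in (20, 50, 150):
    spread = summary_spread(n)
    print(n, [round(s, 4) for s in spread])
```

Each printed row lists the sample-to-sample standard deviation of the minimum, lower quartile, median, upper quartile and maximum. Under these assumptions the quartiles and median should vary less as the sample size grows, and the extremes should vary more than the median.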
Lengths of kidney beans
The box plot below describes the lengths (mm) of 20 kidney beans.
Click the button Another sample several times to see the box plots that might arise from different samples of 20 beans of the same variety. Observe that there is considerable variability in the box plots, especially in the extremes, but there are fewer distracting artifacts such as clusters than in the corresponding dot plots.
Use the pop-up menu to change the sample size from 20 to 50, then repeat the sampling a few times. The box plots become less variable.
Finally, repeat with a sample size of 150. The box plot now gives a fairly consistent display, showing clearly that the middle half of the data (between the lower and upper quartiles) lies approximately between 8.37 and 8.43.