Clusters
If a dot plot, stem and leaf plot or histogram separates into two or more groups of values (clusters), this suggests that the 'individuals' from which the data were recorded may similarly be split into two or more groups. Further investigation might reveal that the clusters correspond to ...
Identifying the cause of the differences between the groups may lead to valuable insights into the data. For example, if the data are corn yields and the clusters turn out to correspond to two varieties, one variety may give a higher yield than the other; growing only that variety would improve yields.
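A dot plot makes gaps visible by eye. As a rough numerical counterpart, the minimal Python sketch below sorts the values and starts a new cluster wherever two neighbouring values differ by more than a chosen threshold. The function name `split_at_gaps`, the threshold `min_gap` and the data values are hypothetical and serve only as illustration. In practice the threshold is a judgement call, and this simple rule does not replace looking at the plot.

```python
def split_at_gaps(values, min_gap):
    """Split values into clusters wherever consecutive sorted values
    differ by more than min_gap (a hypothetical, user-chosen threshold)."""
    ordered = sorted(values)
    if not ordered:
        return []
    clusters = [[ordered[0]]]
    for prev, curr in zip(ordered, ordered[1:]):
        if curr - prev > min_gap:
            clusters.append([curr])  # gap wider than min_gap: start a new cluster
        else:
            clusters[-1].append(curr)  # close enough: stay in the current cluster
    return clusters


# Hypothetical values for illustration only, not real measurements.
data = [1.8, 2.0, 2.1, 1.9, 4.2, 4.5, 4.0, 4.4, 2.2]
for group in split_at_gaps(data, min_gap=1.0):
    print(group)
```

With these made-up values the sketch prints two groups, one around 2 and one around 4, because the jump from 2.2 to 4.0 exceeds the threshold.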
Eruptions of Old Faithful geyser
Old Faithful is a geyser in Yellowstone National Park in the USA that is known for its regular eruptions. Volunteers collected information about all eruptions in October 1980 (except those between midnight and 6 am). The dot plot below shows the durations of these eruptions.
The eruption durations form two distinct clusters, so there seem to be two different types of eruption. What other characteristics of the eruptions are different between the two types?
The next dot plot shows the distribution of the intervals between successive eruptions. Again, there are two clusters, though not quite as distinct.
Are the same eruptions in the same clusters for both variables? Are successive eruptions in the same or different clusters? (More advanced statistical methods are needed to answer these questions.)
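Answering these questions properly needs the more advanced methods mentioned, but a simple descriptive first step is to label each eruption with its cluster and count how often each type of eruption is followed by each type. The sketch below assumes the labels have already been assigned (for example, by splitting the durations at the gap between the clusters); the function name `transition_counts` and the label sequence are hypothetical. It only tabulates the pattern and says nothing about whether the pattern is more than chance.

```python
from collections import Counter


def transition_counts(labels):
    """Count how often each cluster label is followed by each label
    in a time-ordered sequence of cluster assignments."""
    return Counter(zip(labels, labels[1:]))


# Hypothetical cluster labels in time order ("S" = short, "L" = long eruption).
labels = ["L", "S", "L", "L", "S", "L", "L", "S"]
counts = transition_counts(labels)
for (before, after), n in sorted(counts.items()):
    print(f"{before} -> {after}: {n}")
```

In this made-up sequence a short eruption is never followed by another short one, which is the kind of pattern that would then call for a proper statistical check.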
The discovery of clusters is an important finding that should prompt further research.
Yam growth
The stem and leaf plot on the right describes weekly growth in 20 yam plants. There is considerable variation in the growth, ranging from about 5 cm to 12 cm.
There appears to be a low-density gap in the distribution between 7 and 9 cm, suggesting that the plants may be split into two separate clusters.
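To show how a stem and leaf plot exposes such a gap, here is a minimal Python sketch. It assumes non-negative measurements recorded to 0.1 cm, using whole centimetres as stems and tenths as leaves. The function name `stem_and_leaf` and the growth values are hypothetical; they are not the 20 yam plants described above. Empty stems are kept so that a gap shows up as a row with no leaves.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return (stem, leaves) rows: stem = whole centimetres, leaf = tenths.
    Assumes non-negative values recorded to 0.1 cm."""
    if not values:
        return []
    leaves = defaultdict(list)
    for v in values:
        tenths = round(v * 10)           # e.g. 7.3 cm -> 73
        stem, leaf = divmod(tenths, 10)  # 73 -> stem 7, leaf 3
        leaves[stem].append(leaf)
    lo, hi = min(leaves), max(leaves)
    # Include empty stems so that gaps in the distribution stay visible.
    return [(s, sorted(leaves.get(s, []))) for s in range(lo, hi + 1)]


# Hypothetical weekly growth values (cm), for illustration only.
growth = [5.2, 5.8, 6.1, 6.4, 6.9, 7.0, 9.3, 9.8, 10.4, 11.1, 11.7, 12.0]
for stem, lv in stem_and_leaf(growth):
    print(f"{stem:>2} | {''.join(map(str, lv))}")
```

With these made-up values the row for stem 8 is empty, which is how a low-density gap appears in this kind of display.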
Although this is only a small data set and the clusters are not well separated, they merit further investigation.
The data collector should examine the plants for other systematic differences between the clusters. Perhaps there are two different varieties of yam, or perhaps the soil differs between the two groups of plants.
Information about clustering is often of great importance to the data analyst.
If the two clusters were found to correspond to different yam varieties, it would be misleading to examine all the data together; we should display (and contrast) the data from the two varieties separately.
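A minimal sketch of describing the groups separately, assuming each plant has already been labelled with its variety. The function name `summarise_by_group`, the labels `"A"` and `"B"` and the growth values are all hypothetical.

```python
from statistics import mean


def summarise_by_group(values, labels):
    """Return {label: (count, mean)} so that each group is described separately."""
    groups = {}
    for v, g in zip(values, labels):
        groups.setdefault(g, []).append(v)
    return {g: (len(vs), mean(vs)) for g, vs in groups.items()}


# Hypothetical growth values (cm) and a hypothetical variety label for each plant.
growth = [5.5, 6.0, 6.5, 10.0, 11.0, 12.0]
variety = ["A", "A", "A", "B", "B", "B"]
for g, (n, m) in sorted(summarise_by_group(growth, variety).items()):
    print(f"variety {g}: n = {n}, mean = {m:.2f} cm")
print(f"pooled mean = {mean(growth):.2f} cm")
```

With these made-up values the pooled mean (8.50 cm) falls in the gap between the groups and describes no plant well, whereas the per-variety means (6.00 cm and 11.00 cm) each summarise their own group.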