Clusters
Sometimes the cloud of crosses separates into two or more groups, which are called clusters. Like outliers, clusters provide important information that should be investigated further.
Again, the individuals should be examined more closely (perhaps by collecting further information about them) to assess whether the clusters correspond to individuals with distinct characteristics. For example, the clusters may correspond to males and females, or to two different species of plant.
Iris data
The scatterplot below shows the width and length of the sepals from iris flowers that were sampled in the Gaspé Peninsula. (Although this example is not a business one, it does illustrate clusters well.) Initially leave the checkbox Show iris varieties under the diagram unchecked.
The crosses separate into two groups. (You may find it easier to distinguish the groups by either moving further from the monitor or squinting.)
The two groups of irises have sepals of different shapes. For any given sepal length, the sepals in the bottom right group tend to be narrower than those in the top left group; they are relatively long and narrow.
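The difference in shape can also be expressed numerically as the ratio of sepal width to sepal length. The short Python sketch below is only an illustration of that idea, not the method used to draw the diagram: the measurements are hypothetical values made up for the example (they are not the Gaspé Peninsula data), and the helper split_at_largest_gap is an assumed name for a simple heuristic that splits the flowers at the widest gap between consecutive ratios.

```python
# Hypothetical (sepal length, sepal width) pairs in cm, made up for illustration;
# these are NOT the measurements shown in the scatterplot.
sepals = [
    (5.1, 3.5), (4.9, 3.0), (4.7, 3.2), (5.0, 3.6), (5.4, 3.9),
    (7.0, 3.2), (6.4, 3.2), (5.5, 2.3), (6.3, 3.3), (5.8, 2.7),
]


def split_at_largest_gap(points):
    """Split points into two groups at the widest gap in width/length ratio.

    A simple heuristic (an assumption of this sketch), not a general
    clustering algorithm.
    """
    # Order the flowers from relatively narrow sepals to relatively wide ones.
    ranked = sorted(points, key=lambda p: p[1] / p[0])
    ratios = [width / length for length, width in ranked]
    # Gap between each pair of neighbouring ratios.
    gaps = [ratios[i + 1] - ratios[i] for i in range(len(ratios) - 1)]
    # Cut just after the flower that precedes the widest gap.
    cut = gaps.index(max(gaps)) + 1
    return ranked[:cut], ranked[cut:]


narrow, wide = split_at_largest_gap(sepals)
print("Relatively long and narrow sepals:", narrow)
print("Relatively short and wide sepals:", wide)
```

With these made-up values, the flowers with longer sepals fall into the "narrow" group, which mirrors the bottom right group in the scatterplot. On real data the gap may be much less clear-cut, so a split like this is a starting point for further investigation rather than a conclusion.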
Even without any other evidence, these data suggest that there may be two distinct species of iris. In fact, the botanists who collected the data had already classified the irises into three varieties. Click the checkbox Show iris varieties to display the different varieties with different symbols on the scatterplot.