In small data sets, features must be very prominent to be called outliers or clusters
We have described some information that can be read from a scatterplot. But how strong must the corresponding patterns be before we report them?
In both univariate and bivariate data sets, outliers or clusters must be very distinct before we conclude that they are real, unless external information confirms that the individuals concerned really are different.
Particularly in small data sets, outliers, clusters and other patterns may arise by chance, without being associated with any real features in the individuals.
Be careful not to overinterpret features in a scatterplot unless they are well defined, especially if the sample size is small.
Sweet corn size
Consider part of a field containing 400 heads of sweet corn. The diagram below shows the relationship between head length and head diameter for these 400 heads. There is a positive relationship, but no outliers, clusters or other notable features.
A scientist decides to measure only 20 of the heads of corn from the field. Taking several random samples of 20 heads and drawing a scatterplot for each shows how much the scatterplots vary from sample to sample. Although there are no outliers or clusters in the field, a sample's scatterplot occasionally gives a false suggestion of an outlier, clusters or even a curved relationship.
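The repeated sampling described above can be imitated with a short simulation. This is a minimal sketch under stated assumptions, not the data behind the diagram. The 400 (length, diameter) pairs are invented from a single linear model with normal scatter, so the means, slope and noise level are hypothetical. A point counts as an "apparent outlier" if it lies more than 2.5 residual standard deviations from the least-squares line, which is an arbitrary rule of thumb. The names `make_population`, `max_std_residual` and `simulate` are illustrative only.

```python
import math
import random


def make_population(n=400, seed=1):
    """Hypothetical field of sweet corn: (length, diameter) pairs in cm.

    The numbers below are invented for illustration, not real measurements.
    Every head comes from one linear model with normal scatter, so the
    population has a positive relationship and no genuine outliers or clusters.
    """
    rng = random.Random(seed)
    heads = []
    for _ in range(n):
        length = rng.gauss(20.0, 2.0)
        diameter = 2.0 + 0.15 * length + rng.gauss(0.0, 0.3)
        heads.append((length, diameter))
    return heads


def max_std_residual(points):
    """Largest |residual| / s about the least-squares line, s = sqrt(SSE / (n - 2))."""
    n = len(points)
    xbar = sum(x for x, _ in points) / n
    ybar = sum(y for _, y in points) / n
    sxx = sum((x - xbar) ** 2 for x, _ in points)
    sxy = sum((x - xbar) * (y - ybar) for x, y in points)
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    residuals = [y - (intercept + slope * x) for x, y in points]
    sse = sum(r * r for r in residuals)
    if sse == 0:
        return 0.0  # every point lies exactly on the line
    s = math.sqrt(sse / (n - 2))
    return max(abs(r) for r in residuals) / s


def simulate(n_samples=1000, sample_size=20, threshold=2.5, seed=2):
    """Repeatedly sample heads without replacement, as the scientist does.

    Returns the fraction of samples whose scatterplot contains a point more
    than `threshold` residual standard deviations from the fitted line,
    i.e. a point that could be mistaken for an outlier.
    """
    population = make_population()
    rng = random.Random(seed)
    flagged = 0
    for _ in range(n_samples):
        sample = rng.sample(population, sample_size)
        if max_std_residual(sample) > threshold:
            flagged += 1
    return flagged / n_samples


if __name__ == "__main__":
    print(f"Samples of 20 with an apparent outlier: {simulate():.1%}")
```

Every sample that `simulate` flags is a false alarm, because the hypothetical population was generated with no genuine outliers. The apparent outlier is produced purely by which 20 heads happened to be chosen.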