Interpreting a graphical summary of a sample

Summary displays vary from sample to sample, even when the samples come from the same population. In practice, however, we have only a single data set (one sample), so how can we use this knowledge of sample-to-sample variability?

We can assess features such as outliers, clusters or skewness in a data set by examining how often they appear in random samples from a population without such features. In particular, we can examine the variability of samples from a normal distribution whose centre and spread match those of the data set.
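The idea can be tried numerically. The sketch below is an illustration only, not part of the original page: it estimates how often a sample from a normal population, which has no genuine outliers, still shows at least one point flagged as an outlier. It assumes the common box-plot rule (values more than 1.5 times the interquartile range beyond the quartiles); the function names and the default of 2000 simulated samples are hypothetical choices.

```python
import random
import statistics


def has_boxplot_outlier(sample):
    """Return True if any value lies more than 1.5 IQR beyond the quartiles."""
    q1, _, q3 = statistics.quantiles(sample, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return any(x < low or x > high for x in sample)


def outlier_rate(n, n_samples=2000, seed=1):
    """Proportion of normal samples of size n that show a flagged outlier."""
    rng = random.Random(seed)  # seeded so that the run is repeatable
    hits = 0
    for _ in range(n_samples):
        # Standard normal population: symmetric, with no real outliers.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        hits += has_boxplot_outlier(sample)
    return hits / n_samples


if __name__ == "__main__":
    print(outlier_rate(41))
```

If the estimated proportion is not small, a single flagged point in a real data set of that size is weak evidence of a genuine outlier.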

Holiday home rental

The diagram below describes monthly rentals for 41 houses in a holiday resort.

The top half of the diagram shows a box plot and jittered dot plot of the rental data. There is an indication of skewness (a long tail to the distribution on the right). Does this indicate that rentals in the resort have a skew distribution, or could it simply be a result of sampling from a symmetric population?

To answer this, we examine the variability of similar displays for samples from a symmetric normal distribution whose mean (2378) and standard deviation (1053) equal those of the data. The bottom half of the diagram shows one such random sample from this normal population. Click Take sample a few times to observe the sample-to-sample variability of the sample displays.
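Without the interactive diagram, repeated sampling can be imitated in code. The following is a minimal sketch, assuming each simulated sample has the same size as the data (41 values) and using the mean and standard deviation quoted above. It prints the five-number summary that underlies a box plot for a few simulated samples; the function names are hypothetical, and the quartile method is Python's default, which may differ slightly from the one used in the diagram.

```python
import random
import statistics

MEAN, SD, N = 2378, 1053, 41  # mean, standard deviation and sample size from the text


def take_sample(rng):
    """Draw one random sample of N values from the matching normal population."""
    return [rng.gauss(MEAN, SD) for _ in range(N)]


def five_number_summary(sample):
    """Minimum, lower quartile, median, upper quartile and maximum."""
    q1, median, q3 = statistics.quantiles(sample, n=4)
    return min(sample), q1, median, q3, max(sample)


rng = random.Random(0)  # seeded so that the run is repeatable
for _ in range(5):
    lo, q1, med, q3, hi = five_number_summary(take_sample(rng))
    print(f"min={lo:7.0f}  Q1={q1:7.0f}  median={med:7.0f}  Q3={q3:7.0f}  max={hi:7.0f}")
```

In a symmetric sample the median sits roughly midway between the quartiles and the two whiskers have similar lengths, so comparing these gaps across runs gives a numerical counterpart to comparing the box plots by eye.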

Observe that the simulated samples rarely give as strong an impression of skewness as the box plot of the actual data. Since this degree of skewness is unlikely if the population is a symmetric normal one, we can conclude that there is strong evidence that monthly rentals in the resort have a skew distribution.
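The judgement "rarely as much skewness" can be made more concrete by counting. The sketch below is one possible approach, not the method used on this page, which relies on visual comparison. It replaces the visual impression with the moment coefficient of skewness and estimates the proportion of simulated normal samples that are at least as skew as the data. The rental data are not listed here, so the observed skewness is left as an argument to be supplied by the reader; all function and parameter names are hypothetical.

```python
import random


def sample_skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (zero for a symmetric sample)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


def proportion_as_skew(observed_skewness, n=41, mean=2378, sd=1053,
                       n_samples=5000, seed=0):
    """Proportion of normal samples whose skewness is at least the observed value."""
    rng = random.Random(seed)  # seeded so that the run is repeatable
    count = 0
    for _ in range(n_samples):
        sample = [rng.gauss(mean, sd) for _ in range(n)]
        if sample_skewness(sample) >= observed_skewness:
            count += 1
    return count / n_samples
```

To use it, compute `sample_skewness` on the 41 rental values and pass the result to `proportion_as_skew`. A small proportion would support the conclusion above; a large one would suggest that the apparent skewness could easily arise by chance.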