Interpreting a graphical summary of a sample
There is sample-to-sample variability in summary displays of samples from a population. However, in any practical situation we have only a single data set (sample), so how can we use this knowledge of sample-to-sample variability?
We can assess features such as outliers, clusters or skewness in a data set by examining how often they appear in random samples from a population without such features. In particular, we can examine variability in samples from a normal distribution whose centre and spread closely match those of the data set.
Silkworm poisoning
An earlier section gave a table with the survival times of 80 silkworm larvae that had been exposed to sodium arsenate.
The top half of this diagram shows a box plot and jittered dot plot of the survival data. There is a slight indication of skewness (a long tail to the distribution on the right). Does this indicate that survival has a slightly skew distribution, or could it simply be a result of sampling from a symmetric population?
We examine the variability of similar displays from a normal distribution with the same mean (272.6) and standard deviation (30.67) as the data. The bottom half of the display shows one such random sample from this normal population. Click Take sample a few times to observe the sample-to-sample variability of the sample displays.
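For readers without access to the interactive diagram, the sampling step can be imitated in a few lines of Python. This is only a minimal sketch: the mean, standard deviation and sample size are those quoted above, but the function names (take_sample, box_plot_summary, quartile_skewness) and the seed are illustrative choices, and the quartile-based skewness measure is one assumed way to put a number on the impression given by a box plot, not something used by the diagram itself.

```python
import random
import statistics

MEAN = 272.6   # sample mean quoted in the text
SD = 30.67     # sample standard deviation quoted in the text
N = 80         # number of larvae in the data set


def take_sample(rng, n=N, mean=MEAN, sd=SD):
    """Draw one random sample of size n from the matching normal population."""
    return [rng.gauss(mean, sd) for _ in range(n)]


def box_plot_summary(sample):
    """Return (min, lower quartile, median, upper quartile, max) for a sample."""
    q1, med, q3 = statistics.quantiles(sample, n=4)
    return min(sample), q1, med, q3, max(sample)


def quartile_skewness(sample):
    """Quartile skewness: positive when the upper half of the box is longer."""
    _, q1, med, q3, _ = box_plot_summary(sample)
    return (q3 + q1 - 2 * med) / (q3 - q1)


if __name__ == "__main__":
    rng = random.Random(1)  # arbitrary seed so that the run is repeatable
    for i in range(5):
        sample = take_sample(rng)
        lo, q1, med, q3, hi = box_plot_summary(sample)
        print(f"sample {i + 1}: min={lo:.0f} Q1={q1:.0f} median={med:.0f} "
              f"Q3={q3:.0f} max={hi:.0f} "
              f"skewness={quartile_skewness(sample):+.2f}")
```

Each printed line plays the role of one click: the five values are what a box plot of that sample would display, and the sign and size of the skewness value vary from sample to sample even though the population is exactly symmetric.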
Observe that there is often as much of an impression of skewness as that shown by the box plot of the actual data. Since this degree of skewness is common when the population is a symmetric normal one, the data provide no evidence that survival times have a skew distribution.
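The phrase "often as much" can be made more concrete by counting. The sketch below is an illustration under stated assumptions, not the method used in the diagram: it measures skewness with the usual moment coefficient, simulates many normal samples of 80 values with the mean and standard deviation given earlier, and reports the proportion whose skewness is at least as large as a value supplied by the reader. The names moment_skewness and proportion_as_skew, the number of repetitions and the seed are all hypothetical choices.

```python
import random
import statistics


def moment_skewness(sample):
    """Skewness coefficient: third central moment over the cubed standard deviation."""
    n = len(sample)
    mean = statistics.fmean(sample)
    m2 = sum((x - mean) ** 2 for x in sample) / n
    m3 = sum((x - mean) ** 3 for x in sample) / n
    return m3 / m2 ** 1.5


def proportion_as_skew(observed, n=80, mean=272.6, sd=30.67,
                       reps=2000, seed=1):
    """Estimate how often a normal sample of size n has skewness >= observed."""
    rng = random.Random(seed)  # arbitrary seed so that the run is repeatable
    count = 0
    for _ in range(reps):
        sample = [rng.gauss(mean, sd) for _ in range(n)]
        if moment_skewness(sample) >= observed:
            count += 1
    return count / reps
```

To use it, compute moment_skewness on the 80 survival times from the earlier table (not reproduced here) and pass the result to proportion_as_skew. A proportion that is far from negligible agrees with the visual impression that such skewness is common in samples from a symmetric population.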