Displays show the distribution of values in the data
Even when a data set has no outliers or clusters, graphical displays such as dot plots, stem-and-leaf plots and histograms clearly show the distribution of values in the data: which values are most common and which are less common. Three important features of the distribution are:
- Centre: the 'typical' values around which the data are concentrated
- Spread: how variable the values are
- Shape: whether the distribution is symmetric or skew
We will examine the concepts of centre and spread in more detail later.
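A stacked dot plot is simple enough to sketch in code. The Python below is a minimal, rotated version in which the stacks grow to the right rather than upwards. The function name `stacked_dot_plot` and its `bin_width` parameter are our own illustrative choices and are not part of any library. The input values are made up.

```python
from collections import Counter


def stacked_dot_plot(values, bin_width=1.0):
    """Return a text stacked dot plot, one row per bin, lowest bin first.

    Each value adds one 'o' to the row of the bin it falls in, so the
    longest rows mark the most common values and short rows the rarer ones.
    """
    if not values:
        return ""
    # Bin index for each value; floor division keeps negative values in order.
    counts = Counter(int(v // bin_width) for v in values)
    rows = []
    for b in range(min(counts), max(counts) + 1):
        start = b * bin_width  # lower edge of this bin
        # Counter returns 0 for empty bins, so they still get a (blank) row.
        rows.append((f"{start:>6g} | " + "o" * counts[b]).rstrip())
    return "\n".join(rows)


# Made-up values, for illustration only.
print(stacked_dot_plot([3, 4, 4, 5, 5, 5, 6, 8]))
```

The longest row (at 5) marks the most common value. The empty row at 7 keeps the gap before 8 visible, as a gap would appear on a drawn dot plot.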
Wind speed
As part of a study of pollution levels in the UK, wind speed (mph) was measured on 114 successive days in 1990. (Levels of pollutants in the atmosphere were also recorded but will not be examined here.)
No outliers or clusters are evident in the data.
However, the display clearly shows the day-to-day variability in wind speed. At the same location in the following year, we would expect about half the days to have a wind speed below 9 mph, and perhaps one day in five to have a wind speed of 15 mph or higher.
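Statements such as "about half the days" and "one day in five" come from the proportions of observed days that fall below, or at or above, a threshold. These proportions serve as estimates for the following year, assuming conditions there are similar. The sketch below computes them. The 1990 measurements are not reproduced here, so the speeds are hypothetical, and the function names are illustrative.

```python
def proportion_below(values, threshold):
    """Fraction of the values that are strictly below `threshold`."""
    return sum(v < threshold for v in values) / len(values)


def proportion_at_least(values, threshold):
    """Fraction of the values that are greater than or equal to `threshold`."""
    return sum(v >= threshold for v in values) / len(values)


# Hypothetical daily wind speeds (mph); NOT the 1990 study data.
speeds = [4, 7, 8, 9, 10, 12, 15, 18, 6, 11]
print(proportion_below(speeds, 9))      # 4 of 10 days are below 9 mph
print(proportion_at_least(speeds, 15))  # 2 of 10 days reach 15 mph or more
```

Applied to the 114 recorded days, the same two calls would give the proportions that the estimates above are based on.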
Symmetry and skewness
If the density tails off in a similar way at both ends of the distribution, we call the distribution symmetric. If one side of the distribution tails off more slowly than the other, we say that the distribution is skew.
Wind speed
The wind speed data have a reasonably symmetric distribution, though there is perhaps a slightly longer tail to the right.
Storm duration
In a detailed hydrological study in Malawi, rainfall data were collected from a continuously recording rain gauge installed in the Bvumbwe catchment. The durations (minutes) of the first 50 storms in the 1983/4 rainy season are shown in the stacked dot plot below.
A 'typical' storm duration is, say, 100 minutes. No storm could last more than 100 minutes less than this (negative durations are impossible!), yet nine of them lasted more than 100 minutes longer than this 'typical' value, and several were very long. The distribution is therefore skew, with a long tail towards the higher durations.
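The argument above compares how many values lie far from a typical value on each side. The sketch below counts the values that lie more than a given distance below and above a chosen typical value. For the 50 storms, with `typical=100` and `distance=100`, the text reports 0 below and 9 above. The function name is illustrative and the durations in the example are made up.

```python
def tail_counts(values, typical, distance):
    """Count values more than `distance` below and above `typical`.

    Returns (n_far_below, n_far_above). A clear imbalance between the two
    counts suggests one tail is longer, i.e. the distribution is skew.
    """
    far_below = sum(v < typical - distance for v in values)
    far_above = sum(v > typical + distance for v in values)
    return far_below, far_above


# Made-up storm durations (minutes), for illustration only.
durations = [20, 60, 90, 100, 110, 150, 250, 320, 400]
print(tail_counts(durations, typical=100, distance=100))
```

This is only a rough check. The result depends on the chosen typical value and distance, so it supplements the dot plot rather than replacing it.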