Data set
In earlier sections, we summarised aspects of the distribution of values in a data set using measures of centre (e.g. the mean and median) and spread (standard deviation and interquartile range). In this section, we introduce a different kind of statistic that describes other aspects of the distribution.
We mainly use one data set for illustration.
Annual rainfall in Dodoma, Tanzania
In most of Africa, the most important climatic variable is rainfall. Rainfall is usually highly seasonal and failure of crops is normally associated with late arrival of rain or low rainfall. A better understanding of the distribution of rainfall can affect the crops that are grown and when they are planted.
The table below shows the annual rainfall in Dodoma, central Tanzania, between 1936 and 2013. (The rainy season actually runs from November to April, so the last three months of each year are included with the following year.)
[Table: annual rainfall (mm) in Dodoma by rainy season, 1936 to 2013]
The total rainfall varies considerably, with a minimum of 261.1 mm in the 1953 rainy season and a maximum of 935.4 mm in 1947. Whether rainfall shows a decreasing trend over these 78 years is an interesting research question, but the year-to-year variation is much larger than any such trend, so we will ignore the ordering of the data and simply examine their distribution.
The diagram below shows the annual rainfall data as both a stacked and a jittered dot plot.
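For readers who want to see how the two kinds of dot plot differ, the following is a minimal sketch of how the point positions might be computed. The function names, the bin width and the example values are all assumptions made for illustration; the values are made up and are not the Dodoma record.

```python
import random

def stacked_positions(values, bin_width):
    """Return (x, y) pairs for a stacked dot plot.

    Values are grouped into bins of the given width; x is the bin centre
    and y is the 1-based height of the point within that bin's stack.
    """
    counts = {}
    points = []
    for v in sorted(values):
        b = int(v // bin_width)              # index of the bin containing v
        counts[b] = counts.get(b, 0) + 1     # next free slot in this stack
        points.append(((b + 0.5) * bin_width, counts[b]))
    return points

def jittered_positions(values, height=1.0, seed=0):
    """Return (x, y) pairs for a jittered dot plot.

    x is the exact value; y is a random vertical offset in [0, height]
    whose only purpose is to separate points that would otherwise overlap.
    """
    rng = random.Random(seed)                # seeded so the layout is repeatable
    return [(v, rng.uniform(0, height)) for v in values]

# Made-up rainfall totals (mm) for illustration only; NOT the Dodoma record.
example = [420.0, 435.5, 610.2, 448.9, 590.0, 702.3]
stacked = stacked_positions(example, bin_width=50)
jittered = jittered_positions(example)
```

In the stacked version, the horizontal position is rounded to a bin centre so that equal or nearby values pile up into columns. In the jittered version, the horizontal position is exact and the vertical position carries no information. Either list of pairs could then be passed to a plotting library of your choice.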