Data set

In earlier sections, we summarised aspects of the distribution of values in a data set using measures of centre (e.g. the mean and median) and spread (standard deviation and interquartile range). In this section, we introduce a different kind of statistic that describes other aspects of the distribution.

We mainly use one data set for illustration.

Annual rainfall in Dodoma, Tanzania

In most of Africa, rainfall is the most important climatic variable. It is usually highly seasonal, and crop failure is normally associated with the late arrival of rain or with low rainfall. A better understanding of the distribution of rainfall can inform which crops are grown and when they are planted.

The table below shows the annual rainfall in Dodoma, central Tanzania, between 1936 and 2013. (The rainy season actually runs from November to April, so the last three months of each year are included with the following year.)

Total annual rainfall (mm)

Year  Rainfall    Year  Rainfall    Year  Rainfall    Year  Rainfall
1936   696.7      1956   665.0      1976   418.7      1996   826.9
1937   656.5      1957   568.5      1977   431.4      1997   301.4
1938   642.0      1958   722.7      1978   521.2      1998   934.6
1939   476.7      1959   596.5      1979   649.6      1999   496.7
1940   597.3      1960   812.4      1980   597.8      2000   520.9
1941   675.3      1961   352.5      1981   423.0      2001   826.2
1942   783.8      1962   757.8      1982   386.4      2002   534.8
1943   438.7      1963   427.0      1983   606.2      2003   425.0
1944   543.4      1964   711.9      1984   713.4      2004   657.7
1945   745.6      1965   482.8      1985   778.2      2005   517.0
1946   460.8      1966   438.3      1986   439.8      2006   460.6
1947   935.4      1967   398.0      1987   809.3      2007   909.7
1948   481.8      1968   880.7      1988   510.6      2008   539.8
1949   450.9      1969   286.1      1989   667.7      2009   543.1
1950   565.4      1970   520.0      1990   774.6      2010   643.9
1951   576.5      1971   639.1      1991   444.9      2011   446.1
1952   590.6      1972   549.1      1992   511.7      2012   755.5
1953   261.1      1973   654.9      1993   566.4      2013   477.0
1954   379.4      1974   514.3      1994   505.1
1955   505.7      1975   390.0      1995   531.4

The total rainfall varies considerably, from a minimum of 261.1 mm in the 1953 rainy season to a maximum of 935.4 mm in 1947. Whether rainfall shows a decreasing trend over these 78 years is an interesting research question, but the year-to-year variation is much larger than any such trend, so we will ignore the ordering of the data and simply examine their distribution.
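As a rough check on the figures quoted above, the following Python sketch finds the driest and wettest seasons and compares a fitted linear trend with the scatter around it. The least-squares line is only one possible way to look at the trend and is an assumption of this sketch, not a method used in the text; the variable names are illustrative.

```python
from statistics import mean, stdev

# Annual rainfall (mm) in Dodoma, 1936-2013, in year order (from the table above).
rainfall = [
    696.7, 656.5, 642.0, 476.7, 597.3, 675.3, 783.8, 438.7, 543.4, 745.6,
    460.8, 935.4, 481.8, 450.9, 565.4, 576.5, 590.6, 261.1, 379.4, 505.7,
    665.0, 568.5, 722.7, 596.5, 812.4, 352.5, 757.8, 427.0, 711.9, 482.8,
    438.3, 398.0, 880.7, 286.1, 520.0, 639.1, 549.1, 654.9, 514.3, 390.0,
    418.7, 431.4, 521.2, 649.6, 597.8, 423.0, 386.4, 606.2, 713.4, 778.2,
    439.8, 809.3, 510.6, 667.7, 774.6, 444.9, 511.7, 566.4, 505.1, 531.4,
    826.9, 301.4, 934.6, 496.7, 520.9, 826.2, 534.8, 425.0, 657.7, 517.0,
    460.6, 909.7, 539.8, 543.1, 643.9, 446.1, 755.5, 477.0,
]
years = list(range(1936, 1936 + len(rainfall)))

# Driest and wettest seasons: tuples compare on rainfall first.
driest = min(zip(rainfall, years))
wettest = max(zip(rainfall, years))
print(f"minimum {driest[0]} mm in {driest[1]}")
print(f"maximum {wettest[0]} mm in {wettest[1]}")

# Least-squares slope of rainfall on year (mm per year).
year_mean, rain_mean = mean(years), mean(rainfall)
slope = (
    sum((y - year_mean) * (r - rain_mean) for y, r in zip(years, rainfall))
    / sum((y - year_mean) ** 2 for y in years)
)

# Scatter of the yearly values around the fitted line.
residuals = [r - (rain_mean + slope * (y - year_mean))
             for y, r in zip(years, rainfall)]
residual_sd = stdev(residuals)

# Compare the fitted change over the whole period with the yearly scatter.
fitted_change = slope * (years[-1] - years[0])
print(f"fitted change over the period: {fitted_change:.1f} mm")
print(f"standard deviation about the line: {residual_sd:.1f} mm")
```

Comparing the fitted change over the whole period with the standard deviation about the line gives one informal sense of how the trend compares with the year-to-year variation.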

The diagram below shows the annual rainfall data as both a stacked and a jittered dot plot.

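The two layouts can be sketched in Python as follows. A stacked dot plot groups the values into narrow classes and piles one cross on top of another within each class; a jittered dot plot keeps each exact value on the axis and adds a random vertical offset so that crosses overlap less. The 50 mm class width, the random seed and the function names are assumptions made for this sketch; the settings used for the original diagram are not stated.

```python
import random
from collections import Counter

# Annual rainfall (mm) in Dodoma, 1936-2013, in year order.
rainfall = [
    696.7, 656.5, 642.0, 476.7, 597.3, 675.3, 783.8, 438.7, 543.4, 745.6,
    460.8, 935.4, 481.8, 450.9, 565.4, 576.5, 590.6, 261.1, 379.4, 505.7,
    665.0, 568.5, 722.7, 596.5, 812.4, 352.5, 757.8, 427.0, 711.9, 482.8,
    438.3, 398.0, 880.7, 286.1, 520.0, 639.1, 549.1, 654.9, 514.3, 390.0,
    418.7, 431.4, 521.2, 649.6, 597.8, 423.0, 386.4, 606.2, 713.4, 778.2,
    439.8, 809.3, 510.6, 667.7, 774.6, 444.9, 511.7, 566.4, 505.1, 531.4,
    826.9, 301.4, 934.6, 496.7, 520.9, 826.2, 534.8, 425.0, 657.7, 517.0,
    460.6, 909.7, 539.8, 543.1, 643.9, 446.1, 755.5, 477.0,
]

BIN_WIDTH = 50  # class width in mm (assumed for illustration)


def stacked_positions(values, bin_width):
    """Return (class centre, stack level) for each value; levels start at 1."""
    heights = Counter()
    positions = []
    for v in values:
        b = int(v // bin_width)          # index of the class containing v
        heights[b] += 1                  # next free level in that class
        positions.append(((b + 0.5) * bin_width, heights[b]))
    return positions


def jittered_positions(values, seed=0):
    """Return (exact value, random vertical offset in [0, 1)) for each value."""
    rng = random.Random(seed)
    return [(v, rng.random()) for v in values]


stacked = stacked_positions(rainfall, BIN_WIDTH)
jittered = jittered_positions(rainfall)

# Text rendering of the stacked plot, one row per class.
counts = Counter(x for x, _ in stacked)
for centre in sorted(counts):
    low, high = centre - BIN_WIDTH / 2, centre + BIN_WIDTH / 2
    print(f"{low:4.0f}-{high:<4.0f} {'x' * counts[centre]}")
```

The stacked layout loses a little precision, because each value is moved to its class centre, but shows the shape of the distribution clearly. The jittered layout keeps the exact values at the cost of a vertical position that carries no information.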