Bootstrap simulation

On the previous page, we used random samples from a normal population to simulate the error distribution.

If a normal distribution does not seem a reasonable model, an alternative is to treat the actual data as the 'population' for the simulation and take random samples with replacement from this population. Such samples are called bootstrap samples.

A simulation with these bootstrap samples can again show the error distribution and provide approximate values for the bias and standard error.
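As a minimal sketch of what taking a bootstrap sample means, the Python below draws as many values as there are in the data, with replacement, so some values can appear more than once and others not at all. The rainfall values, the function name bootstrap_sample and the seed are hypothetical, chosen only for illustration; they are not the Dodoma record.

```python
import random

def bootstrap_sample(data, rng):
    """Draw len(data) values from data, sampling with replacement."""
    return rng.choices(data, k=len(data))

# Hypothetical rainfall totals in mm (illustrative only, not the Dodoma data).
rainfall = [42.0, 88.5, 101.2, 120.7, 135.0, 150.3, 176.9, 201.4, 240.8, 310.6]

rng = random.Random(1)  # fixed seed so the run is repeatable
sample = bootstrap_sample(rainfall, rng)

# Sorting makes repeated values easy to spot.
print(sorted(sample))
```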

November and December rainfall in Dodoma, Tanzania

Monthly rainfall in Dodoma, Tanzania has a skewed distribution in some months. The diagram below shows the total rainfall at the start of the rainy season (November and December) in Dodoma between 1935 and 2012.

The upper quartile of the observed rainfall totals is 190.6 mm, so this is our estimate of the upper quartile of the underlying population.

If the rainfall distribution remains the same as in the past, we would expect rainfall of 190.6 mm or more in about one out of every four November/December periods.

We will use a bootstrap simulation to find the error distribution for this estimate.

The actual data are represented by grey crosses in the stacked dot plot and a bootstrap sample is shown in blue. Since the bootstrap sample is taken with replacement, some data values occur in the bootstrap sample more than once; these are shown by blue digits instead of blue crosses.

As in simulations from a normal model, click Accumulate and take several bootstrap samples to build up the distribution of the errors in the simulation.

Click Estimate s.e. and bias to display the mean and standard deviation of the error distribution.
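The calculation behind these two steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the method used by the diagram: the rainfall values are hypothetical, the function names upper_quartile and bootstrap_errors are made up for this example, and the quartile is computed with the default convention of statistics.quantiles, which may differ slightly from the convention that gives 190.6 mm for the real data.

```python
import random
import statistics

def upper_quartile(values):
    # statistics.quantiles with n=4 returns the three quartile cut points;
    # the last one is the upper quartile. Other quartile conventions exist
    # and can give slightly different values.
    return statistics.quantiles(values, n=4)[2]

def bootstrap_errors(data, n_samples, rng):
    """Errors of the bootstrap upper quartiles relative to the data's own."""
    target = upper_quartile(data)  # the data play the role of the 'population'
    errors = []
    for _ in range(n_samples):
        sample = rng.choices(data, k=len(data))  # sampling with replacement
        errors.append(upper_quartile(sample) - target)
    return errors

# Hypothetical rainfall totals in mm (illustrative only, not the Dodoma data).
rainfall = [42.0, 88.5, 101.2, 120.7, 135.0, 150.3, 176.9, 201.4, 240.8, 310.6]

rng = random.Random(1)  # fixed seed so the run is repeatable
errors = bootstrap_errors(rainfall, 1000, rng)

bias = statistics.mean(errors)        # mean of the error distribution
std_error = statistics.stdev(errors)  # standard deviation of the errors
print(f"estimated bias: {bias:.1f} mm, standard error: {std_error:.1f} mm")
```

Taking more bootstrap samples (a larger n_samples) gives a smoother picture of the error distribution, in the same way as repeatedly clicking the sampling button with Accumulate selected.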

Either by examining the complete error distribution or by applying the 70-95-100 rule to the standard error, we can conclude that our estimate of the upper quartile rainfall in November/December, 190.6 mm, is unlikely to be in error by more than about 50 mm.
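To see how the rule relates to a simulated error distribution, the sketch below counts the fraction of errors that lie within one, two and three standard deviations of their mean. It assumes the 70-95-100 rule is the usual rule of thumb (roughly 70% of values within one standard deviation, about 95% within two and nearly all within three). The error values and the function name fraction_within are hypothetical, used only for illustration.

```python
import statistics

def fraction_within(errors, k):
    """Fraction of errors lying within k standard deviations of their mean."""
    centre = statistics.mean(errors)
    spread = statistics.stdev(errors)
    inside = [e for e in errors if abs(e - centre) <= k * spread]
    return len(inside) / len(errors)

# Hypothetical simulated errors in mm (illustrative values only).
errors = [-31.0, -18.5, -12.0, -6.5, -2.0, 0.0, 3.5, 8.0, 14.5, 22.0, 27.5, 40.0]

for k in (1, 2, 3):
    print(f"within {k} s.d.: {fraction_within(errors, k):.2f}")
```

Under this reading of the rule, an error of more than about twice the standard error would be unusual, which is the reasoning behind a statement of the form "unlikely to be in error by more than about ...".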


If a normal distribution is a reasonable model for the data, then it is usually better to use it for simulations. However, if normality is not a reasonable assumption, a bootstrap simulation is an alternative that can be used to gauge the likely size of the errors.