Bootstrap simulation

In the previous page, we used random samples from a normal population to simulate the error distribution.

If a normal distribution does not seem a reasonable model, an alternative is to treat the actual data as the 'population' for the simulation and take random samples with replacement from this population. Such samples are called bootstrap samples.
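Drawing a single bootstrap sample can be sketched in a few lines of Python. This is only an illustration: the function name `bootstrap_sample` and the eight data values are hypothetical and are not taken from the rainfall record discussed on this page.

```python
import random

def bootstrap_sample(data, rng):
    """Draw len(data) values from data at random, with replacement."""
    return rng.choices(data, k=len(data))

# Hypothetical values for illustration only (not the Samaru record).
data = [3.1, 12.4, 18.0, 25.7, 40.2, 57.9, 88.3, 110.5]

rng = random.Random(1)  # fixed seed so the sketch is repeatable
sample = bootstrap_sample(data, rng)

# Because sampling is with replacement, some values may appear more than
# once in the sample and others may not appear at all.
print(sorted(sample))
```

The bootstrap sample has the same size as the original data set, so its summary statistics vary about as much as those of a fresh sample of that size would.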

A simulation with these bootstrap samples can again show the error distribution and provide approximate values for the bias and standard error.

October rainfall in Samaru, Northern Nigeria

Monthly rainfall in Samaru, Nigeria, has a skew distribution in some months. The diagram below shows October rainfall in Samaru between 1928 and 1983.

The upper quartile of the observed rainfall values is 57.4 mm, so this is our estimate of the underlying population upper quartile.

If the rainfall distribution remains the same as in the past, we would estimate rainfall of 57.4 mm or more in one out of every four Octobers.

We will use a bootstrap simulation to find the error distribution for this estimate.

The actual data are represented by grey crosses in the stacked dot plot and a bootstrap sample is shown in blue. Since the bootstrap sample is taken with replacement, some data values occur in the bootstrap sample more than once — shown by blue digits instead of blue crosses.

As in simulations from a normal model, click Accumulate and take several bootstrap samples to build up the distribution of the errors in the simulation.

Click Estimate s.e. and bias to display the mean and standard deviation of the error distribution.
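The same calculation can be sketched outside the diagram. The code below is a rough illustration under stated assumptions, not the method used by the diagram itself: the data are made-up right-skewed values standing in for the rainfall record, the helper names `upper_quartile` and `bootstrap_errors` are hypothetical, and the quartile is computed with one of several common conventions. Each error is the upper quartile of a bootstrap sample minus the upper quartile of the data, since the data play the role of the population.

```python
import random
import statistics

def upper_quartile(values):
    # statistics.quantiles with n=4 returns the three quartile cut points;
    # index 2 is the upper quartile. Other quartile conventions exist.
    return statistics.quantiles(values, n=4, method="inclusive")[2]

def bootstrap_errors(data, n_samples, rng):
    """Errors of bootstrap upper quartiles relative to the data's own."""
    target = upper_quartile(data)  # the 'population' value in the simulation
    errors = []
    for _ in range(n_samples):
        sample = rng.choices(data, k=len(data))  # sample with replacement
        errors.append(upper_quartile(sample) - target)
    return errors

rng = random.Random(2024)
# Made-up right-skewed stand-in for annual October rainfall (mm).
# These are NOT the Samaru measurements.
data = [rng.gammavariate(1.5, 25.0) for _ in range(56)]

errors = bootstrap_errors(data, n_samples=2000, rng=rng)
bias = statistics.mean(errors)         # mean of the error distribution
std_error = statistics.stdev(errors)   # standard deviation of the errors
print(f"bias = {bias:.2f} mm, s.e. = {std_error:.2f} mm")
```

Taking more bootstrap samples gives a smoother picture of the error distribution, but it does not add information beyond what the original data contain.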

Either from the complete error distribution or by applying the 70-95-100 rule to the standard error, we can see that our estimate of the upper quartile rainfall in October, 57.4 mm, is unlikely to be in error by more than about 20 mm.
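The rule-of-thumb step can be written out explicitly. The sketch below assumes that the bias is negligible and that the error distribution is roughly symmetric and bell-shaped, so that about 70% of errors lie within one standard error of zero, about 95% within two and nearly all within three. The standard error of 10 mm is a hypothetical value chosen for illustration; this page does not state the simulated value. The function name `rule_of_thumb_bounds` is also hypothetical.

```python
def rule_of_thumb_bounds(std_error):
    """Approximate error bounds from the 70-95-100 rule of thumb."""
    return {
        "about 70%": 1 * std_error,
        "about 95%": 2 * std_error,
        "nearly 100%": 3 * std_error,
    }

# Hypothetical standard error (mm); not a value reported on this page.
bounds = rule_of_thumb_bounds(10.0)
for coverage, bound in bounds.items():
    print(f"{coverage} of errors within +/- {bound:.0f} mm")
```

With a skew error distribution the rule is only a rough guide, so it is worth checking the bound against the complete error distribution as well.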


If a normal distribution is a reasonable model for the data, then it is usually better to use it for simulations. However, if normality is not a reasonable assumption, a bootstrap simulation is an alternative that can be used to assess the likely size of the errors.