Estimating other parameters of a normal population

In normal populations, the mean, µ, is the parameter that is most often estimated. However, numerical distributions have other parameters that may be of interest, such as the quartiles.

These parameters can be estimated using the corresponding summary statistic from a random sample, but the error distribution may be difficult to obtain theoretically.

Simulation

Since the parameters µ and σ are unknown, we cannot perform a simulation with repeated samples from the actual population. However, we can simulate from our best estimate of the population distribution, i.e. a normal distribution in which µ and σ are replaced by the sample mean and standard deviation.

We repeatedly take samples of size n from this approximate population and evaluate the estimation error from using each sample. (Since we know the population from which we are sampling, we know the population parameter and can find the estimation error.)

The standard deviation of these errors is the approximate standard error of the estimator.
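The procedure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `simulated_bias_and_se`, the data values, the number of repetitions and the seed are all hypothetical, and the median is used simply as one example of a parameter other than the mean.

```python
import random
import statistics

def simulated_bias_and_se(sample, statistic, target, n_sims=2000, seed=1):
    """Approximate the bias and standard error of `statistic` by simulation.

    `target` is the value of the parameter in the approximate normal
    population (the one built from the sample mean and standard deviation).
    """
    rng = random.Random(seed)
    mu_hat = statistics.mean(sample)      # replaces the unknown mu
    sigma_hat = statistics.stdev(sample)  # replaces the unknown sigma
    n = len(sample)

    errors = []
    for _ in range(n_sims):
        simulated = [rng.gauss(mu_hat, sigma_hat) for _ in range(n)]
        errors.append(statistic(simulated) - target)

    # Mean of the errors estimates the bias; their s.d. estimates the s.e.
    return statistics.mean(errors), statistics.stdev(errors)

# Hypothetical data, for illustration only
sample = [12.1, 9.8, 11.4, 10.6, 13.0, 8.9, 10.2, 11.9, 9.5, 12.4]

# For a normal population the median equals the mean, so the target
# in the approximate population is the sample mean.
bias, se = simulated_bias_and_se(
    sample, statistics.median, target=statistics.mean(sample)
)
print(f"approximate bias: {bias:.3f}, approximate s.e.: {se:.3f}")
```

Because the approximate population is fully known, the target value is known exactly, which is what makes the error of each simulated estimate computable.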

Annual rainfall in Samaru, Nigeria

In most of Africa, the most important climatic variable is rainfall. Rainfall is usually highly seasonal and failure of crops is normally associated with late arrival of rain or low rainfall. A better understanding of the distribution of rainfall can affect the crops that are grown and when they are planted.

What is the annual rainfall in Samaru, Northern Nigeria, that is not reached in 1 year out of 4?

In other words, we want to estimate the lower quartile of the annual rainfall.

The above dot plot shows the annual rainfalls in Samaru, Northern Nigeria, between 1928 and 1983 and their lower quartile. Assuming that there is no climate change (or that climate change is negligible compared to the year-to-year variation in rainfall), ...

... we estimate that the population lower quartile is 939 mm — i.e. we predict that annual rainfall will be less than 939 mm in 1 out of 4 years in the future.

Approximate population

The Samaru rainfalls have a fairly symmetric distribution, so it is reasonable to try simulating random samples from a normal population. The diagram below shows a normal distribution whose mean and standard deviation equal those of our actual data.

We will use this normal distribution as a population from which to sample 56 values, representing simulated rainfalls from 56 years. For the normal distribution with mean 1068.1 mm and standard deviation 178.5 mm, the lower quartile is 947.8 mm, so this is the 'target' parameter that our simulated samples estimate.
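The target value can be checked with Python's standard library. This is a minimal sketch; the variable names are illustrative, and the mean and standard deviation are the rounded values quoted in the text.

```python
from statistics import NormalDist

# Approximate population: normal with the sample mean and s.d. (in mm)
approx_population = NormalDist(mu=1068.1, sigma=178.5)

# The lower quartile is the value with 25% of the distribution below it
target_lower_quartile = approx_population.inv_cdf(0.25)
print(round(target_lower_quartile, 1))
```

With these rounded inputs the result is about 947.7 mm. The small difference from the quoted 947.8 mm presumably comes from rounding of the mean and standard deviation.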

The diagram initially shows a sample of 56 rainfalls. The error is the difference between the sample lower quartile and the underlying population parameter, 947.8 mm.

Click Take sample a few times and observe that the error varies from sample to sample.

Click Accumulate then take several samples (of 56 annual rainfalls). The error distribution is built up as a stacked dot plot at the bottom of the diagram.

The distribution of errors gives an idea of how far the lower quartile of our actual rainfall data (939 mm) is likely to be from the true unknown population lower quartile.

Click Estimate s.e. and bias to see the standard deviation and mean of the error distribution.
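For readers without access to the diagram, a similar simulation can be sketched in Python. The number of repetitions and the seed are arbitrary choices, and the quartile convention used by `statistics.quantiles` is an assumption, since the convention used by the diagram is not stated.

```python
import random
import statistics
from statistics import NormalDist

MEAN, SD, N_YEARS = 1068.1, 178.5, 56     # values quoted in the text (mm)
TARGET = NormalDist(MEAN, SD).inv_cdf(0.25)  # lower quartile of the approximate population

def lower_quartile(values):
    # First of the three quartile cut points; several conventions exist,
    # and this default one is an assumption.
    return statistics.quantiles(values, n=4)[0]

rng = random.Random(2024)  # arbitrary seed, for reproducibility only
errors = []
for _ in range(5000):
    # One simulated sample of 56 annual rainfalls
    rainfalls = [rng.gauss(MEAN, SD) for _ in range(N_YEARS)]
    errors.append(lower_quartile(rainfalls) - TARGET)

bias = statistics.mean(errors)        # mean of the error distribution
std_error = statistics.stdev(errors)  # s.d. of the error distribution
print(f"bias: {bias:.1f} mm, s.e.: {std_error:.1f} mm")
```

The list `errors` plays the role of the stacked dot plot, and its mean and standard deviation correspond to the bias and standard error reported by the diagram.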

Understanding accuracy of estimate

In the simulation, you should have observed that the bias of the estimator is small, so we will treat it as zero.

The estimated standard error in your simulation is in millimetres of rainfall. We can use the 70-95-100 rule of thumb to help interpret its value: the error has approximately a 95% chance of being within 2 s.e. of zero and will almost certainly be within 3 s.e. of zero.

The error in our estimate of the population lower quartile, 939 mm, is therefore likely to be less than twice the estimated standard error and will almost certainly be less than three times it.
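The rule of thumb itself can be checked against simulated errors. The sketch below is illustrative only: it assumes the approximate normal population with mean 1068.1 mm and standard deviation 178.5 mm, samples of 56 years, an arbitrary seed, and the default quartile convention of `statistics.quantiles`.

```python
import random
import statistics
from statistics import NormalDist

MEAN, SD, N_YEARS = 1068.1, 178.5, 56
TARGET = NormalDist(MEAN, SD).inv_cdf(0.25)
rng = random.Random(7)  # arbitrary seed

# Error of the sample lower quartile in each of 5000 simulated samples
errors = [
    statistics.quantiles([rng.gauss(MEAN, SD) for _ in range(N_YEARS)], n=4)[0] - TARGET
    for _ in range(5000)
]
se = statistics.stdev(errors)

# Proportion of simulated errors within 2 and 3 standard errors of zero
within_2se = sum(abs(e) <= 2 * se for e in errors) / len(errors)
within_3se = sum(abs(e) <= 3 * se for e in errors) / len(errors)
print(f"s.e.: {se:.1f} mm")
print(f"within 2 s.e.: {within_2se:.3f}, within 3 s.e.: {within_3se:.3f}")
```

If the error distribution is roughly normal with negligible bias, the first proportion should be close to 0.95 and the second close to 1.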