Spread of an estimator's distribution

One characteristic of a good estimator of a parameter is that the mean of its distribution should equal (or at least be close to) the parameter being estimated; that is, the estimator should be unbiased or nearly so.

However, this is not enough to characterise a good estimator, since an unbiased estimator can still have a distribution with a wide spread of values.

The estimator on the right is clearly better than the one whose distribution is shown on the left since its values are likely to be closer to the actual parameter value, \(\theta\).

Standard error

The spread of an estimator's distribution also describes how accurately it can estimate the target parameter, and this is usually summarised by the distribution's standard deviation.

Definition

The standard error of an estimator \(\hat{\theta}\) of a parameter \(\theta\) is defined to be its standard deviation.

The standard error is also the standard deviation of the estimation error,

\[ error \;\; = \;\; \hat{\theta} - \theta \]

and this is the reason for its name — it is a 'typical' estimation error.
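The equivalence between the spread of the estimator and the spread of its error can be checked by simulation. The sketch below is only an illustration: the estimator is taken to be a sample mean, and the values of `theta`, `sigma`, `n` and `reps` are hypothetical choices, not taken from the text.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

theta = 10.0   # hypothetical true parameter value
sigma = 2.0    # hypothetical s.d. of a single observation
n = 25         # hypothetical sample size
reps = 10_000  # number of simulated samples

# Each estimate is the mean of n normally distributed observations.
estimates = []
for _ in range(reps):
    sample = [random.gauss(theta, sigma) for _ in range(n)]
    estimates.append(sum(sample) / n)

# Estimation error = estimate minus the true parameter value.
errors = [est - theta for est in estimates]

sd_estimates = statistics.stdev(estimates)
sd_errors = statistics.stdev(errors)

print(f"s.d. of estimates: {sd_estimates:.4f}")
print(f"s.d. of errors:    {sd_errors:.4f}")
print(f"sigma / sqrt(n):   {sigma / math.sqrt(n):.4f}")
```

The first two printed values agree because subtracting the constant `theta` shifts the distribution without changing its spread. Both should be close to the theoretical standard error of a sample mean, `sigma / sqrt(n)`.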

Example: Active ingredient in medicine

Pharmaceutical companies routinely test their products to ensure that the concentration of active ingredient, \(\mu\), is within tight limits. However, the chemical analysis is not precise, and repeated measurements of the same specimen differ slightly.

One type of analysis gives estimated concentrations of the active ingredient that are normally distributed with standard deviation \(\sigma = 0.0068\) grams per litre. A product is tested 16 times, giving a sample mean concentration of \(\hat{\mu} = \overline{x} = 0.0724\) grams per litre.

When a product is tested once, the recorded concentration is

\[ X \;\; \sim \; \; \NormalDistn\left(\mu, \;\;\sigma_X^2 = 0.0068^2\right) \]

The mean of a random sample of \(n\) values from this distribution is also normally distributed,

\[ \overline{X} \;\; \sim \; \; \NormalDistn\left(\mu, \;\;\sigma_{\overline{X}}^2 = \frac {0.0068^2} n \right) \]

Since \(E[\overline{X}] = \mu\), the estimator is unbiased. Its standard error is

\[ \se(\overline{X}) = \sqrt {\frac {0.0068^2} n} = 0.0017 \quad\quad \text{when } n = 16\]
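The arithmetic behind this standard error can be reproduced in a few lines. This is a minimal sketch: the function name `se_mean` and the sample sizes other than 16 are illustrative choices, not part of the example.

```python
import math


def se_mean(sigma, n):
    """Standard error of the mean of n independent values with s.d. sigma."""
    return sigma / math.sqrt(n)


sigma = 0.0068  # s.d. of a single measurement, grams per litre

# The standard error falls in proportion to 1 / sqrt(n).
for n in (1, 4, 16, 64):
    print(f"n = {n:2d}   se = {se_mean(sigma, n):.5f}")
```

Quadrupling the sample size halves the standard error, which is why 16 tests give a standard error one quarter of the single-measurement standard deviation.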

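Subtracting the constant \(\mu\) shifts the distribution of \(\overline{X}\) without changing its spread, so the standard error is also the standard deviation of the estimation error. Since the estimator is unbiased, the error distribution has mean zero,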

\[ \overline{X} - \mu \;\; \sim \; \; \NormalDistn\left(0, \;\;\sigma_{\overline{X} - \mu}^2 = \frac {0.0068^2} n\right) \]

The error distribution is shown in the diagram below.

The likely size of the errors shrinks as the sample size increases. When \(n = 16\), the estimation error is unlikely to be more than 0.004 grams per litre, so our estimate of 0.0724 is probably within 0.004 of the true concentration for this product.
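The statement that an error larger than 0.004 is unlikely when \(n = 16\) can be quantified from the normal error distribution. The sketch below uses `statistics.NormalDist` from the Python standard library; the variable names are illustrative.

```python
import math
from statistics import NormalDist

sigma = 0.0068  # s.d. of a single measurement, grams per litre
n = 16          # number of tests
limit = 0.004   # error size of interest, grams per litre

# Error distribution of the sample mean: normal, mean 0, s.d. sigma / sqrt(n).
se = sigma / math.sqrt(n)
error_dist = NormalDist(mu=0.0, sigma=se)

# Probability that the estimation error lies between -limit and +limit.
p_within = error_dist.cdf(limit) - error_dist.cdf(-limit)

print(f"standard error = {se:.4f}")
print(f"P(|error| < {limit}) = {p_within:.3f}")
```

The probability comes out at roughly 0.98, so errors beyond 0.004 are expected in only about 2% of such sets of 16 tests.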

For many parameter estimates, the formula for the standard error involves unknown parameters, so its exact numerical value cannot be obtained.

Somewhat confusingly, the term "standard error" is also used for an estimate of the standard error, obtained when the parameters in its formula are replaced by estimates.

Example: Heat treatment of mangoes

In an experiment to assess the effectiveness of heat-treatment of mangoes as a method of killing fruit fly eggs and larvae, several infested fruit were heat-treated at 44°C. Out of 572 fruit fly eggs in the mangoes, 30 survived, giving an estimate of the probability of survival at this temperature of:

\[ \hat{\pi} = P = \frac {30} {572} = 0.05245 \]

What are the bias and standard error of the estimator?

If all \(n = 572\) eggs independently have the same probability, \(\pi\), of surviving, the number surviving has a binomial distribution,

\[ X \;\; \sim \; \; \BinomDistn(n = 572, \pi) \]

From the properties of the binomial distribution, the sample proportion \(P = X/n\) has

\[ E[P] = \pi \spaced{and} \Var(P) = \frac {\pi(1-\pi)} n \]
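Both results follow in one step from the mean and variance of the binomial count, \(E[X] = n\pi\) and \(\Var(X) = n\pi(1-\pi)\), because \(P = X/n\) is the count divided by a constant:

\[ E[P] = \frac {E[X]} n = \frac {n\pi} n = \pi \spaced{and} \Var(P) = \frac {\Var(X)} {n^2} = \frac {n\pi(1-\pi)} {n^2} = \frac {\pi(1-\pi)} n \]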

The sample proportion is therefore an unbiased estimator of \(\pi\) and its standard error is

\[ \se(P) = \sqrt{\frac {\pi(1-\pi)} {572}} \]

Unfortunately, this formula involves the unknown parameter, \(\pi\), so we cannot use it directly to obtain a numerical value for the standard error. However, we can replace \(\pi\) in the formula with the sample proportion, giving:

\[ \widehat{\se}(P) = \sqrt{\frac {\frac {30} {572}\left(1-\frac {30} {572}\right)} {572}} = 0.0093\]
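The same plug-in calculation can be written as a short function. This is a sketch only, and the name `estimated_se_proportion` is an illustrative choice.

```python
import math


def estimated_se_proportion(successes, n):
    """Plug-in estimate of the standard error of a sample proportion.

    The unknown probability in sqrt(pi * (1 - pi) / n) is replaced
    by the observed proportion successes / n.
    """
    p = successes / n
    return math.sqrt(p * (1 - p) / n)


survivors = 30  # eggs that survived heat treatment
eggs = 572      # eggs in total

p_hat = survivors / eggs
se_hat = estimated_se_proportion(survivors, eggs)

print(f"estimated survival probability = {p_hat:.5f}")
print(f"estimated standard error       = {se_hat:.4f}")
```

Because the observed proportion stands in for \(\pi\), the result is itself an estimate and would vary from one experiment to another.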