Sample mean or median?

We now illustrate the use of bias and standard error to find which of two competing estimators is better.

Samples from a normal distribution

Consider a random sample of \(n\) values from a normal distribution whose standard deviation is known to be 0.2,

\[ X \;\; \sim \; \; \NormalDistn(\mu, \;\;\sigma^2 = 0.2^2) \]

Since the normal distribution is symmetric, both its mean and its median are equal to \(\mu\), so either the sample mean or the sample median could be used to estimate \(\mu\). Which is the better estimator?

We already know the distribution of the sample mean, \(\overline{X}\),

\[ \overline{X} \;\; \sim \; \; \NormalDistn\left(\mu, \;\;\sigma_{\overline{X}}^2 = \frac {0.2^2} n \right) \]

The distribution of the sample median, \(\tilde{X}\), is rather more complex. Its expected value is also \(\mu\), but its distribution is not normal and its standard deviation is hard to find. However, in large samples its distribution is approximately

\[ \tilde{X} \;\; \underset{\text{approx}}{\sim} \; \; \NormalDistn\left(\mu, \;\;\sigma_{\tilde{X}}^2 = \frac {0.2^2} n \times 1.571 \right) \]
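The factor 1.571 is \(\pi/2\) rounded to three decimal places. As a sketch of where it comes from, assume the standard large-sample result (not derived in this section) that the median of a sample from a continuous distribution with density \(f\) and median \(m\) has variance of approximately \(1/(4nf(m)^2)\). For a normal distribution the density at the median is \(f(\mu) = 1/(\sigma\sqrt{2\pi})\), so

\[ \operatorname{Var}(\tilde{X}) \;\approx\; \frac{1}{4 n f(\mu)^2} \;=\; \frac{2\pi\sigma^2}{4n} \;=\; \frac{\pi}{2} \times \frac{\sigma^2}{n} \;\approx\; 1.571 \times \frac{0.2^2}{n} \]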

Both estimators are therefore unbiased, but the standard error of the mean, \(\overline{X}\), is lower than that of the median, \(\tilde{X}\), so the sample mean is the better estimator.
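The comparison of standard errors can be checked by simulation. The following is a minimal sketch rather than a definitive implementation: the function name `simulate_standard_errors`, the value \(\mu = 0\), the sample size of 100, the number of replications and the random seed are all assumptions made for illustration. Only \(\sigma = 0.2\) and the factor 1.571 come from the text above.

```python
import math
import random
import statistics


def simulate_standard_errors(n, mu=0.0, sigma=0.2, reps=4000, seed=1):
    """Estimate the standard errors of the sample mean and sample median.

    Draws `reps` random samples of size `n` from a normal(mu, sigma)
    distribution and returns the standard deviations of the resulting
    sample means and sample medians.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means, medians = [], []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return statistics.stdev(means), statistics.stdev(medians)


n = 100  # illustrative sample size (an assumption, not from the text)
se_mean, se_median = simulate_standard_errors(n)

# Theoretical values from the formulas above
theory_mean = 0.2 / math.sqrt(n)
theory_median = theory_mean * math.sqrt(1.571)

print(f"mean:   simulated {se_mean:.4f}, theory {theory_mean:.4f}")
print(f"median: simulated {se_median:.4f}, large-sample theory {theory_median:.4f}")
print(f"variance ratio (median / mean): {(se_median / se_mean) ** 2:.2f}")
```

The simulated variance ratio should be close to 1.571, although it will not match exactly since the formula for the median is only a large-sample approximation and the simulation itself has sampling error.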

The following diagram compares the two estimators for different sample sizes: estimation errors from the sample mean are likely to be smaller than those from the sample median.

Samples from a skew distribution

The diagram below shows the probability density function of a skew distribution with standard deviation 4. If the median of this distribution, \(\gamma\), were unknown, the obvious estimator would be the median of a random sample, \(\tilde{X}\).

However, we will investigate whether the sample mean would be a better estimator. The sample mean, \(\overline{X}\), has standard error \(\frac{4}{\sqrt n}\). The distributions of the sample mean and sample median are shown below.

Both estimators have similar standard errors. However, the sample mean is a biased estimator of the population median of this skew distribution: its distribution is centred on the population mean, 4.00, which is considerably larger than the population median, 2.77. When the sample size is large, the sample mean is almost certain to overestimate the distribution's median.

Note also that the sample mean is not a consistent estimator of the distribution's median, since its bias, 1.23, does not tend to zero as the sample size increases.
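The persistence of this bias can be illustrated by simulation. The text does not name the skew distribution, so the sketch below assumes an exponential distribution with mean 4. This is an assumption chosen because it has standard deviation 4 and median \(4 \ln 2 \approx 2.77\), matching the values quoted above. The function name `estimator_bias`, the sample sizes, the number of replications and the seed are also illustrative choices.

```python
import math
import random
import statistics

# Assumed distribution: exponential with mean 4 (so its standard deviation is also 4)
POP_MEAN = 4.0
POP_MEDIAN = POP_MEAN * math.log(2)  # about 2.77


def estimator_bias(n, reps=2000, seed=1):
    """Simulated bias of the sample mean and sample median as estimators
    of the population median, for samples of size `n`."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means, medians = [], []
    for _ in range(reps):
        # expovariate takes the rate parameter, which is 1 / mean
        sample = [rng.expovariate(1 / POP_MEAN) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return (statistics.fmean(means) - POP_MEDIAN,
            statistics.fmean(medians) - POP_MEDIAN)


results = {n: estimator_bias(n) for n in (10, 100, 400)}
for n, (bias_mean, bias_median) in results.items():
    print(f"n = {n:3d}: bias of mean = {bias_mean:.2f}, "
          f"bias of median = {bias_median:.2f}")
```

Under this assumption, the simulated bias of the sample mean should stay near 1.23 whatever the sample size, whereas any bias in the sample median shrinks towards zero as \(n\) increases.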