Maximum likelihood estimators have very good large-sample properties.

These results strictly require certain "regularity conditions" to be satisfied. We will not state them in this e-book since they almost always hold in practice.

The following results apply to the maximum likelihood estimator, \(\hat {\theta}\), of a parameter \(\theta\), based on a random sample of size \(n\), as \(n \to \infty\); that is, they describe its asymptotic properties.

Bias

It is asymptotically unbiased,

\[ E[\hat {\theta}] \;\; \xrightarrow[n \rightarrow \infty]{} \;\; \theta \]
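For example, if \(X_1, \dots, X_n\) is a random sample from an exponential distribution with rate \(\lambda\), the maximum likelihood estimator is \(\hat {\lambda} = 1 / \bar {X}\) and it can be shown that

\[ E[\hat {\lambda}] \;\; = \;\; \frac n {n-1} \lambda \]

so \(\hat {\lambda}\) is biased in small samples, but its bias disappears as \(n \rightarrow \infty\).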

Variance and consistency

In large samples, its variance is approximately

\[ \Var(\hat {\theta}) \;\; \approx \;\; - \frac 1 {n \times E\left[\frac {d^2 \log\left(p(X \;|\; \theta)\right)} {d\theta^2} \right]} \]
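To illustrate this formula, consider a random sample from a Poisson distribution with mean \(\mu\). Since

\[ \log\left(p(x \;|\; \mu)\right) \;\; = \;\; x \log(\mu) - \mu - \log(x!) \]

we have

\[ \frac {d^2 \log\left(p(x \;|\; \mu)\right)} {d\mu^2} \;\; = \;\; -\frac x {\mu^2} \qquad\text{so}\qquad E\left[\frac {d^2 \log\left(p(X \;|\; \mu)\right)} {d\mu^2} \right] \;\; = \;\; -\frac 1 {\mu} \]

and the approximation gives \(\Var(\hat {\mu}) \approx \frac {\mu} n\). The maximum likelihood estimator here is \(\hat {\mu} = \bar {X}\), whose variance is exactly \(\frac {\mu} n\), so the approximation is exact in this case.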

Since the variance of the maximum likelihood estimator tends to zero as \(n \rightarrow \infty\) and its bias is asymptotically zero, it is also consistent.

Asymptotic normality

It asymptotically has a normal distribution,

\[ \hat {\theta} \;\; \xrightarrow[n \rightarrow \infty]{} \;\; \text{a normal distribution} \]
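The following simulation is a minimal sketch of this result, not one of the e-book's own examples. It assumes an exponential model with rate \(\lambda = 2\), for which the Fisher information is \(1 / \lambda^2\); the sample size, number of replications and variable names are illustrative choices.

```python
# Simulation sketch: asymptotic normality of the MLE of an exponential rate.
# The MLE is lambda-hat = 1 / x-bar, and -E[d^2 log p / d lambda^2] = 1 / lambda^2.
import numpy as np

rng = np.random.default_rng(seed=1)
true_lambda = 2.0   # assumed true parameter value (illustrative)
n = 200             # sample size in each replication
reps = 10_000       # number of simulated samples

# Draw `reps` exponential samples of size n and compute the MLE for each.
samples = rng.exponential(scale=1 / true_lambda, size=(reps, n))
mle = 1 / samples.mean(axis=1)

# Standardise using the Fisher information: (theta-hat - theta) * sqrt(n * I(theta)).
z = (mle - true_lambda) * np.sqrt(n / true_lambda**2)

# If the asymptotic result holds, z should behave like N(0, 1).
print("mean of z:         ", z.mean())
print("std deviation of z:", z.std(ddof=1))
print("P(|z| < 1.96):     ", np.mean(np.abs(z) < 1.96))
```

With these settings, the printed mean and standard deviation should be close to 0 and 1, and the final proportion should be close to 0.95.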

We now express these three properties together in a slightly more formal way.

All together

If \(\hat {\theta} \) is the maximum likelihood estimator of a parameter, \(\theta\), based on a random sample of size \(n\),

\[ (\hat {\theta} - \theta) \times \sqrt {-n \times E\left[\frac {d^2 \log\left(p(X \;|\; \theta)\right)} {d\theta^2} \right]} \;\; \xrightarrow[n \rightarrow \infty]{} \;\; \NormalDistn(0, 1) \]
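One practical consequence of this result (a standard application, not stated in this section) is an approximate 95% confidence interval for \(\theta\),

\[ \hat {\theta} \;\pm\; \frac {1.96} {\sqrt {-n \times E\left[\frac {d^2 \log\left(p(X \;|\; \theta)\right)} {d\theta^2} \right]}} \]

with \(\hat {\theta}\) replacing \(\theta\) in the expectation. In the Poisson example above, this interval is \(\bar{X} \pm 1.96 \sqrt{\bar{X} / n}\).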

A final result states that a maximum likelihood estimator cannot be beaten in large samples.

Asymptotically "best"

Other estimators of a parameter \(\theta\) may have lower mean squared errors in small samples, but no estimator has a lower mean squared error than the maximum likelihood estimator when the sample size is large enough.
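For example, when sampling from a \(\NormalDistn(\mu,\; \sigma^2)\) distribution, both the sample mean (the maximum likelihood estimator of \(\mu\)) and the sample median are unbiased, but the median's large-sample variance is approximately \(\frac {\pi \sigma^2} {2n}\), about 57% higher than the variance \(\frac {\sigma^2} n\) of the sample mean.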