Maximum likelihood estimators have very good large-sample properties.
Strictly speaking, these properties require certain "regularity conditions" to be satisfied. We will not state the conditions in this e-book, since they almost always hold.
The following results apply to the maximum likelihood estimator, \(\hat {\theta}\), of a parameter \(\theta\), based on a random sample of size \(n\), as \(n \to \infty\); that is, they describe its asymptotic properties.
Bias
It is asymptotically unbiased,
\[ E[\hat {\theta}] \;\; \xrightarrow[n \rightarrow \infty]{} \;\; \theta \]
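To see this numerically, consider the exponential distribution with rate \(\theta\), whose maximum likelihood estimator is \(\hat {\theta} = 1 / \bar{x}\). For this model one can show exactly that \(E[\hat {\theta}] = \frac n {n-1} \theta\), so the estimator is biased in small samples but the bias disappears as \(n\) grows. The Python sketch below simulates this; the model, the value \(\theta = 2\), and the sample sizes are illustrative choices, not part of the general result.

    import numpy as np

    rng = np.random.default_rng(0)
    theta = 2.0        # true rate parameter (an arbitrary illustrative choice)
    reps = 100_000     # Monte Carlo replications per sample size

    for n in (5, 20, 100, 1000):
        samples = rng.exponential(scale=1 / theta, size=(reps, n))
        theta_hat = 1 / samples.mean(axis=1)   # MLE of the rate: 1 / sample mean
        print(f"n = {n:5d}   mean of theta_hat = {theta_hat.mean():.4f}")

Since \(E[\hat {\theta}] = \frac n {n-1} \theta\) here, the simulated means should fall from about 2.5 at \(n = 5\) towards the true value 2.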
Variance and consistency
It asymptotically has variance,
\[ \Var(\hat {\theta}) \;\; \xrightarrow[n \rightarrow \infty]{} \;\; - \frac 1 {n \times E\left[\frac {d^2 \log\left(p(X \;|\; \theta)\right)} {d\theta^2} \right]} \]
Since this tends to zero as \(n \rightarrow \infty\) and the bias is asymptotically zero, a maximum likelihood estimator is also consistent.
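For the exponential model used above, \(\log p(x \;|\; \theta) = \log \theta - \theta x\), so \(\frac {d^2 \log p(x \;|\; \theta)} {d\theta^2} = -\frac 1 {\theta^2}\) and the formula gives \(\Var(\hat {\theta}) \approx \frac {\theta^2} n\). The following sketch (again with the illustrative choice \(\theta = 2\)) compares the simulated variance of \(\hat {\theta}\) with this approximation:

    import numpy as np

    rng = np.random.default_rng(1)
    theta = 2.0        # true rate parameter (an arbitrary illustrative choice)
    reps = 100_000

    for n in (20, 100, 1000):
        samples = rng.exponential(scale=1 / theta, size=(reps, n))
        theta_hat = 1 / samples.mean(axis=1)
        # Asymptotic approximation: -1 / (n * E[d2 log p / d theta2]) = theta^2 / n
        print(f"n = {n:5d}   simulated var = {theta_hat.var():.5f}   "
              f"theta^2 / n = {theta ** 2 / n:.5f}")

The two columns should agree more and more closely as \(n\) increases.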
Asymptotic normality
It asymptotically has a normal distribution,
\[ \hat {\theta} \;\; \xrightarrow[n \rightarrow \infty]{} \;\; \text{a normal distribution} \]
We now express these three properties together in a slightly more formal way.
All together
If \(\hat {\theta} \) is the maximum likelihood estimator of a parameter, \(\theta\), based on a random sample of size \(n\),
\[ (\hat {\theta} - \theta) \times \sqrt {-n \times E\left[\frac {d^2 \log\left(p(X \;|\; \theta)\right)} {d\theta^2} \right]} \;\; \xrightarrow[n \rightarrow \infty]{} \;\; \NormalDistn(0, 1) \]
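The sketch below checks this for the exponential model from the earlier examples, where \(-E\left[\frac {d^2 \log p(X \;|\; \theta)} {d\theta^2}\right] = \frac 1 {\theta^2}\), so the standardised quantity is \((\hat {\theta} - \theta) \times \frac {\sqrt n} {\theta}\). The values \(\theta = 2\) and \(n = 1000\) are again illustrative choices.

    import numpy as np

    rng = np.random.default_rng(2)
    theta, n, reps = 2.0, 1000, 100_000   # illustrative choices

    samples = rng.exponential(scale=1 / theta, size=(reps, n))
    theta_hat = 1 / samples.mean(axis=1)
    # Standardise: multiply by sqrt(-n * E[d2 log p / d theta2]) = sqrt(n) / theta
    z = (theta_hat - theta) * np.sqrt(n) / theta

    print(f"mean = {z.mean():.3f}   (standard normal: 0)")
    print(f"sd   = {z.std():.3f}   (standard normal: 1)")
    print(f"2.5% and 97.5% quantiles = {np.round(np.quantile(z, [0.025, 0.975]), 2)}"
          "   (standard normal: -1.96 and 1.96)")

The simulated mean, standard deviation and quantiles should all be close to those of a \(\NormalDistn(0, 1)\) distribution.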
A final result states that a maximum likelihood estimator cannot be beaten in large samples.
Asymptotically "best"
Other estimators of a parameter, \(\theta\), may have lower mean squared error in small samples, but none has lower mean squared error than the maximum likelihood estimator if the sample size is large enough.
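As an illustration (the normal model and the competing estimator are choices made here, not part of the statement above), the maximum likelihood estimator of a normal mean is the sample mean, while the sample median is a competing unbiased estimator. A simulation shows the median's mean squared error settling at about \(\frac {\pi} 2 \approx 1.57\) times that of the maximum likelihood estimator:

    import numpy as np

    rng = np.random.default_rng(3)
    mu, reps = 0.0, 100_000   # true mean (illustrative) and replications

    for n in (11, 101, 1001):
        samples = rng.normal(loc=mu, scale=1.0, size=(reps, n))
        mse_mean = ((samples.mean(axis=1) - mu) ** 2).mean()       # MLE of mu
        mse_median = ((np.median(samples, axis=1) - mu) ** 2).mean()
        print(f"n = {n:4d}   MSE ratio (median / MLE) = {mse_median / mse_mean:.2f}")

Whatever its advantages elsewhere, the median cannot match the maximum likelihood estimator's mean squared error once \(n\) is large.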