Asymptotic properties of maximum likelihood estimators
Much of the appeal of maximum likelihood estimators comes from their very good large-sample properties.
We now describe some properties of the maximum likelihood estimator, \(\hat {\theta} \), of a parameter, \(\theta\), that is based on a random sample of size \(n\) from a distribution with probability function \(p(x \mid \theta)\).
These properties strictly require certain "regularity conditions" to hold. We will omit the conditions, since they are satisfied by the kinds of model considered in this e-book.
The proofs of the properties are complex and will also be omitted.
Bias
A maximum likelihood estimator, \(\hat {\theta} \), of a parameter, \(\theta\), is asymptotically unbiased,
\[ E[\hat {\theta}] \;\; \xrightarrow[n \rightarrow \infty]{} \;\; \theta \]
Variance and consistency
A maximum likelihood estimator, \(\hat {\theta} \), of a parameter, \(\theta\), asymptotically has variance,
\[ \Var(\hat {\theta}) \;\; \xrightarrow[n \rightarrow \infty]{} \;\; - \frac 1 {n \times E\left[\frac {d^2 \log\left(p(X \;|\; \theta)\right)} {d\theta^2} \right]} \]
Since this variance tends to zero as \(n \rightarrow \infty\) and the bias is asymptotically zero, a maximum likelihood estimator is also consistent.
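These two limits can be checked numerically. As an illustrative sketch (not part of the text above), consider a random sample from an exponential distribution with rate \(\theta\): here \(\log p(x \mid \theta) = \log\theta - \theta x\), so the second derivative is \(-1/\theta^2\) and the asymptotic variance formula gives \(\theta^2/n\). The simulation below (assuming NumPy is available) generates many samples and compares the mean and variance of the MLE \(\hat\theta = 1/\bar{x}\) with these theoretical values.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 2.0      # true rate parameter (chosen for illustration)
n = 1000         # sample size
reps = 20000     # number of simulated samples

# For Exp(theta), d^2 log p / d theta^2 = -1/theta^2, so the
# asymptotic variance formula gives -1/(n * (-1/theta^2)) = theta^2/n.
samples = rng.exponential(scale=1 / theta, size=(reps, n))
mle = 1 / samples.mean(axis=1)   # the MLE of the rate is 1/xbar

print(mle.mean())   # close to theta = 2.0 (the small-sample bias shrinks with n)
print(mle.var())    # close to theta**2 / n = 0.004
```

Rerunning with smaller \(n\) shows a noticeable bias and a larger variance, both of which fade as \(n\) grows, in line with consistency.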
Asymptotic normality
A maximum likelihood estimator, \(\hat {\theta} \), of a parameter, \(\theta\), asymptotically has a normal distribution,
\[ \hat {\theta} \;\; \xrightarrow[n \rightarrow \infty]{} \;\; \text{a normal distribution} \]
We now express these three properties together more formally, making precise what is meant by the limits above.
All together
If \(\hat {\theta} \) is the maximum likelihood estimator of a parameter, \(\theta\), based on a random sample of size \(n\),
\[ (\hat {\theta} - \theta) \times \sqrt {-n \times E\left[\frac {d^2 \log\left(p(X \;|\; \theta)\right)} {d\theta^2} \right]} \;\; \xrightarrow[n \rightarrow \infty]{} \;\; \NormalDistn(0, 1) \]
A final result states that a maximum likelihood estimator cannot be beaten in large samples.
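The standardised quantity above should behave like a standard normal variable when \(n\) is large. Continuing the illustrative exponential example (an assumption for this sketch, with NumPy), the standardising factor is \(\sqrt{-n \times (-1/\theta^2)} = \sqrt{n}/\theta\), and the simulated values can be checked against \(\NormalDistn(0,1)\):

```python
import numpy as np

rng = np.random.default_rng(1)
theta, n, reps = 2.0, 500, 20000   # illustrative values

# For Exp(theta), -E[d^2 log p / d theta^2] = 1/theta^2, so the
# standardising factor in the limit result is sqrt(n)/theta.
samples = rng.exponential(scale=1 / theta, size=(reps, n))
mle = 1 / samples.mean(axis=1)
z = (mle - theta) * np.sqrt(n) / theta

print(z.mean())                   # close to 0
print(z.std())                    # close to 1
print(np.mean(np.abs(z) < 1.96))  # close to 0.95, as for a standard normal
```

A histogram of `z` would look close to the standard normal density, and the approximation improves as \(n\) increases.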
Asymptotically "best"
Other estimators of a parameter, \(\theta\), may have lower mean squared error in small samples, but no estimator has a lower mean squared error than the maximum likelihood estimator when the sample size is large enough.
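One classical illustration of this (an example chosen for this sketch, not taken from the text) is estimating a normal mean: the MLE is the sample mean, while the sample median is an alternative estimator whose large-sample variance is about \(\pi/2 \approx 1.57\) times larger. A short simulation, assuming NumPy, compares their mean squared errors:

```python
import numpy as np

rng = np.random.default_rng(2)
mu, n, reps = 5.0, 200, 20000   # illustrative values

# Simulate many normal samples and compare the MSE of the MLE
# (the sample mean) with that of the sample median.
samples = rng.normal(loc=mu, scale=1.0, size=(reps, n))
mse_mean = np.mean((samples.mean(axis=1) - mu) ** 2)          # about 1/n
mse_median = np.mean((np.median(samples, axis=1) - mu) ** 2)  # about (pi/2)/n

print(mse_mean, mse_median)   # the median's MSE is roughly pi/2 times larger
```

For large \(n\), no estimator's mean squared error falls below that of the MLE in this way.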