Practical problems

There are two practical problems with the approximate variance formula for \(\hat {\theta} \),

\[ \Var(\hat {\theta})\; \approx \;- \dfrac 1 {n \times E\left[\frac {\large d^2\; \log\left(p(X \;|\; \theta)\right)} {\large d\;\theta^2} \right]} \]

Difficulty evaluating the expected value
For many distributions, it is impossible to find a simple formula for the expected value.

Unknown value of \(\theta\)
Even if this expected value can be found, it is usually a function of \(\theta\), but \(\theta\) is an unknown value.

Avoiding the expected value

A numerical value for the approximate variance of \(\hat {\theta}\) can be found with a further approximation,

\[ \Var(\hat {\theta})\; \approx \;- \frac 1 {\ell''(\hat {\theta})} \]

Instead of finding the expected value of \(\dfrac {d^2 \log\left(p(X \;|\; \theta)\right)} {d\;\theta^2}\) theoretically, we can approximate it with the sample mean of its values, giving

\[ n \times E\left[\dfrac {d^2 \log\left(p(X \;|\; \theta)\right)} {d\;\theta^2}\right] \;\;\approx\;\; \sum_{i=1}^n \frac {d^2 \log\left(p(x_i \;|\; \theta)\right)} {d\theta^2} \;\;=\;\; \dfrac {d^2} {d \theta^2} \sum_{i=1}^n {\log\left(p(x_i \;|\; \theta)\right)} \]

The sum of the log-probabilities on the right is the log-likelihood function, so

\[ n \times E\left[\dfrac {d^2 \log\left(p(X \;|\; \theta)\right)} {d\;\theta^2}\right] \;\;\approx\;\; \frac {d^2\;\ell(\theta)} {d\;\theta^2} \]

giving us the approximation,

\[ \Var(\hat {\theta}) \;\;\approx\;\; - \frac 1 {\ell''(\theta)} \]
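The replacement of \(n\) times the expected value by the second derivative of the log-likelihood can be checked numerically. The following is a minimal sketch, assuming a Bernoulli model with \(p(x \;|\; \pi) = \pi^x (1-\pi)^{1-x}\); the model, the value \(\pi = 0.3\), the sample size and the simulated data are all illustrative assumptions and do not come from the text above.

```python
import math
import random

def log_p(x, pi):
    """Log-probability of one Bernoulli observation x (0 or 1)."""
    return x * math.log(pi) + (1 - x) * math.log(1 - pi)

def d2_log_p(x, pi):
    """Second derivative of log_p with respect to pi."""
    return -x / pi**2 - (1 - x) / (1 - pi)**2

# Hypothetical setup: simulate n Bernoulli(pi) values with a fixed seed.
random.seed(1)
pi = 0.3
n = 10_000
data = [1 if random.random() < pi else 0 for _ in range(n)]

# Theoretical quantity: n * E[d^2 log p / d pi^2] = -n / (pi * (1 - pi)).
n_times_expectation = -n / (pi * (1 - pi))

# Sample-based replacement: the sum of the second derivatives at each x_i.
sum_of_second_derivs = sum(d2_log_p(x, pi) for x in data)

def log_likelihood(p):
    """Log-likelihood: the sum of the log-probabilities of the data."""
    return sum(log_p(x, p) for x in data)

# The same sum, obtained as a central-difference second derivative
# of the log-likelihood itself.
h = 1e-4
curvature = (log_likelihood(pi + h) - 2 * log_likelihood(pi)
             + log_likelihood(pi - h)) / h**2

print(n_times_expectation, sum_of_second_derivs, curvature)
```

The last two values should agree almost exactly, since they are the same quantity computed in two ways. The first should be close to them but not identical, since it is the expected value that the sample sum approximates.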

We can avoid the unknown value of \(\theta\) with a further approximation, this time replacing \(\theta\) with \(\hat{\theta}\), giving

\[ \Var(\hat {\theta}) \;\;\approx\;\; - \frac 1 {\ell''(\hat {\theta})} \]

Its square root provides us with a numerical value for the standard error of the maximum likelihood estimator,

\[ \se(\hat {\theta}) \;\;\approx\;\; \sqrt {- \frac 1 {\ell''(\hat {\theta})}} \]
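As an illustration of how this standard error might be evaluated in practice, the sketch below assumes a Poisson model with rate \(\lambda\) and a small made-up data set; neither comes from the text above. The helper names `second_derivative` and `standard_error` are hypothetical, and the second derivative is approximated by a central difference rather than found algebraically.

```python
import math

def poisson_log_likelihood(lam, data):
    """Log-likelihood of a Poisson(lam) model for a list of counts."""
    return sum(x * math.log(lam) - lam - math.lgamma(x + 1) for x in data)

def second_derivative(f, t, h=1e-4):
    """Central-difference approximation to f''(t)."""
    return (f(t + h) - 2.0 * f(t) + f(t - h)) / (h * h)

def standard_error(log_lik, theta_hat, h=1e-4):
    """Approximate se(theta_hat) = sqrt(-1 / l''(theta_hat))."""
    return math.sqrt(-1.0 / second_derivative(log_lik, theta_hat, h))

# Hypothetical data, for illustration only.
data = [3, 1, 4, 1, 5, 9, 2, 6]

# For the Poisson model, the maximum likelihood estimate is the sample mean.
lam_hat = sum(data) / len(data)

se_numeric = standard_error(lambda lam: poisson_log_likelihood(lam, data),
                            lam_hat)

# Check: here l''(lam) = -sum(x) / lam^2, so the formula gives
# sqrt(lam_hat / n) exactly.
se_exact = math.sqrt(lam_hat / len(data))

print(lam_hat, se_numeric, se_exact)
```

For this model the second derivative is easy to find algebraically, so the numerical step is not needed; it is used here only to show that the same few lines would work for a log-likelihood whose derivatives are awkward to write down.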

This formula lets us find an approximate numerical value for the standard error of almost any maximum likelihood estimator, even when based on models in which the data are not a simple random sample.
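To illustrate a model in which the data are not a simple random sample, the sketch below assumes independent Poisson counts \(x_i\) with means \(\theta t_i\), where the \(t_i\) are known exposure times that differ between observations. This model, the data and the function names are hypothetical and are used only to show the formula being applied.

```python
import math

def log_likelihood(rate, counts, exposures):
    """Log-likelihood when x_i ~ Poisson(rate * t_i), independently."""
    return sum(x * math.log(rate * t) - rate * t - math.lgamma(x + 1)
               for x, t in zip(counts, exposures))

def second_derivative(f, t, h=1e-4):
    """Central-difference approximation to f''(t)."""
    return (f(t + h) - 2.0 * f(t) + f(t - h)) / (h * h)

# Hypothetical counts and exposure times, for illustration only.
counts = [2, 7, 4, 11]
exposures = [1.0, 3.0, 2.0, 5.0]

# Setting the derivative of the log-likelihood to zero gives
# rate_hat = (total count) / (total exposure).
rate_hat = sum(counts) / sum(exposures)

# Standard error from the curvature of the log-likelihood at rate_hat.
curvature = second_derivative(
    lambda r: log_likelihood(r, counts, exposures), rate_hat)
se_rate = math.sqrt(-1.0 / curvature)

# Check: here l''(rate) = -sum(x) / rate^2, so the formula reduces to
# sqrt(sum(x)) / sum(t).
se_closed_form = math.sqrt(sum(counts)) / sum(exposures)

print(rate_hat, se_rate, se_closed_form)
```

The observations here have different distributions, yet the standard error is still found from the second derivative of the log-likelihood at the estimate.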

We will use this formula for the standard error of maximum likelihood estimators in the rest of the e-book.