Practical problems
There are two practical problems with the approximate variance formula for \(\hat {\theta} \),
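\[ \Var(\hat {\theta})\; \approx \;- \dfrac 1 {n \times E\left[\dfrac {d^2 \log\left(p(X \;|\; \theta)\right)} {d\;\theta^2} \right]} \]
The expected value in the denominator can be difficult to evaluate theoretically, and the formula depends on the unknown parameter \(\theta\).
Avoiding the expected value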
A numerical value for the approximate variance of \(\hat {\theta}\) can be found with further approximations,
\[ \Var(\hat {\theta})\; \approx \;- \frac 1 {\ell''(\hat {\theta})} \]
Instead of finding the expected value of \(\dfrac {d^2 \log\left(p(X \;|\; \theta)\right)} {d\;\theta^2}\) theoretically, we can approximate it by the sample mean of its values at the observed data \(x_1, \dots, x_n\). Multiplying by \(n\) gives
\[ n \times E\left[\dfrac {d^2 \log\left(p(X \;|\; \theta)\right)} {d\;\theta^2}\right] \;\;\approx\;\; \sum_{i=1}^n \frac {d^2 \log\left(p(x_i \;|\; \theta)\right)} {d\theta^2} \;\;=\;\; \dfrac {d^2} {d \theta^2} \sum_{i=1}^n {\log\left(p(x_i \;|\; \theta)\right)} \]
The sum of the log-probabilities on the right is the log-likelihood function, \(\ell(\theta)\), so
\[ n \times E\left[\dfrac {d^2 \log\left(p(X \;|\; \theta)\right)} {d\;\theta^2}\right] \;\;\approx\;\; \frac {d^2\;\ell(\theta)} {d\;\theta^2} \;\;=\;\; \ell''(\theta) \]
giving us the approximation,
\[ \Var(\hat {\theta}) \;\;\approx\;\; - \frac 1 {\ell''(\theta)} \]
We can avoid the unknown value of \(\theta\) with a further approximation, this time replacing \(\theta\) with \(\hat{\theta}\), giving
\[ \Var(\hat {\theta}) \;\;\approx\;\; - \frac 1 {\ell''(\hat {\theta})} \]
Its square root gives a numerical value for the standard error of the maximum likelihood estimator,
\[ \se(\hat {\theta}) \;\;\approx\;\; \sqrt {- \frac 1 {\ell''(\hat {\theta})}} \]
This formula gives an approximate numerical value for the standard error of almost any maximum likelihood estimator, even one based on a model in which the data are not a simple random sample.
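As a rough numerical check of the chain of approximations above, the Python sketch below uses an assumed model: an iid Poisson(\(\theta\)) sample with simulated data, chosen because \(\frac{d^2}{d\theta^2}\log p(x \mid \theta) = -x/\theta^2\) in closed form, so \(n \times E[\cdot] = -n/\theta\) and \(\hat\theta = \bar{x}\). The true value \(\theta = 4\), the sample size, the seed and the helper name `second_deriv_logp` are hypothetical illustrative choices.

```python
import numpy as np

# Assumed model for illustration: iid Poisson(theta) data, simulated.
rng = np.random.default_rng(seed=1)
theta_true = 4.0
n = 200
x = rng.poisson(lam=theta_true, size=n)

def second_deriv_logp(x, theta):
    # log p(x | theta) = x*log(theta) - theta - log(x!)  =>  d^2/dtheta^2 = -x/theta^2
    return -x / theta**2

# Starting formula: n * E[-X/theta^2] = -n/theta, so the variance is theta/n.
var_expected = -1.0 / (n * (-1.0 / theta_true))

# Approximation 1: replace n*E[...] by the sum over the sample, i.e. ell''(theta).
var_observed_true = -1.0 / second_deriv_logp(x, theta_true).sum()

# Approximation 2: replace the unknown theta by the MLE theta_hat = mean(x).
theta_hat = x.mean()
var_observed_hat = -1.0 / second_deriv_logp(x, theta_hat).sum()
se_hat = np.sqrt(var_observed_hat)

print(f"-1/(n E[...]) at true theta: {var_expected:.5f}")
print(f"-1/ell''(theta) at true    : {var_observed_true:.5f}")
print(f"-1/ell''(theta_hat)        : {var_observed_hat:.5f}")
print(f"se(theta_hat)              : {se_hat:.5f}")
```

For this model the three variances reduce to \(\theta/n\), \(\theta^2/\sum x_i\) and \(\bar{x}/n\); with a moderately large \(n\) they should be close to one another, which is the sense in which each replacement is only an approximation.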
We will use this formula for the standard error of maximum likelihood estimators in the rest of the e-book.
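When \(\ell''(\hat\theta)\) has no convenient closed form, it can be estimated numerically from the log-likelihood alone. The sketch below is one possible helper, assuming a central second difference with step `h = 1e-4`; the function name `numerical_se`, the step size and the data are hypothetical. The example model, counts \(x_i \sim \text{Poisson}(\theta t_i)\) with known unequal exposures \(t_i\), is an assumed illustration of data that are not a simple random sample; for it, \(\hat\theta = \sum x_i / \sum t_i\) and \(-1/\ell''(\hat\theta) = \hat\theta / \sum t_i\), which the code prints for comparison.

```python
import numpy as np

def numerical_se(loglik, theta_hat, h=1e-4):
    """Approximate se(theta_hat) = sqrt(-1 / ell''(theta_hat)).

    ell'' is estimated with a central second difference using step h
    (an assumption; h may need tuning for other models and scales).
    """
    ell2 = (loglik(theta_hat + h) - 2.0 * loglik(theta_hat)
            + loglik(theta_hat - h)) / h**2
    if ell2 >= 0:
        raise ValueError("ell''(theta_hat) is not negative; theta_hat may not be a maximum")
    return np.sqrt(-1.0 / ell2)

# Hypothetical data: counts x_i ~ Poisson(theta * t_i) with known exposures t_i,
# so the observations are not identically distributed.
t = np.array([1.0, 2.5, 0.5, 4.0, 3.0, 1.5])
x = np.array([3, 9, 1, 14, 10, 5])

def loglik(theta):
    # Log-likelihood without the constant -sum(log(x_i!)), which does not
    # change any derivative with respect to theta.
    mu = theta * t
    return np.sum(x * np.log(mu) - mu)

theta_hat = x.sum() / t.sum()             # closed-form MLE for this model
se_num = numerical_se(loglik, theta_hat)  # numerical sqrt(-1/ell''(theta_hat))
se_exact = np.sqrt(theta_hat / t.sum())   # closed form for comparison

print(theta_hat, se_num, se_exact)
```

The finite-difference route only needs a function that evaluates \(\ell(\theta)\), so the same helper could be reused for other one-parameter models, provided \(\hat\theta\) is an interior maximum and the step size suits the scale of \(\theta\).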