Need for numerical method
The maximum likelihood estimate of a parameter \(\theta\) is usually a value that satisfies the equation
\[ \ell'(\theta) \;\; = \;\; 0 \]
where \(\ell(\theta)\) is the log-likelihood function. For data from many common distributions, this equation has a fairly simple form and can be solved with a little algebraic manipulation.
However, for other kinds of model, the equation cannot be solved explicitly, so an iterative numerical method is required to obtain the maximum likelihood estimate.
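For example, for a random sample \(x_1, \dots, x_n\) from an exponential distribution with rate \(\lambda\), the log-likelihood is \(\ell(\lambda) = n \log \lambda - \lambda \sum_{i=1}^n x_i\), so the equation
\[ \ell'(\lambda) \;\; = \;\; \frac{n}{\lambda} - \sum_{i=1}^n x_i \;\; = \;\; 0 \]
can be solved directly to give \(\hat{\lambda} = n \big/ \sum_{i=1}^n x_i = 1/\bar{x}\). No such rearrangement is possible for, say, the location parameter of a Cauchy distribution.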
Newton-Raphson algorithm
One way to solve an equation numerically is the Newton-Raphson algorithm. Consider an equation of the form
\[ g(\theta) \;\; = \;\; 0 \]
A Taylor series expansion of the function around \(\theta_0\) is
\[ g(\theta) \;\; = \;\; g(\theta_0) + (\theta - \theta_0) g'(\theta_0) + \frac {(\theta - \theta_0)^2} {2!} g''(\theta_0) + \dots \]
Ignoring the higher-order terms (since they will be relatively small if \(\theta\) is close to \(\theta_0\)),
\[ g(\theta) \;\; \approx \;\; g(\theta_0) + (\theta - \theta_0) g'(\theta_0) \]
Since we are looking for the value \(\theta\) at which \(g(\theta) = 0\), setting this approximation to zero and rearranging gives
\[ \theta \;\; \approx \;\; \theta_0 - \frac {g(\theta_0)} { g'(\theta_0)} \]
Although the value on the right is usually not exactly the solution of the equation, it is often closer to it than \(\theta_0\), leading to the following algorithm.
Newton-Raphson algorithm
Starting at an initial guess of the solution, \(\theta_0\), the Newton-Raphson algorithm computes successive values
\[ \theta_{i+1} \;\; = \;\; \theta_i - \frac {g(\theta_i)} { g'(\theta_i)} \qquad \text{for } i=0, 1, \dots \]
If this sequence converges, it converges to a solution of the equation \(g(\theta) = 0\).
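As an illustration, here is a minimal Python sketch of the iteration, applied to the equation \(g(\theta) = \theta^2 - 2 = 0\); the function names, tolerance and iteration limit are arbitrary choices, not part of the algorithm itself.

```python
# A minimal sketch of the Newton-Raphson iteration; the tolerance and
# iteration cap are arbitrary choices, not part of the algorithm.
def newton_raphson(g, g_prime, theta0, tol=1e-10, max_iter=100):
    theta = theta0
    for _ in range(max_iter):
        step = g(theta) / g_prime(theta)
        theta = theta - step
        if abs(step) < tol:  # successive values barely change: stop
            return theta
    raise RuntimeError("Newton-Raphson did not converge from this start")

# Example: g(theta) = theta^2 - 2, whose positive solution is sqrt(2)
root = newton_raphson(lambda t: t**2 - 2, lambda t: 2 * t, theta0=1.0)
print(root)  # 1.414213562...
```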
Applying the algorithm to maximum likelihood
To apply the algorithm to maximum likelihood, we use the function \(g(\theta) = \ell'(\theta)\). The Newton-Raphson iteration then becomes
\[ \theta_{i+1} \;\; = \;\; \theta_i - \frac {\ell'(\theta_i)} { \ell''(\theta_i)} \]
This usually converges to the maximum likelihood estimate, provided the initial guess, \(\theta_0\), is not too far from the correct value. The algorithm may need to be run from several starting values until one is found from which it converges.
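As a sketch of how this works in practice, consider estimating the location parameter \(\theta\) of a Cauchy distribution, for which \(\ell'(\theta) = \sum_i 2(x_i - \theta)\big/\bigl(1 + (x_i - \theta)^2\bigr)\) has no algebraic solution. The code below is a minimal illustration, not a robust implementation; the sample data and the starting value (the sample median) are arbitrary choices.

```python
import numpy as np

def cauchy_location_mle(x, theta0, tol=1e-10, max_iter=100):
    # Newton-Raphson for the Cauchy(theta, 1) location MLE, with
    #   l'(theta)  = sum 2u / (1 + u^2),            u = x - theta
    #   l''(theta) = sum 2(u^2 - 1) / (1 + u^2)^2
    theta = theta0
    for _ in range(max_iter):
        u = x - theta
        score = np.sum(2 * u / (1 + u**2))                 # l'(theta)
        curvature = np.sum(2 * (u**2 - 1) / (1 + u**2)**2) # l''(theta)
        step = score / curvature
        theta = theta - step
        if abs(step) < tol:
            return theta
    raise RuntimeError("did not converge; try another starting value")

x = np.array([1.2, -0.3, 2.1, 0.8, 1.5])   # illustrative data only
print(cauchy_location_mle(x, theta0=np.median(x)))
```

Because \(\ell''(\theta)\) can be positive far from the maximum, the iteration can diverge from a poor starting value; a robust start such as the sample median is therefore used here.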