To illustrate the use of the Newton-Raphson algorithm to find maximum likelihood estimates, we will examine another standard distribution that is occasionally encountered, the log-series distribution.
Definition
A discrete random variable \(X\) is said to have a log-series distribution if its probability function is
\[ p(x) \;=\; \frac {-1} {\log(1-\theta)} \times \frac {\theta^x} x \quad\quad \text{for } x=1, 2, \dots \]where \(0 \lt \theta \lt 1\).
Its shape is similar to that of the geometric distribution, but it has greater spread, with a higher probability at one and a longer tail.
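As a concrete check of this definition, here is a minimal sketch in Python (the helper name `log_series_pmf` is ours, not from the text). Since \(\sum_{x=1}^{\infty} \theta^x / x = -\log(1-\theta)\), the probabilities should sum to one.

```python
import math

def log_series_pmf(x, theta):
    # Sketch of the probability function defined above:
    # p(x) = (-1 / log(1 - theta)) * theta**x / x, for x = 1, 2, ...
    return -1.0 / math.log(1.0 - theta) * theta ** x / x

# The probabilities should sum to 1; check over a long truncated range.
theta = 0.8
total = sum(log_series_pmf(x, theta) for x in range(1, 1000))
print(round(total, 6))  # approximately 1.0
```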
Maximum likelihood
If a random sample \(\{x_1, x_2, \dots, x_n\}\) is collected from this distribution, what is the maximum likelihood estimate of \(\theta\)? The logarithm of the probability function is
\[ \log \left(p(x)\right) \;=\; x \log(\theta) - \log \left(- \log(1 - \theta) \right) - \log(x) \]so the log-likelihood function is
\[ \ell(\theta) \;=\; \sum_{i=1}^n \log \left(p(x_i)\right) \;=\; {\sum x_i} \log(\theta) - n \times \log \left( -\log(1 - \theta) \right) + K \]where \(K\) is a constant whose value does not depend on \(\theta\). The MLE is the solution of
\[ \ell'(\theta) \;=\; \frac {\sum x_i} {\theta} + \frac n {(1 - \theta)\log(1 - \theta)} \;=\; 0 \]Unfortunately, this equation cannot be rearranged to give an explicit formula for \(\theta\), so a numerical method must be used to find the maximum likelihood estimate. The Newton-Raphson algorithm also requires the second derivative of the log-likelihood,
\[ \ell''(\theta) \;=\; -\frac {\sum x_i} {\theta^2} + \frac {n \left(1 + \log(1 - \theta) \right)} {(1 - \theta)^2\log^2(1 - \theta)} \]The algorithm uses these derivatives iteratively to refine an initial estimate, \(\theta_0\),
\[ \theta_{i+1} \;\; = \;\; \theta_i - \frac {\ell'(\theta_i)} { \ell''(\theta_i)}\]
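Written out as code, the two derivatives and a single update are straightforward; the sketch below (with function names of our own choosing, not from the text) follows the formulas above directly.

```python
import math

def score(theta, n, sum_x):
    # First derivative of the log-likelihood, l'(theta), from the formula above.
    return sum_x / theta + n / ((1.0 - theta) * math.log(1.0 - theta))

def second_deriv(theta, n, sum_x):
    # Second derivative of the log-likelihood, l''(theta).
    log1m = math.log(1.0 - theta)
    return (-sum_x / theta ** 2
            + n * (1.0 + log1m) / ((1.0 - theta) ** 2 * log1m ** 2))

def newton_step(theta, n, sum_x):
    # One Newton-Raphson update: theta - l'(theta) / l''(theta).
    return theta - score(theta, n, sum_x) / second_deriv(theta, n, sum_x)
```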
Numerical illustration
Consider the following data set that is assumed to arise from a log-series distribution.
3 | 5 | 1 | 4 | 8 | 10 | 2 | 1 | 1 | 2
1 | 8 | 1 | 6 | 13 | 1 | 6 | 2 | 1 | 3
1 | 1 | 1 | 2 | 1 | 6 | 1 | 1 | 1 | 1
The derivatives of the log-likelihood involve the values \(n = 30\) and \(\sum x_i = 95\). Iterations of the Newton-Raphson algorithm from an initial estimate of \(\theta_0 = 0.7\) are:
Iteration, \(i\) | \(\theta_i\) | \(\ell'(\theta_i)\) | \(\ell''(\theta_i)\)
---|---|---|---
0 | 0.7000 | 52.656 | -240.78
1 | 0.9187 | -43.613 | -1200.14
2 | 0.8823 | -11.484 | -661.52
3 | 0.8650 | -1.139 | -538.41
4 | 0.8629 | -0.013 | -526.41
5 | 0.8628 | -0.000 | -526.28
6 | 0.8628 | -0.000 | -526.28
7 | 0.8628 | |
The algorithm converges quickly to \(\hat {\theta} = 0.8628\).
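Iterating `newton_step` from the sketch above reproduces this result (the stopping tolerance and iteration cap are our own choices, not from the text):

```python
def fit_log_series(theta0, n, sum_x, tol=1e-6, max_iter=50):
    # Repeat Newton-Raphson updates until the score is numerically zero.
    theta = theta0
    for _ in range(max_iter):
        if abs(score(theta, n, sum_x)) < tol:
            break
        theta = newton_step(theta, n, sum_x)
    return theta

# n = 30 and sum(x) = 95 for the data set above.
print(round(fit_log_series(0.7, n=30, sum_x=95), 4))  # 0.8628
```

Printing \(\theta\), the score and the second derivative inside the loop reproduces the columns of the table above.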
The diagram below illustrates the first iteration of the algorithm: the log-likelihood is approximated by a quadratic curve with the same value, slope and curvature as the log-likelihood at \(\theta = 0.7\), and the next iteration is the value of \(\theta\) that maximises this quadratic.
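To make that explicit (a step left implicit in the text), the quadratic with the same value, slope and curvature at \(\theta_i\) is
\[ q(\theta) \;=\; \ell(\theta_i) + \ell'(\theta_i)(\theta - \theta_i) + \tfrac{1}{2} \ell''(\theta_i)(\theta - \theta_i)^2 \]and setting \(q'(\theta) = \ell'(\theta_i) + \ell''(\theta_i)(\theta - \theta_i) = 0\) gives exactly the update rule \(\theta_{i+1} = \theta_i - \ell'(\theta_i) / \ell''(\theta_i)\).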
As the iterations get closer to the MLE, the quadratic's shape becomes closer to that of the actual log-likelihood, so the iterations approach the MLE more quickly. The diagram below illustrates this from a starting value of \(\theta_0 = 0.88\).