To illustrate the use of the Newton-Raphson algorithm to find maximum likelihood estimates, we will examine another standard distribution that is occasionally encountered, the log-series distribution.

Definition

A discrete random variable \(X\) is said to have a log-series distribution if its probability function is

\[ p(x) \;=\; \frac {-1} {\log(1-\theta)} \times \frac {\theta^x} x \quad\quad \text{for } x=1, 2, \dots \]

where \(0 \lt \theta \lt 1\).

This distribution is similar in shape to the geometric distribution, but the log-series distribution has greater spread, with a higher probability at one and a longer tail. The distribution's mean and variance can be shown to be

\[ \begin{align} E[X] \;\;&=\;\; \frac {-1} {\log(1-\theta)} \times \frac {\theta} {1 - \theta} \\[0.2em] \text{Var}(X) \;\;&=\;\; -\theta \,\frac {\theta + \log(1 - \theta)} {(1 - \theta)^2 \log^2(1 - \theta)} \end{align} \]
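
As an informal check of these formulas, the probability function, mean and variance can be evaluated numerically. The sketch below is illustrative only; the function names are ours, not part of any particular library.

```python
import math

def logseries_pmf(x, theta):
    """Probability function p(x) = -1/log(1-theta) * theta^x / x, for x = 1, 2, ..."""
    return -1.0 / math.log(1.0 - theta) * theta**x / x

def logseries_mean(theta):
    """E[X] = -1/log(1-theta) * theta/(1-theta)."""
    return -1.0 / math.log(1.0 - theta) * theta / (1.0 - theta)

def logseries_var(theta):
    """Var(X) = -theta*(theta + log(1-theta)) / ((1-theta)^2 * log(1-theta)^2)."""
    log1m = math.log(1.0 - theta)
    return -theta * (theta + log1m) / ((1.0 - theta) ** 2 * log1m**2)

# For any 0 < theta < 1, the probabilities should sum to (almost) 1 when
# truncated at a large x, and the formulas above give the mean and variance.
theta = 0.5
print(sum(logseries_pmf(x, theta) for x in range(1, 500)))
print(logseries_mean(theta), logseries_var(theta))
```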

Maximum likelihood

If a random sample \({x_1, x_2, \dots, x_n}\) is collected from this distribution, what is the maximum likelihood estimate of \(\theta\)? The logarithm of the probability function is

\[ \log \left(p(x)\right) \;=\; x \log(\theta) - \log \left( -\log(1 - \theta) \right) - \log(x) \]

so the log-likelihood function is

\[ \ell(\theta) \;=\; \sum_{i=1}^n \log \left(p(x_i)\right) \;=\; {\sum x_i} \log(\theta) - n \times \log \left( -\log(1 - \theta) \right) + K \]

where \(K\) is a constant whose value does not depend on \(\theta\). The maximum likelihood estimate is the solution of

\[ \ell'(\theta) \;=\; \frac {\sum x_i} {\theta} + \frac n {(1 - \theta)\log(1 - \theta)} \;=\; 0 \]

Unfortunately, this equation cannot be rearranged to give an explicit formula for \(\theta\), so a numerical method must be used to find the maximum likelihood estimate. The Newton-Raphson algorithm also requires the second derivative of the log-likelihood,

\[ \ell''(\theta) \;=\; -\frac {\sum x_i} {\theta^2} + \frac {n \left(1 + \log(1 - \theta) \right)} {(1 - \theta)^2\log^2(1 - \theta)} \]
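
Both derivatives depend on the data only through \(n\) and \(\sum x_i\), so they are easy to code directly. A minimal sketch follows; the names score and hessian are simply our own labels for \(\ell'\) and \(\ell''\).

```python
import math

def score(theta, n, sum_x):
    """First derivative of the log-likelihood, l'(theta)."""
    return sum_x / theta + n / ((1.0 - theta) * math.log(1.0 - theta))

def hessian(theta, n, sum_x):
    """Second derivative of the log-likelihood, l''(theta)."""
    log1m = math.log(1.0 - theta)
    return -sum_x / theta**2 + n * (1.0 + log1m) / ((1.0 - theta) ** 2 * log1m**2)
```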

The algorithm uses these derivatives iteratively to refine an initial estimate, \(\theta_0\),

\[ \theta_{i+1} \;\; = \;\; \theta_i - \frac {\ell'(\theta_i)} { \ell''(\theta_i)}\]
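
A bare-bones version of this iteration, using the score and hessian functions sketched above, might look as follows. The stopping rule and iteration limit are arbitrary choices, and no safeguard is included against a step that takes \(\theta\) outside the interval \((0, 1)\).

```python
def newton_raphson(theta0, n, sum_x, tol=1e-8, max_iter=50):
    """Iterate theta <- theta - l'(theta)/l''(theta) until the step is tiny."""
    theta = theta0
    for _ in range(max_iter):
        step = score(theta, n, sum_x) / hessian(theta, n, sum_x)
        theta = theta - step
        if abs(step) < tol:      # converged: the update barely changes theta
            break
    return theta
```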

The method will be illustrated with a numerical example.

Example

Consider the following data set that is assumed to arise from a log-series distribution.

3 5 1 4 8 10 2 1 1 2
1 8 1 6 13 1 6 2 1 3
1 1 1 2 1 6 1 1 1 1

The derivatives of the log-likelihood involve the data only through the values \(n = 30\) and \(\sum x_i = 95\). Iterations of the algorithm from an initial guess of \(\theta = 0.7\) are shown below.
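
The iterations can be reproduced with the functions sketched earlier; starting from \(\theta_0 = 0.7\), the algorithm settles on a value of about 0.863 within a handful of steps.

```python
# Data from the example above
data = [3, 5, 1, 4, 8, 10, 2, 1, 1, 2,
        1, 8, 1, 6, 13, 1, 6, 2, 1, 3,
        1, 1, 1, 2, 1, 6, 1, 1, 1, 1]
n, sum_x = len(data), sum(data)          # n = 30, sum of x = 95

theta_hat = newton_raphson(0.7, n, sum_x)
print(round(theta_hat, 4))               # approximately 0.8628
```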

Try different initial guesses for \(\theta\) (type another value into the text edit box and click Show iterations) to investigate the convergence of the algorithm. For most initial guesses, the algorithm converges fairly quickly to the maximum likelihood estimate, \(\hat{\theta} = 0.8628\).


The following diagram illustrates the Newton-Raphson iterations graphically. The algorithm is equivalent to approximating the shape of the log-likelihood by a quadratic curve that has the same value, slope and curvature (first and second derivatives) as the log-likelihood at the current value of \(\theta\). The next iteration is the value of \(\theta\) that maximises this quadratic.

The bottom half of the diagram shows the shape of the log-series distribution and the data set. The slider changes the value of the unknown parameter, \(\theta\).

The top half of the diagram displays the log-likelihood function. The green curve shows the quadratic with the same slope and curvature as the log-likelihood at the currently selected value of \(\theta\). The Newton-Raphson algorithm effectively steps to the maximum of this approximating quadratic for its next iteration.

The next diagram is identical but zooms in to parameter values much closer to the maximum likelihood estimate.

Observe that, close to the maximum likelihood estimate, the log-likelihood is more closely approximated by a quadratic, so the quadratic's maximum lies near the maximum likelihood estimate. This is why the algorithm converges quickly once it gets close to the maximum likelihood estimate.