Maximum likelihood estimate

The likelihood function, \(L(\theta \; | \; x_1, x_2, \dots, x_n) = p(x_1, x_2, \dots, x_n \;| \; \theta)\), gives the probability of getting the data that were recorded for different values of the unknown parameter \(\theta\). A value of \(\theta\) that gives the observed data high probability is more likely to be correct than one that would make the observed data unlikely.

Definition

The maximum likelihood estimate of a parameter \(\theta\) is the value that maximises the likelihood function,

\[ L(\theta \; | \; x_1, x_2, \dots, x_n) = p(x_1, x_2, \dots, x_n \;| \; \theta) \]

Finding a maximum likelihood estimate (MLE) is therefore the mathematical problem of maximising a function of the parameter \(\theta\). The maximum usually occurs at a "turning point" of the likelihood function, so calculus provides a method for finding the maximum likelihood estimate.
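The maximisation can also be done numerically. The short Python sketch below evaluates the likelihood of a small, purely illustrative set of 0/1 observations over a fine grid of values of \(\theta\) and picks out the value giving the largest likelihood; the data, the grid, and the variable names are assumptions made only for this illustration.

```python
import numpy as np

# A minimal sketch: numerically maximise a likelihood over a grid of
# parameter values. The data and model (10 independent Bernoulli
# observations with unknown success probability theta) are illustrative
# assumptions, not part of the text above.
data = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 1])     # hypothetical 0/1 observations
theta_grid = np.linspace(0.001, 0.999, 999)          # candidate values of theta

def likelihood(theta, x):
    """L(theta | x): product of p(x_i | theta) for independent Bernoulli data."""
    return np.prod(theta ** x[:, None] * (1 - theta) ** (1 - x[:, None]), axis=0)

L = likelihood(theta_grid, data)
theta_hat = theta_grid[np.argmax(L)]                  # grid approximation to the MLE
print(theta_hat)                                      # close to the sample proportion, 0.6
```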

Finding the maximum likelihood estimate

The maximum likelihood estimate of a parameter \(\theta\) can normally be obtained as a solution to the equation

\[ \frac {d\; L(\theta \; | \; x_1, x_2, \dots, x_n)} {d\; \theta} \;\; = \;\; 0 \]

Although this equation can be solved to find a maximum likelihood estimate, it is often mathematically easier to maximise the logarithm of the likelihood function (the log-likelihood) rather than the likelihood function itself.

\[ \ell(\theta) \;=\; \log L(\theta) \]

The following result explains why this is equivalent.

Maximising the log-likelihood

The maximum likelihood estimate of a parameter \(\theta\) can normally be found by solving the equation

\[ \frac {d\; \log L(\theta \; | \; x_1, x_2, \dots, x_n)} {d\; \theta} \;\; = \;\; 0 \]

Using the chain rule to differentiate \(\log L(\theta)\),

\[ \frac {d\; \log L(\theta)} {d\; \theta} \;\;=\;\; \frac 1 {L(\theta)} \times \frac {d\; L(\theta)} {d\; \theta} \]

Since the likelihood \(L(\theta)\) is a probability, its value is positive, so the factor \(1 / L(\theta)\) can never be zero. The derivative \({d\; \log L(\theta)} / {d\; \theta}\) is therefore zero exactly when \({d\; L(\theta)} / {d\; \theta}\) is zero, so maximising the log-likelihood gives the same value of \(\theta\) as maximising the likelihood itself.
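As a quick numerical check of this result, the sketch below (reusing the same illustrative 0/1 data as the earlier sketch) confirms that the same grid value maximises both the likelihood and its logarithm.

```python
import numpy as np

# Check that maximising log L(theta) gives the same value as maximising L(theta).
# The Bernoulli data here are the same illustrative assumption as before.
data = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 1])
theta_grid = np.linspace(0.001, 0.999, 999)
L = np.prod(theta_grid ** data[:, None] * (1 - theta_grid) ** (1 - data[:, None]), axis=0)

print(theta_grid[np.argmax(L)])           # value maximising the likelihood ...
print(theta_grid[np.argmax(np.log(L))])   # ... is also the value maximising the log-likelihood
```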

We will now give a simple example.

A simple binomial example

Consider a random variable \(X\) that is the number of successes in \(n=20\) independent trials, each of which has probability \(\pi\) of success. If the experiment resulted in \(x=6\) successes, the likelihood function would be

\[ L(\pi) = {{20} \choose 6} \pi^6(1-\pi)^{20-6} \;\; = \;\;38,760 \; \times \pi^6(1-\pi)^{14} \]
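This likelihood can also be evaluated directly in software. The sketch below uses SciPy's binomial probability function to compute \(L(\pi)\) for a couple of trial values of \(\pi\); the particular values tried are arbitrary.

```python
from scipy.stats import binom

# The likelihood from the example: probability of x = 6 successes in
# n = 20 trials, viewed as a function of pi.
def L(pi):
    return binom.pmf(6, 20, pi)      # equals C(20, 6) * pi**6 * (1 - pi)**14

print(L(0.3))   # about 0.19 -- pi = 0.3 makes the observed data fairly probable
print(L(0.7))   # about 0.0002 -- pi = 0.7 makes x = 6 successes very unlikely
```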

The maximum likelihood estimate of \(\pi\) is the solution to the equation

\[ \frac {d \; L(\pi)} {d\; \pi} \;\; = \;\;38,760 \;\times \frac {d} {d\; \pi} \left(\pi^6(1-\pi)^{14} \right) = 0\]

This differentiation requires the product rule and is relatively messy, but the problem becomes much simpler if we instead differentiate the log-likelihood,

\[ \ell(\pi) \;\; = \;\; \log L(\pi) \;\; = \;\; 6 \log(\pi) + 14 \log(1 - \pi) + K\]

where \(K = \log {{20} \choose 6}\) is a constant that does not depend on \(\pi\). The maximum likelihood estimate can therefore be found by solving

\[ \frac {d \; \ell(\pi)} {d\; \pi} \;\; = \;\; \frac 6 {\pi} - \frac {14} {1 - \pi} \;\; = \;\; 0 \\[0.5em] 6(1-\pi) = 14\pi \\[0.5em] 6 = 20\pi \]

The maximum likelihood estimate of \(\pi\) is therefore \( \hat {\pi} = \frac 6 {20} \), the sample proportion of successes.
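The algebra can also be checked numerically. The sketch below minimises the negative log-likelihood with a standard SciPy optimiser and should return a value very close to 0.3; the choice of optimiser and bounds is just one convenient option, not part of the derivation above.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def neg_loglik(pi):
    # Negative of ell(pi) = 6 log(pi) + 14 log(1 - pi); the constant K is omitted
    # because it does not affect where the maximum occurs.
    return -(6 * np.log(pi) + 14 * np.log(1 - pi))

result = minimize_scalar(neg_loglik, bounds=(1e-6, 1 - 1e-6), method='bounded')
print(result.x)   # approximately 0.3, the sample proportion 6/20
```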


This can be generalised to a binomial experiment in which \(x\) successes are observed in \(n\) trials, so

\[ \ell(\pi) \; = \; \log L(\pi) \; = \; x \log(\pi) + (n-x) \log(1 - \pi) + K(n, x) \\[0.4em] \frac {d \; \ell(\pi)} {d\; \pi} \; = \; \frac x {\pi} - \frac {n-x} {1 - \pi} \; = \; 0 \]

which can be solved to give

\[ \hat {\pi} \;=\; \frac x n \]
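The same algebra can be reproduced with a computer algebra system. The sketch below uses SymPy to differentiate the log-likelihood and solve the resulting equation symbolically; the constant \(K(n, x)\) is omitted because it does not affect the derivative.

```python
import sympy as sp

# Symbolic derivation of the general binomial MLE.
# ell(pi) = x log(pi) + (n - x) log(1 - pi), dropping the constant K(n, x).
pi, x, n = sp.symbols('pi x n', positive=True)
ell = x * sp.log(pi) + (n - x) * sp.log(1 - pi)

score = sp.diff(ell, pi)                 # d ell / d pi = x/pi - (n - x)/(1 - pi)
print(sp.solve(sp.Eq(score, 0), pi))     # [x/n]
```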

The diagram below illustrates the method. The top half shows the likelihood function, whereas the bottom half shows the log-likelihood.

Drag the slider and observe that the value of \(\pi\) that maximises the likelihood is the same as the one that maximises the log-likelihood.

The green lines show the slopes (derivatives) of the two curves at the currently selected value of \(\pi\). Click Max likelihood to display the maximum likelihood estimate and observe that it is the value at which both the likelihood function and the log-likelihood have zero slope (a zero first derivative).

Note that we are using natural logarithms (base-e) here, not logarithms to the base 10. Some textbooks use the notation "\(\ln (x)\)" to denote a natural logarithm and Excel uses the function "=LN(x)", but we will use "\(\log (x)\)" here.
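Numerical libraries follow the same convention: for example, NumPy's log function computes the natural logarithm, while log10 gives base-10 logarithms.

```python
import numpy as np

# np.log is the natural (base-e) logarithm; np.log10 is the base-10 logarithm.
print(np.log(np.e))     # 1.0
print(np.log10(100.0))  # 2.0
```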