Unlike the method of moments, maximum likelihood estimation can be used for models with any number of unknown parameters. We will describe the method for a model with two unknown parameters, \(\theta\) and \(\phi\), but it should be clear how to extend it to three or more parameters.

If \(\{x_1, x_2, \dots, x_n\}\) is a random sample from a discrete distribution with probability function \(p(x\;|\; \theta, \phi)\), the likelihood function is again the probability of getting the observed data for any values of the parameters.

\[ L(\theta, \phi \; | \; x_1, x_2, \dots, x_n) \;\;=\;\; p(x_1, x_2, \dots, x_n \;| \; \theta, \phi) \;\;=\;\; \prod_{i=1}^n {p(x_i\;|\; \theta, \phi)} \]

For a continuous distribution, the corresponding definition is

\[ L(\theta, \phi \; | \; x_1, x_2, \dots, x_n) \;\;=\;\; \prod_{i=1}^n {f(x_i\;|\; \theta, \phi)} \]
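As a rough illustration, the sketch below (Python, assuming a normal model whose mean \(\mu\) and standard deviation \(\sigma\) play the roles of \(\theta\) and \(\phi\), with a small made-up sample) evaluates the likelihood as a product of density values and the log-likelihood as the corresponding sum of log-densities.

```python
import numpy as np
from scipy.stats import norm

# A small made-up sample, purely for illustration
x = np.array([4.2, 5.1, 3.8, 6.0, 4.9])

def likelihood(mu, sigma):
    # L(mu, sigma | x) = product of f(x_i | mu, sigma)
    return np.prod(norm.pdf(x, loc=mu, scale=sigma))

def log_likelihood(mu, sigma):
    # l(mu, sigma | x) = sum of log f(x_i | mu, sigma)
    return np.sum(norm.logpdf(x, loc=mu, scale=sigma))

print(likelihood(4.8, 0.9))       # a small positive number
print(log_likelihood(4.8, 0.9))   # its logarithm
```

The candidate parameter values (4.8 and 0.9) are arbitrary; the point is only that the likelihood and log-likelihood can be evaluated at any values of the parameters and compared.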

Maximising the likelihood

Definition

If a random variable \(X\) has a distribution that involves two unknown parameters, \(\theta\) and \(\phi\), the maximum likelihood estimates of the parameters are the values that maximise the likelihood function.

This is usually at a turning point of the likelihood function — where the partial derivatives of \(L(\theta, \phi)\) with respect to \(\theta\) and \(\phi\) are zero,

\[ \frac{\partial L(\theta, \phi)}{\partial \theta} = 0 \spaced{and} \frac{\partial L(\theta, \phi)}{\partial \phi} = 0 \]

Solving these equations gives the maximum likelihood estimates of \(\theta\) and \(\phi\). Equivalently, writing \(\ell(\theta, \phi) = \log L(\theta, \phi)\), we can solve the equations

\[ \frac{\partial \ell(\theta, \phi)}{\partial \theta} = 0 \spaced{and} \frac{\partial \ell(\theta, \phi)}{\partial \phi} = 0 \]

This is usually easier and gives identical parameter estimates.
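When these equations have no algebraic solution, the log-likelihood can instead be maximised numerically. A minimal sketch, assuming a gamma model with unknown shape \(\alpha\) and scale \(\beta\) (a standard two-parameter case in which the equation for \(\alpha\) involves the digamma function and cannot be solved in closed form) and using a general-purpose optimiser on the negative log-likelihood:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import gamma

# Simulated data standing in for an observed sample (values are illustrative)
rng = np.random.default_rng(1)
x = gamma.rvs(a=2.5, scale=1.8, size=200, random_state=rng)

def neg_log_likelihood(params):
    alpha, beta = params
    if alpha <= 0 or beta <= 0:      # keep the search inside the valid parameter region
        return np.inf
    return -np.sum(gamma.logpdf(x, a=alpha, scale=beta))

# Minimising -l(alpha, beta) is the same as maximising l(alpha, beta)
result = minimize(neg_log_likelihood, x0=[1.0, 1.0], method="Nelder-Mead")
alpha_hat, beta_hat = result.x
print(alpha_hat, beta_hat)           # maximum likelihood estimates
```

The Nelder-Mead method is used here because it does not require derivatives of the log-likelihood; any optimiser that maximises \(\ell(\alpha, \beta)\) would give the same estimates.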