Likelihood function
Unlike the method of moments, maximum likelihood estimation can be used in the same way with models that have any number of parameters. We will describe the method for a model with two unknown parameters, \(\theta\) and \(\phi\), but it should be clear how to extend it to three or more parameters.
The concept of the likelihood function is the same as for models with a single unknown parameter. If \(\{x_1, x_2, \dots, x_n\}\) is a random sample from a discrete distribution with probability function \(p(x\;|\; \theta, \phi)\), the likelihood function is the probability of obtaining the observed data, treated as a function of the parameters. Because the observations in a random sample are independent, this probability is the product of the individual probabilities.
\[ L(\theta, \phi \; | \; x_1, x_2, \dots, x_n) \;\;=\;\; p(x_1, x_2, \dots, x_n \;| \; \theta, \phi) \;\;=\;\; \prod_{i=1}^n {p(x_i\;|\; \theta, \phi)} \]
For a random sample from a continuous distribution, the corresponding definition is
\[ L(\theta, \phi \; | \; x_1, x_2, \dots, x_n) \;\;=\;\; \prod_{i=1}^n {f(x_i\;|\; \theta, \phi)} \]
Maximising the likelihood
As with a single parameter, maximum likelihood chooses the parameter values that make the observed data most probable.
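A minimal Python sketch of this idea, under assumptions that are not in the text: the model is taken to be a normal distribution with \(\theta\) as its mean and \(\phi\) as its standard deviation, the six data values are made up purely for illustration, and the function names are hypothetical. Instead of calculus, it evaluates the log-likelihood (which is largest at the same parameter values as the likelihood) over a grid of \((\theta, \phi)\) pairs and keeps the pair with the highest value.

```python
import math

def normal_log_likelihood(theta, phi, data):
    """Hypothetical helper: log-likelihood of a normal model with
    mean theta and standard deviation phi (an assumed example model)."""
    if phi <= 0:
        return -math.inf  # phi must be positive for a valid distribution
    n = len(data)
    ss = sum((x - theta) ** 2 for x in data)
    # Sum of log densities: log of the product of f(x_i | theta, phi)
    return -n * math.log(phi) - 0.5 * n * math.log(2 * math.pi) - ss / (2 * phi ** 2)

def grid_search_mle(data, thetas, phis):
    """Hypothetical helper: return the grid pair (theta, phi) with the
    largest log-likelihood, together with that log-likelihood."""
    best = None
    best_ll = -math.inf
    for theta in thetas:
        for phi in phis:
            ll = normal_log_likelihood(theta, phi, data)
            if ll > best_ll:
                best, best_ll = (theta, phi), ll
    return best, best_ll

# Made-up illustrative sample, not data from the text
data = [4.2, 5.1, 3.8, 6.0, 5.4, 4.9]
thetas = [i / 100 for i in range(300, 701)]  # 3.00, 3.01, ..., 7.00
phis = [i / 100 for i in range(10, 301)]     # 0.10, 0.11, ..., 3.00

(theta_hat, phi_hat), max_ll = grid_search_mle(data, thetas, phis)
print(theta_hat, phi_hat)
```

On this grid the search should pick \(\theta = 4.9\), the sample mean, and \(\phi = 0.73\), the grid point nearest \(\sqrt{\sum (x_i - \overline{x})^2 / n} \approx 0.730\). A grid search is only practical for a small number of parameters and coarse accuracy; the calculus-based approach below avoids both limitations.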
Definition
If a random variable \(X\) has a distribution that involves two unknown parameters, \(\theta\) and \(\phi\), the maximum likelihood estimates of the parameters are the values that maximise the likelihood function.
The likelihood function is usually maximised at a turning point, so the maximum likelihood estimates can generally be found by setting the partial derivatives of \(L(\theta, \phi)\) with respect to \(\theta\) and \(\phi\) equal to zero:
\[ \frac{\partial L(\theta, \phi)}{\partial \theta} = 0 \spaced{and} \frac{\partial L(\theta, \phi)}{\partial \phi} = 0 \]
This gives two equations that can be solved simultaneously for \(\theta\) and \(\phi\). Because the logarithm is a strictly increasing function, the parameter values that maximise the likelihood also maximise its logarithm. It is usually easier to work with the log-likelihood, \(\ell(\theta, \phi) = \log L(\theta, \phi)\), since it turns the product over the observations into a sum, so we usually solve the equations
\[ \frac{\partial \ell(\theta, \phi)}{\partial \theta} = 0 \spaced{and} \frac{\partial \ell(\theta, \phi)}{\partial \phi} = 0 \]
instead. These give the same parameter estimates.
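As a worked illustration, assume (this model is not specified in the text) that the data are a random sample from a normal distribution with mean \(\theta\) and standard deviation \(\phi\). The log-likelihood is
\[ \ell(\theta, \phi) \;\;=\;\; -n \log \phi \;-\; \frac{n}{2} \log(2\pi) \;-\; \frac{1}{2\phi^2} \sum_{i=1}^n (x_i - \theta)^2 \]
so the two equations become
\[ \frac{\partial \ell(\theta, \phi)}{\partial \theta} = \frac{1}{\phi^2} \sum_{i=1}^n (x_i - \theta) = 0 \spaced{and} \frac{\partial \ell(\theta, \phi)}{\partial \phi} = -\frac{n}{\phi} + \frac{1}{\phi^3} \sum_{i=1}^n (x_i - \theta)^2 = 0 \]
The first equation gives \(\hat{\theta} = \overline{x}\), and substituting this into the second gives
\[ \hat{\phi} \;\;=\;\; \sqrt{\frac{1}{n} \sum_{i=1}^n (x_i - \overline{x})^2} \]
This estimate divides by \(n\) rather than \(n - 1\), so it is slightly smaller than the usual sample standard deviation.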