Likelihood function

We defined the likelihood function of a discrete data set to be the probability of obtaining these data values, treated as a function of the unknown parameter, \(\theta\).

\[ L(\theta) \;=\; P(\text{data} \;| \; \theta) \]

If \(\{x_1, x_2, \dots, x_n\}\) is a random sample from a discrete distribution with probability function \(p(x \mid \theta)\), this is

\[ L(\theta) \;=\; P(X_1 = x_1, X_2 = x_2, \dots, X_n = x_n \;| \; \theta) \;\;=\;\; \prod_{i=1}^n {p(x_i \;| \; \theta)} \]

If \(\{x_1, x_2, \dots, x_n\}\) is a random sample from a continuous distribution with probability density function \(f(x\;|\; \theta)\), we define the likelihood function in a similar way, based on the probability of getting values that are close to the observed data. We showed earlier that

\[ P(X_1 \approx x_1, X_2 \approx x_2, \dots, X_n \approx x_n \;| \; \theta) \;\; \propto \;\; \prod_{i=1}^n f(x_i \;| \; \theta) \]

Therefore the product of the probability density functions plays the same role for continuous random variables as the product of probability functions for discrete ones.
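The proportionality can be checked numerically for a single observation. The sketch below is only an illustration: it assumes an exponential distribution with a hypothetical rate of 0.5 and takes "close to \(x\)" to mean "within a small interval of width \(\delta\) centred on \(x\)". The function names are made up for this example.

```python
import math

def exp_pdf(x, rate):
    """Density of the exponential distribution, f(x | rate) = rate * exp(-rate * x)."""
    return rate * math.exp(-rate * x) if x >= 0 else 0.0

def exp_cdf(x, rate):
    """Cumulative distribution function, F(x | rate) = 1 - exp(-rate * x)."""
    return 1.0 - math.exp(-rate * x) if x >= 0 else 0.0

def prob_near(x, rate, delta):
    """P(x - delta/2 < X < x + delta/2), computed exactly from the cdf."""
    return exp_cdf(x + delta / 2, rate) - exp_cdf(x - delta / 2, rate)

rate = 0.5    # hypothetical parameter value, chosen only for illustration
x = 1.2       # hypothetical observed value
delta = 0.01  # width of the interval that counts as "close to x"

exact = prob_near(x, rate, delta)      # exact probability of landing near x
approx = exp_pdf(x, rate) * delta      # density times interval width

print(f"exact probability  = {exact:.8f}")
print(f"f(x | rate) * delta = {approx:.8f}")
```

For a small \(\delta\) the two numbers should agree closely. The constant of proportionality, \(\delta\), does not involve the parameter, which is why it can be dropped when comparing parameter values.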

Definition

If random variables \(\{X_1, X_2, \dots, X_n\}\) are a random sample from a continuous distribution with probability density function \(f(x \;|\; \theta)\), then the function

\[ L(\theta) = \prod_{i=1}^n {f(x_i \;| \; \theta)} \]

is called the likelihood function of \(\theta\).
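As an illustration of the definition, the following sketch evaluates the likelihood at a few parameter values. It assumes an exponential distribution with density \(f(x \mid \lambda) = \lambda e^{-\lambda x}\), and the five data values are made up for the example; neither comes from the text above.

```python
import math

def exp_pdf(x, rate):
    """Density of the exponential distribution, f(x | rate) = rate * exp(-rate * x)."""
    return rate * math.exp(-rate * x) if x >= 0 else 0.0

def likelihood(rate, data):
    """L(rate) = product of f(x_i | rate) over all observations in the sample."""
    return math.prod(exp_pdf(x, rate) for x in data)

# Made-up sample, used only to illustrate the calculation.
data = [0.8, 2.1, 0.3, 1.6, 1.2]

# The data stay fixed; only the parameter value changes.
for rate in (0.25, 0.5, 1.0, 2.0):
    print(f"rate = {rate:4.2f}   L(rate) = {likelihood(rate, data):.6f}")
```

The data are held fixed and the likelihood is read as a function of the parameter, so parameter values giving larger \(L\) are better supported by the sample.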

Maximum likelihood estimate

The maximum likelihood estimate of \(\theta\) is again the value for which the observed data are most likely — the value that maximises \(L(\theta)\).

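This is usually (but not always) a turning point of the likelihood function; the exceptions arise when the maximum lies on the boundary of the permitted values of \(\theta\) or at a point where \(L(\theta)\) is not differentiable. A turning point can be found as a solution of the equation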

\[ L'(\theta) \;\; =\;\; 0 \]

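Because the logarithm is an increasing function, \(\ell(\theta) = \log\big(L(\theta)\big)\) is maximised at the same value of \(\theta\) as \(L(\theta)\). As with discrete distributions, it is therefore usually easier to solve the equivalent equation involving the logarithm of the likelihood function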

\[ \ell'(\theta) \;\; =\;\; \frac d {d \theta} \log\big(L(\theta)\big) \;\; =\;\; 0 \]
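To see the log-likelihood equation in action, the sketch below solves \(\ell'(\lambda) = 0\) numerically for an assumed exponential model, where \(\ell(\lambda) = n \log \lambda - \lambda \sum x_i\) and so \(\ell'(\lambda) = n/\lambda - \sum x_i\). The data values, the function names and the choice of bisection as the root-finding method are all assumptions made for illustration. For this model the equation can also be solved by hand, giving \(\hat{\lambda} = n / \sum x_i\), which provides a check on the numerical answer.

```python
import math

def log_likelihood(rate, data):
    """l(rate) = sum of log f(x_i | rate) = n * log(rate) - rate * sum(x_i)."""
    return len(data) * math.log(rate) - rate * sum(data)

def score(rate, data):
    """Derivative of the log-likelihood, l'(rate) = n / rate - sum(x_i)."""
    return len(data) / rate - sum(data)

def solve_score(data, lo=1e-6, hi=100.0, tol=1e-10):
    """Bisection for the root of l'(rate) = 0.

    Assumes the score is positive at lo and negative at hi,
    so that exactly one turning point lies between them.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if score(mid, data) > 0:
            lo = mid   # log-likelihood still increasing: root is to the right
        else:
            hi = mid   # log-likelihood decreasing: root is to the left
    return (lo + hi) / 2

# Made-up sample, used only to illustrate the calculation.
data = [0.8, 2.1, 0.3, 1.6, 1.2]

mle = solve_score(data)
closed_form = len(data) / sum(data)   # solution of l'(rate) = 0 found by hand

print(f"numerical MLE   = {mle:.6f}")
print(f"closed-form MLE = {closed_form:.6f}")
```

Bisection is used here only because it needs nothing beyond the sign of \(\ell'\); any standard root-finding or optimisation routine would serve equally well.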