Random sample

If \(\{x_1, x_2, \dots, x_n\}\) is a random sample from a distribution with probability function \(p(x\;|\;\theta)\), then

\[ L(\theta \;|\;x_1, x_2, \dots, x_n) = \prod_{i=1}^n p(x_i \;|\; \theta) \]

so the log-likelihood can be written as

\[ \ell(\theta) = \sum_{i=1}^n \log\left(p(x_i \;|\; \theta)\right) \]
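This identity translates directly into code. The following minimal Python sketch (the function names and the Poisson example are illustrative choices, not part of the notation above) evaluates \(\ell(\theta)\) by summing the log-probabilities of the observations.

```python
import math

def log_likelihood(theta, data, log_p):
    # ell(theta): the sum of log p(x_i | theta) over the sample
    return sum(log_p(x, theta) for x in data)

# An illustrative choice of p(x | theta): a Poisson distribution
# with mean theta, so log p(x) = x*log(theta) - theta - log(x!)
def log_poisson(x, theta):
    return x * math.log(theta) - theta - math.lgamma(x + 1)

sample = [2, 0, 3, 1, 2]
print(log_likelihood(1.6, sample, log_poisson))
```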

We now give two examples in which maximum likelihood estimates are found from random samples.

Example: Sex ratio of Siberian tigers

The probability of a newborn tiger being male is an unknown parameter, \(\pi\). A researcher recorded the number of males in a sample of \(n = 207\) litters, and these values are summarised in the following frequency table.

Number of males     0    1    2    3
Frequency          33   66   80   28

If it is assumed that the sexes of all tigers in a litter are independently determined, what is the maximum likelihood estimate of \(\pi\)?

The number of male tigers in a litter of size \(k\) is a binomial random variable with probability function

\[ p(x) = {k \choose x} \pi^x(1-\pi)^{k-x} \quad \quad \text{for } x = 0, 1, \dots, k \]

The data are a random sample of \(n = 207\) values from this distribution (each litter here containing \(k = 3\) cubs), \(x_1, x_2, \dots, x_{207}\), so they have likelihood function

\[ \begin{align} L(\pi) & = \prod_{i=1}^n {{k \choose x_i} \pi^{x_i}(1-\pi)^{k-x_i}} \\ & = \pi^{\sum {x_i}}(1-\pi)^{nk-\sum {x_i}} \times \prod_{i=1}^n {k \choose x_i} \end{align} \]

The log-likelihood is

\[ \ell(\pi) \;\;= \;\; \sum {x_i} \log(\pi) + \left(nk - \sum {x_i} \right) \log(1 - \pi) + K \]

where \(K\) is a constant that does not depend on \(\pi\). The maximum likelihood estimate can be found by setting the derivative of \(\ell(\pi)\) to zero,

\[ \frac {d \; \ell(\pi)} {d\; \pi} \;\; = \;\; \frac {\sum {x_i}} {\pi} - \frac {nk - \sum {x_i}} {1 - \pi} \;\; = \;\; 0 \]

The solution to this equation gives the maximum likelihood estimate,

\[\hat{\pi} = \frac {\sum {x_i}} {nk} = \frac {\overline x} k = \frac {1.498} 3 = 0.499\]

For random samples from a binomial distribution, the maximum likelihood and method of moments estimators are equal. They are also equal for the next example, but differ for data from some other models.
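As a quick arithmetic check, the short Python sketch below (variable names are illustrative) recovers \(\hat{\pi}\) from the frequency table above.

```python
# Frequencies of 0, 1, 2 and 3 males in the n = 207 litters
counts = {0: 33, 1: 66, 2: 80, 3: 28}

n = sum(counts.values())                             # 207 litters
k = 3                                                # cubs per litter
total_males = sum(x * f for x, f in counts.items())  # sum of x_i = 310

pi_hat = total_males / (n * k)                       # x-bar / k
print(round(pi_hat, 3))                              # 0.499
```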

Example: Sample from a geometric distribution

If \(\{X_1, X_2, \dots, X_n\}\) is a random sample from a geometric distribution with probability function

\[ p(x) = \pi (1-\pi)^{x-1} \quad \quad \text{for } x = 1, 2, \dots \]

what is the maximum likelihood estimate of \(\pi\)?

The likelihood function is

\[ L(\pi) = \prod_{i=1}^n {\pi(1-\pi)^{x_i - 1}} = \pi^n(1-\pi)^{\sum {x_i} - n} \]

and the log-likelihood is

\[ \ell(\pi) \;=\; n \log (\pi) + \left( \sum {x_i} - n \right) \log(1-\pi) \]

The maximum likelihood estimate is found by solving

\[ \begin{align} \frac {d \; \ell(\pi)} {d\; \pi} \;\; &= \;\; \frac n {\pi} - \frac {\sum {x_i} - n} {1 - \pi} \;\; = \;\; 0 \\ n - n\pi - \left(\sum {x_i}\right) \pi + n\pi \;\; &= \;\; 0 \\ \hat{\pi} \;\; &= \;\; \frac n {\sum {x_i}} \;\; = \;\; \frac 1 {\overline x} \end{align} \]

The maximum likelihood estimator of the probability of success is therefore the reciprocal of the average number of trials until the first success.
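This closed-form result is easy to verify numerically. The sketch below, using a made-up sample and a brute-force grid search purely for illustration, compares \(n / \sum x_i\) with the value of \(\pi\) that maximises \(\ell(\pi)\) over a fine grid.

```python
import math

def geometric_mle(data):
    # Closed-form MLE: pi-hat = n / sum(x_i) = 1 / x-bar
    return len(data) / sum(data)

def log_lik(pi, data):
    n, s = len(data), sum(data)
    return n * math.log(pi) + (s - n) * math.log(1 - pi)

sample = [2, 1, 4, 1, 3]                 # numbers of trials to first success
pi_hat = geometric_mle(sample)           # 5/11

# A coarse grid search over (0, 1) agrees with the formula
grid = [i / 1000 for i in range(1, 1000)]
pi_grid = max(grid, key=lambda p: log_lik(p, sample))
print(round(pi_hat, 3), pi_grid)         # 0.455 0.455
```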

The following diagram helps to illustrate the concepts with a numerical example.

Illustration

The diagram below is based on an artificial data set \(\{1, 1, 1, 1, 2, 2, 4\}\), which is assumed to be a random sample from a geometric distribution,

\[ X \;\; \sim \; \; \GeomDistn(\pi) \]

The seven data values are displayed as red crosses in the bottom half of the diagram below, which also shows a bar chart of the probabilities from a geometric distribution, initially with \(\pi = 0.5\).

The likelihood for this value of \(\pi\) is the product of the geometric probabilities of the seven data values (the product of their bar heights). The log-likelihood is the sum of the log-probabilities of the values and is shown in the top half of the diagram.

Use the slider to investigate how \(\pi\) affects the log-likelihood, \(\ell(\pi)\).

Finally, click Max likelihood to show the maximum likelihood estimate,

\[ \hat{\pi} = \frac 1 {\overline x} = \frac 7 {12} = 0.583 \]
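For readers without access to the interactive diagram, a short Python sketch can stand in for the slider: evaluating \(\ell(\pi)\) at a few values of \(\pi\) (chosen arbitrarily) shows that the log-likelihood peaks at \(\hat{\pi} = 7/12 \approx 0.583\).

```python
import math

data = [1, 1, 1, 1, 2, 2, 4]
n, s = len(data), sum(data)              # n = 7, sum of x_i = 12

def ell(pi):
    return n * math.log(pi) + (s - n) * math.log(1 - pi)

# Log-likelihood at a few slider positions; the peak is at pi = 7/12
for pi in [0.3, 0.5, 7 / 12, 0.7, 0.9]:
    print(f"pi = {pi:.3f}   ell(pi) = {ell(pi):.3f}")
```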