Test statistic to compare \(\mathcal{M}_S\) with \(\mathcal{M}_B\)

We can now formally test the hypotheses

\[ \begin{aligned} H_0:\;\; & \text{the data come from the small model, } \mathcal{M}_S \\ H_A:\;\; & \text{the data come from the big model, } \mathcal{M}_B \text{, but not from } \mathcal{M}_S \end{aligned} \]

We base this on a test statistic that is twice the logarithm of the likelihood ratio for the big and small models,

\[ X^2 \;\;=\;\; 2\log(R) \;\;=\;\; 2\left(\ell(\mathcal{M}_B) - \ell(\mathcal{M}_S)\right) \]

We explained on the previous page that this is likely to be large if \(\mathcal{M}_S\) is not correct. The following result (stated without proof) shows that it has (approximately) a standard distribution when the null hypothesis holds.

Distribution of test statistic

If the data do come from \(L(\mathcal{M}_S)\), and \(L(\mathcal{M}_B)\) has \(k\) more parameters than \(L(\mathcal{M}_S)\),

\[ X^2 \;\;=\;\; 2\left( \ell(\mathcal{M}_B) - \ell(\mathcal{M}_S)\right) \;\;\underset{\text{approx}}{\sim} \;\; \ChiSqrDistn(k \text{ df}) \]
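As a check on this approximation, the following simulation sketch (not part of the original text, and assuming NumPy and SciPy are available) generates samples from a small model in which a Poisson mean is fixed at \(\lambda_0 = 2\); the big model leaves the mean free, so \(k = 1\). The simulated tail probabilities of \(X^2\) should be close to those of the \(\ChiSqrDistn(1 \text{ df})\) distribution.

```python
# Simulation sketch: when the data really come from the small model, the
# likelihood ratio statistic X^2 should behave roughly like chi-squared(1 df).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
lambda0, n, n_sims = 2.0, 20, 10_000

x2 = np.empty(n_sims)
for s in range(n_sims):
    x = rng.poisson(lambda0, size=n)
    lam_hat = x.mean()                 # MLE of the Poisson mean (big model)

    def loglik(lam):
        # Poisson log-likelihood up to a constant K, which cancels below
        return x.sum() * np.log(lam) - n * lam

    x2[s] = 2 * (loglik(lam_hat) - loglik(lambda0))

# Simulated vs chi-squared(1 df) upper tail probabilities at a few cut-offs
for c in (2.71, 3.84, 6.63):
    print(c, (x2 > c).mean(), stats.chi2.sf(c, df=1))
```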

Likelihood ratio test

The test is therefore done by evaluating the test statistic, \(X^2\), from the data. The p-value is the probability of getting a value as large as this from the \(\ChiSqrDistn(k \text{ df})\) distribution.

The full procedure is described below:

  1. Find the maximum likelihood estimates of all unknown parameters in \(\mathcal{M}_B\).
  2. Find the maximum likelihood estimates of all unknown parameters in \(\mathcal{M}_S\).
  3. Evaluate the test statistic, \(X^2 = 2\left( \ell(\mathcal{M}_B) - \ell(\mathcal{M}_S)\right)\).
  4. Find the degrees of freedom for the test, \(k\), the difference between the numbers of unknown parameters in the two models.
  5. The p-value for the test is the upper tail probability of the \(\ChiSqrDistn(k \text{ df})\) distribution above the test statistic.
  6. Interpret the p-value as for other kinds of hypothesis test — small values give evidence that the null hypothesis, model \(\mathcal{M}_S\), does not hold.
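Steps 3 to 5 can be collected into a small helper function. The sketch below is illustrative only (the name `lr_test` and its arguments are not from any standard library), and it assumes the two maximised log-likelihoods from steps 1 and 2 have already been found; SciPy supplies the \(\chi^2\) upper tail probability.

```python
# Sketch of steps 3-5 of the likelihood ratio test (illustrative helper only)
from scipy import stats

def lr_test(loglik_big, loglik_small, k):
    """Return the test statistic and p-value for a likelihood ratio test.

    loglik_big, loglik_small : maximised log-likelihoods of the two models
                               (any additive constant K cancels in the difference)
    k : difference in the numbers of unknown parameters (degrees of freedom)
    """
    x2 = 2 * (loglik_big - loglik_small)      # step 3: test statistic
    p_value = stats.chi2.sf(x2, df=k)         # step 5: upper tail probability
    return x2, p_value

# The defective-items example later on this page gives
# loglik_big = 3.7532 + K and loglik_small = 0.2025 + K with k = 1:
print(lr_test(3.7532, 0.2025, k=1))           # X^2 = 7.101, p-value = 0.0077
```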

Example

The following table describes the number of defective items from a production line in each of 20 days.

1    2    3    4    2    3    2    5    5    2
4    3    5    1    2    4    0    2    2    6

Assuming that the data are a random sample from a \(\PoissonDistn(\lambda)\) distribution, use a likelihood ratio test for whether the rate of defects was \(\lambda = 2\) per day.

The hypotheses of interest are

\[ \begin{aligned} H_0:\;\; & \lambda = 2 \\ H_A:\;\; & \lambda \ne 2 \end{aligned} \]

The log-likelihood for the Poisson model is

\[ \ell(\lambda) \;\;=\;\; \left(\sum_{i=1}^{n} x_i\right) \log \lambda - n\lambda + K \]

where \(n = 20\) and \(K\) is a constant that does not depend on \(\lambda\).
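A short numerical sketch of this log-likelihood (variable names are illustrative) drops the constant \(K\), since it cancels whenever two log-likelihoods are subtracted.

```python
# The defective-items data and their Poisson log-likelihood (constant K omitted)
import numpy as np

x = np.array([1, 2, 3, 4, 2, 3, 2, 5, 5, 2,
              4, 3, 5, 1, 2, 4, 0, 2, 2, 6])
n = len(x)                      # 20 days

def loglik(lam):
    return x.sum() * np.log(lam) - n * lam

print(x.sum(), x.mean())        # 58 defects in total; mean 2.9 per day
```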

Big model, \(\mathcal{M}_B\)

For the big model, the maximum likelihood estimate of the unknown parameter is

\[ \hat{\lambda} \;\;=\;\; \frac{\sum x_i}{n} \;\;=\;\; 2.9 \]

The maximum possible value for the log-likelihood is

\[ \ell(\mathcal{M}_B) \;\;=\;\; 58 \log(2.9) - 20 \times 2.9 \;\;=\;\; 3.7532 + K \]

Small model, \(\mathcal{M}_S\)

There are no unknown parameters for the small model, so the maximum possible value for the log-likelihood is

\[ \ell(\mathcal{M}_S) \;\;=\;\; 58 \log(2) - 20 \times 2 \;\;=\;\; 0.2025 + K \]

Likelihood ratio test

The test statistic is

\[ X^2 \;\;=\;\; 2\left(\ell(\mathcal{M}_B) - \ell(\mathcal{M}_S)\right) \;\;=\;\; 7.101 \]

Since there is one more unknown parameter in the big model, this should be compared to the \(\ChiSqrDistn(1 \text{ df})\) distribution. Its upper tail probability above 7.101 is 0.008 and this is the p-value for the test.

Since the p-value is so small, we should conclude that there is strong evidence that \(\mathcal{M}_B\) fits the data better than \(\mathcal{M}_S\) — i.e. that \(\lambda \ne 2\).
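To reproduce these numbers, the following sketch (again omitting the constant \(K\), which cancels, and assuming NumPy and SciPy) recomputes the maximum likelihood estimate, the test statistic and the p-value.

```python
# Numerical check of the example: MLE, test statistic and p-value
import numpy as np
from scipy import stats

x = np.array([1, 2, 3, 4, 2, 3, 2, 5, 5, 2,
              4, 3, 5, 1, 2, 4, 0, 2, 2, 6])
n = len(x)

def loglik(lam):
    return x.sum() * np.log(lam) - n * lam   # constant K omitted

lam_hat = x.mean()                           # 2.9
x2 = 2 * (loglik(lam_hat) - loglik(2.0))     # 7.101
p_value = stats.chi2.sf(x2, df=1)            # 0.0077
print(lam_hat, x2, p_value)
```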


Illustration of likelihood ratio test

The diagram below illustrates the test.

The bottom of the diagram shows a stacked dot plot of the data and initially superimposes the probabilities for a \(\PoissonDistn(\lambda=2)\) model for the data.

The top of the diagram shows the log-likelihood for different values of \(\lambda\). When \(\lambda = 2\), the log-likelihood is 3.551 lower than its maximum possible value (which occurs at \(\hat{\lambda} = 2.9\)). The p-value is the probability of a difference as large as this; it is found from the \(\ChiSqrDistn(1 \text{ df})\) distribution to be 0.0077. This provides very strong evidence that the underlying Poisson parameter is not 2.0.

Finally, use the slider to see how the results of the test would have changed if the null hypothesis value of the parameter, \(\lambda_0\), had been other values.
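The same exploration can be sketched in code: recomputing the p-value of the likelihood ratio test for a few illustrative values of \(\lambda_0\) shows how the evidence against the null hypothesis weakens as \(\lambda_0\) approaches \(\hat{\lambda} = 2.9\).

```python
# Sketch of the slider: p-values for several null hypothesis values lambda_0
import numpy as np
from scipy import stats

x = np.array([1, 2, 3, 4, 2, 3, 2, 5, 5, 2,
              4, 3, 5, 1, 2, 4, 0, 2, 2, 6])
n, lam_hat = len(x), x.mean()

def loglik(lam):
    return x.sum() * np.log(lam) - n * lam   # constant K omitted

for lam0 in (1.5, 2.0, 2.5, 2.9, 3.5, 4.0):
    x2 = 2 * (loglik(lam_hat) - loglik(lam0))
    print(f"lambda_0 = {lam0:.1f}   X^2 = {x2:6.3f}   p-value = {stats.chi2.sf(x2, df=1):.4f}")
```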