Comparing big and small models

We are interested in assessing whether the simpler "small" model fits the data adequately or whether the more general "big" model is needed.

It is important to remember that the big model, \(\mathcal{M}_B\), always fits at least as well as the small model, \(\mathcal{M}_S\) — it is more general and has more parameters to adjust.

What we need to consider is whether \(\mathcal{M}_B\) fits significantly better.

Measuring how well a model fits the data

The fit of any model can be described by the maximum possible likelihood for that model,

\[ L(\mathcal{M}) \;\;=\;\; \underset{\text{all models in }\mathcal{M}}{\operatorname{max}} \; P(\text{data} \mid \text{model}) \]

This is obtained by calculating the maximum likelihood estimates of all unknown parameters and inserting them into the likelihood function.
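As a minimal sketch, consider a hypothetical coin-toss data set (the numbers here are purely illustrative): \(k = 13\) heads in \(n = 20\) tosses, with a big model in which the success probability \(p\) is a free parameter and a small model that fixes \(p = 1/2\). Under the big model the MLE is \(\hat{p} = k/n\), so the maximised likelihood is obtained by plugging \(\hat{p}\) into the binomial likelihood; under the small model there are no free parameters, so the maximised likelihood is simply the likelihood at \(p = 1/2\).

```python
from math import comb

# Hypothetical data: k = 13 heads in n = 20 coin tosses.
n, k = 20, 13

def binom_lik(p, n, k):
    """Binomial likelihood P(data | p) for k successes in n trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Big model M_B: p is a free parameter; its MLE is p_hat = k/n.
p_hat = k / n
L_big = binom_lik(p_hat, n, k)

# Small model M_S: p is fixed at 1/2, so there is nothing to maximise.
L_small = binom_lik(0.5, n, k)

print(L_big, L_small)  # L_big is at least as large as L_small
```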

Likelihood ratio

The model \(\mathcal{M}_S\) is a special case of model \(\mathcal{M}_B\), so any fit achievable under \(\mathcal{M}_S\) is also achievable under \(\mathcal{M}_B\). The maximised likelihood is therefore at least as large for the big model as for the small model,

\[ L(\mathcal{M}_B) \;\;\ge\;\; L(\mathcal{M}_S) \]

Equivalently, the likelihood ratio is always at least one,

\[ R \;\;=\;\; \frac{L(\mathcal{M}_B)}{L(\mathcal{M}_S)} \;\;\ge\;\; 1 \]

Big values of \(R\) suggest that \(\mathcal{M}_S\) does not fit as well as \(\mathcal{M}_B\).
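Continuing the hypothetical binomial example (illustrative numbers only: \(k = 13\) heads in \(n = 20\) tosses, big model with \(p\) free, small model with \(p = 1/2\)), the ratio can be computed directly:

```python
from math import comb

n, k = 20, 13  # hypothetical coin-toss data

def binom_lik(p):
    """Binomial likelihood P(data | p) for k successes in n trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

L_big = binom_lik(k / n)   # maximised likelihood under M_B (MLE p_hat = k/n)
L_small = binom_lik(0.5)   # likelihood under M_S (p fixed at 1/2)

R = L_big / L_small
print(R)  # always at least 1, since M_S is a special case of M_B
```

Here \(R \approx 2.5\): the data are a few times more likely under the best-fitting big model than under the small model, which on its own is not dramatic evidence against \(\mathcal{M}_S\).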

Log of likelihood ratio

Equivalently, taking logs of this inequality and writing \(\ell(\mathcal{M}) = \log L(\mathcal{M})\) for the maximised log-likelihood,

\[ \log(R) \;\;=\;\; \ell(\mathcal{M}_B) - \ell(\mathcal{M}_S) \;\;\ge\;\; 0 \]

Again, big values suggest that \(\mathcal{M}_S\) does not fit as well as \(\mathcal{M}_B\).
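In the same hypothetical binomial example (illustrative numbers: \(k = 13\) heads in \(n = 20\) tosses), the log-likelihood difference is computed as follows; working on the log scale is also numerically safer than forming the ratio of two possibly tiny likelihoods.

```python
from math import comb, log

n, k = 20, 13  # hypothetical coin-toss data

def binom_loglik(p):
    """Binomial log-likelihood log P(data | p) for k successes in n trials."""
    return log(comb(n, k)) + k * log(p) + (n - k) * log(1 - p)

ell_big = binom_loglik(k / n)   # maximised log-likelihood under M_B
ell_small = binom_loglik(0.5)   # log-likelihood under M_S (p fixed at 1/2)

log_R = ell_big - ell_small
print(log_R)  # always non-negative
```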