Comparing big and small models
We are interested in assessing whether the simpler "small" model fits the data adequately or whether the more general "big" model is needed.
It is important to remember that the big model, \(\mathcal{M}_B\), always fits at least as well as the small model, \(\mathcal{M}_S\): it is more general and has more parameters to adjust.
What we need to consider is whether \(\mathcal{M}_B\) fits significantly better.
Measuring how well a model fits the data
The fit of any model can be described by the maximum possible likelihood for that model,
\[ L(\mathcal{M}) \;\;=\;\; \underset{\text{all models in }\mathcal{M}}{\operatorname{max}} P(\text{data} \mid \text{model}) \]This is obtained by calculating the maximum likelihood estimates of all unknown parameters and inserting them into the likelihood function.
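As a minimal sketch of this recipe, assuming a hypothetical sample modelled as i.i.d. Normal with both mean and variance unknown, the maximised likelihood is obtained by plugging the MLEs of the parameters into the likelihood:

```python
import math

# Hypothetical data, assumed i.i.d. Normal(mu, var) with both
# parameters unknown.
data = [0.8, -0.3, 1.2, 0.5, 1.9, 0.1, 1.4, 0.7]
n = len(data)

# Maximum likelihood estimates: sample mean and (biased) sample variance.
mu_hat = sum(data) / n
var_hat = sum((x - mu_hat) ** 2 for x in data) / n

# Maximised log-likelihood: insert the MLEs into the log-likelihood.
log_L = sum(-0.5 * math.log(2 * math.pi * var_hat)
            - (x - mu_hat) ** 2 / (2 * var_hat) for x in data)
L = math.exp(log_L)  # the maximised likelihood L(M) itself
print(f"mu_hat = {mu_hat:.4f}, var_hat = {var_hat:.4f}, L = {L:.3g}")
```

The same pattern applies to any model: maximise over its unknown parameters, then evaluate the likelihood at those estimates.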
Likelihood ratio
The model \(\mathcal{M}_S\) is a special case of model \(\mathcal{M}_B\), so \(L(\mathcal{M}_B)\) can always be made at least as big as \(L(\mathcal{M}_S)\). The maximised likelihood is therefore at least as large for the big model as for the small model,
\[ L(\mathcal{M}_B) \;\;\ge\;\; L(\mathcal{M}_S) \]Equivalently, the likelihood ratio is always at least one,
\[ R \;\;=\;\; \frac{L(\mathcal{M}_B)}{L(\mathcal{M}_S)} \;\;\ge\;\; 1 \]Big values of \(R\) suggest that \(\mathcal{M}_S\) does not fit as well as \(\mathcal{M}_B\).
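A small sketch of the nested comparison, assuming hypothetical Normal data where \(\mathcal{M}_S\) fixes the mean at 0 and \(\mathcal{M}_B\) leaves it free (both models estimate the variance):

```python
import math

# Hypothetical data; M_S: Normal(0, var), M_B: Normal(mu, var).
data = [0.8, -0.3, 1.2, 0.5, 1.9, 0.1, 1.4, 0.7]

def max_loglik(data, mu=None):
    """Maximised Normal log-likelihood; mu=None means mu is estimated."""
    n = len(data)
    if mu is None:
        mu = sum(data) / n                            # MLE of the mean
    var = sum((x - mu) ** 2 for x in data) / n        # MLE of the variance
    return sum(-0.5 * math.log(2 * math.pi * var)
               - (x - mu) ** 2 / (2 * var) for x in data)

# Likelihood ratio R = L(M_B) / L(M_S), computed via the log scale.
R = math.exp(max_loglik(data) - max_loglik(data, mu=0.0))
print(f"R = {R:.4f}")  # always >= 1, since M_S is a special case of M_B
```

A large value of `R` here would suggest that fixing the mean at 0 fits the data noticeably worse than leaving it free.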
Log of likelihood ratio
Equivalently, taking logs of this inequality and writing \(\ell(\mathcal{M}) = \log L(\mathcal{M})\) for the maximised log-likelihood,
\[ \log(R) \;\;=\;\; \ell(\mathcal{M}_B) - \ell(\mathcal{M}_S) \;\;\ge\;\; 0 \]Again, big values suggest that \(\mathcal{M}_S\) does not fit as well as \(\mathcal{M}_B\).
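In practice \(\log(R)\) is computed directly as the difference of maximised log-likelihoods, since the raw likelihoods can underflow for large samples. A sketch, using the same hypothetical nested-Normal setup (mean fixed at 0 under \(\mathcal{M}_S\), free under \(\mathcal{M}_B\)):

```python
import math

# Hypothetical data; compare ell(M_B) - ell(M_S) on the log scale.
data = [0.8, -0.3, 1.2, 0.5, 1.9, 0.1, 1.4, 0.7]

def max_loglik(data, mu=None):
    """Maximised Normal log-likelihood; mu=None means mu is estimated."""
    n = len(data)
    if mu is None:
        mu = sum(data) / n
    var = sum((x - mu) ** 2 for x in data) / n
    # With var at its MLE, the log-likelihood has this closed form.
    return -0.5 * n * math.log(2 * math.pi * var) - 0.5 * n

log_R = max_loglik(data) - max_loglik(data, mu=0.0)
print(f"log(R) = {log_R:.4f}")  # always >= 0
```

Working on the log scale also matches how the comparison is usually reported, as a difference of log-likelihoods rather than a ratio of likelihoods.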