Constraints on the expected counts
For the test on the previous page, the null hypothesis must specify all of the Poisson distribution means, \(\{E_i\}\) (e.g. \(\lambda = 10\) in the example). In most situations, the null hypothesis cannot provide values for the \(\{E_i\}\), so the null hypothesis Poisson model involves one or more parameters that must be estimated from the data. For example, the counts may be assumed to share a common mean, \(\lambda\), whose value is unknown.
Estimating \(\lambda\) from the data makes the \(\{E_i\}\) closer to the \(\{O_i\}\) than they would be if \(\lambda\) were a known value, and this tends to make the chi-squared test statistic smaller. The general result is that:
\[ X^2 \;\;=\;\; \sum_{i=1}^k {\frac{\left(O_i - E_i\right)^2}{E_i}} \;\; \underset{\text{approx}}{\sim} \;\; \ChiSqrDistn(k - c \text{ df}) \]
when there are \(k\) counts and \(c\) 'constraints' on the \(\{E_i\}\). These constraints usually arise from unknown parameters that are estimated from the data. An example shows how this affects the goodness-of-fit test.
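The loss of one degree of freedom can be checked informally by simulation. The sketch below is an illustration only, not part of the original analysis: the settings (\(\lambda = 10\), \(k = 10\) counts, 5000 repetitions, a fixed seed) and the helper names `poisson_sample`, `chi_squared` and `simulate` are all assumptions chosen for the demonstration. It compares the average of \(X^2\) when \(\lambda\) is known with its average when \(\lambda\) is replaced by the sample mean.

```python
import math
import random


def poisson_sample(lam, rng):
    """Draw one Poisson(lam) value using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def chi_squared(observed, expected):
    """Pearson statistic: sum of (O - E)^2 / E over all counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def simulate(lam=10.0, k=10, reps=5000, seed=1):
    """Average X^2 with lambda known and with lambda estimated from the data."""
    rng = random.Random(seed)
    known_total = 0.0
    estimated_total = 0.0
    for _ in range(reps):
        counts = [poisson_sample(lam, rng) for _ in range(k)]
        lam_hat = sum(counts) / k  # sample mean estimates lambda
        known_total += chi_squared(counts, [lam] * k)
        estimated_total += chi_squared(counts, [lam_hat] * k)
    return known_total / reps, estimated_total / reps


mean_known, mean_estimated = simulate()
print(f"mean X^2, lambda known:     {mean_known:.2f}")
print(f"mean X^2, lambda estimated: {mean_estimated:.2f}")
```

Since a chi-squared distribution has mean equal to its degrees of freedom, the first average should be close to \(k = 10\) and the second close to \(k - 1 = 9\), which is consistent with one constraint having been imposed.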
Example
This table shows the number of heart attacks in a city in each of ten weeks.
Week | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
---|---|---|---|---|---|---|---|---|---|---|
Count | 6 | 11 | 13 | 10 | 21 | 8 | 16 | 6 | 9 | 19 |
Test whether the heart attacks have occurred at random with a constant rate over this period.
If the heart attacks occurred at random, the counts would all have Poisson distributions,
\[ O_i \;\;\sim\;\; \PoissonDistn(\lambda) \]
so the expected counts, \(\{E_i\}\), are all equal to this unknown parameter, \(\lambda\). To apply the chi-squared test, we must therefore estimate \(\lambda\) from the data. The best estimate (it is both the method of moments and the maximum likelihood estimator) is the sample mean count,
\[ \hat{\lambda} \;\;=\;\; \overline{O} \;\;=\;\; 11.9 \]
When we now evaluate our test statistic, the \(\{E_i\}\) will be closer to the \(\{O_i\}\) because \(\lambda\) has been estimated by a value chosen to match the data. This acts as a constraint on the \(\{E_i\}\). In fact, since each \(E_i\) equals \(\overline{O}\), the \(\{E_i\}\) must now satisfy:
\[ \sum{E_i} \;\;=\;\; \sum{O_i} \]
Our test statistic is now
\[ X^2 \;=\; \sum_{i=1}^{10} {\frac{\left(O_i - E_i\right)^2}{E_i}} \;=\; \frac{(6-11.9)^2}{11.9} + \frac{(11-11.9)^2}{11.9} + \cdots \;=\; 20.9\]
and we should compare this to the \(\ChiSqrDistn(9 \text{ df})\) distribution (10 counts and 1 constraint). From this distribution, the p-value is
\[ P(X^2 \ge 20.9) \;\;=\;\; 0.0130\]
From such a small p-value, we would conclude that there is moderately strong evidence that the Poisson model does not fit: the heart attacks do not seem to be occurring randomly as a Poisson process with the same rate in each week.
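The arithmetic in this example can be reproduced with a short script. This is a minimal sketch using only the Python standard library. The helper `chi2_sf_odd_df` is a hypothetical name introduced here; it evaluates the upper tail probability from the closed-form expression that holds when the degrees of freedom are odd, which is an assumption that suits this example (9 df) but not the general case. In practice a statistics library would normally be used instead, for example `scipy.stats.chi2.sf(x2, df)`.

```python
import math

# Weekly heart attack counts from the table above.
counts = [6, 11, 13, 10, 21, 8, 16, 6, 9, 19]
k = len(counts)

# Estimate lambda by the sample mean; every expected count equals it.
lam_hat = sum(counts) / k
expected = [lam_hat] * k

# Pearson chi-squared statistic.
x2 = sum((o - e) ** 2 / e for o, e in zip(counts, expected))

# One constraint (lambda estimated), so df = k - 1.
df = k - 1


def chi2_sf_odd_df(x, df):
    """P(X >= x) for a chi-squared variable with an odd number of df.

    Closed form for df = 2m + 1:
    erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * sum_{j<m} x^j / (1*3*...*(2j+1))
    """
    if df < 1 or df % 2 == 0:
        raise ValueError("this helper only handles odd degrees of freedom")
    m = (df - 1) // 2
    series, term = 0.0, 1.0
    for j in range(m):
        if j > 0:
            term *= x / (2 * j + 1)
        series += term
    tail = math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * series
    return math.erfc(math.sqrt(x / 2)) + tail


p_value = chi2_sf_odd_df(x2, df)

print(f"lambda-hat = {lam_hat:.1f}")
print(f"sum of E = {sum(expected):.1f}, sum of O = {sum(counts)}")
print(f"X^2 = {x2:.1f} on {df} df")
print(f"p-value = {p_value:.4f}")
```

The script also confirms the constraint \(\sum{E_i} = \sum{O_i}\), since both totals equal 119.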