Constraints on the expected counts

For the test on the previous page, the null hypothesis must specify all of the Poisson distribution means, \(\{E_i\}\) (e.g. \(\lambda = 10\) in the example). In most situations, however, the null hypothesis cannot provide values for the \(\{E_i\}\), so the null hypothesis Poisson model involves one or more parameters that must be estimated from the data. For example, the null hypothesis may state only that the counts share a common Poisson mean, \(\lambda\), without giving its value.

Estimating \(\lambda\) from the data makes the \(\{E_i\}\) closer to the \(\{O_i\}\) than they would be if \(\lambda\) were a known value, and this tends to make the chi-squared test statistic smaller. The general result is that:

\[ X^2 \;\;=\;\; \sum_{i=1}^k {\frac{\left(O_i - E_i\right)^2}{E_i}} \;\; \underset{\text{approx}}{\sim} \;\; \ChiSqrDistn(k - c \text{ df}) \]

when there are \(c\) 'constraints' on the \(\{E_i\}\). These constraints usually arise from unknown parameters that are estimated from the data. An example shows how this affects the goodness-of-fit test.
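The effect of estimating a parameter can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are made up for this sketch, the counts are the weekly data from the example that follows, and the fixed value \(\lambda = 10\) is a hypothetical known mean used purely for comparison.

```python
def chi_squared_statistic(observed, expected):
    """Pearson statistic: sum of (O - E)^2 / E over all counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def degrees_of_freedom(k, c):
    """Degrees of freedom of the reference chi-squared distribution
    when k counts are subject to c constraints."""
    return k - c


# Weekly counts from the example below.
observed = [6, 11, 13, 10, 21, 8, 16, 6, 9, 19]
k = len(observed)

# Case 1: lambda fixed in advance (10 is a hypothetical value for illustration).
lambda_fixed = 10
x2_fixed = chi_squared_statistic(observed, [lambda_fixed] * k)
df_fixed = degrees_of_freedom(k, c=0)

# Case 2: lambda estimated by the sample mean, which imposes one constraint.
lambda_hat = sum(observed) / k
x2_estimated = chi_squared_statistic(observed, [lambda_hat] * k)
df_estimated = degrees_of_freedom(k, c=1)

print(f"fixed lambda = {lambda_fixed}: X^2 = {x2_fixed:.1f}, df = {df_fixed}")
print(f"estimated lambda = {lambda_hat:.1f}: X^2 = {x2_estimated:.1f}, df = {df_estimated}")
```

For these counts the statistic is smaller when \(\lambda\) is estimated than with the hypothetical fixed value, and the reference distribution loses one degree of freedom to compensate.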

Example

This table shows the number of heart attacks in a city in each of ten weeks.

Week    1   2   3   4   5   6   7   8   9  10
Count   6  11  13  10  21   8  16   6   9  19

Test whether the heart attacks have occurred at random with a constant rate over this period.

If the heart attacks occurred at random with a constant rate, the counts would all have the same Poisson distribution,

\[ O_i \;\;\sim\;\; \PoissonDistn(\lambda) \]

so the expected counts, \(\{E_i\}\), are all equal to this unknown parameter, \(\lambda\). To apply the chi-squared test, we must therefore estimate \(\lambda\) from the data. The best estimate (it is both the method of moments and the maximum likelihood estimate) is the sample mean count,

\[ \hat{\lambda} \;\;=\;\; \overline{O} \;\;=\;\; 11.9 \]

When we now evaluate our test statistic, the \(\{E_i\}\) will be closer to the \(\{O_i\}\) because we have estimated \(\lambda\) to be a value that matches what we have in the data. This acts as a constraint on the \(\{E_i\}\). In fact, it can be shown that the \(\{E_i\}\) must now satisfy:

\[ \sum{E_i} \;\;=\;\; \sum{O_i} \]
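For this example the constraint can be checked directly, since every expected count equals the sample mean of the ten observed counts:

\[ \sum_{i=1}^{10} E_i \;\;=\;\; 10 \times \overline{O} \;\;=\;\; 10 \times \frac{1}{10} \sum_{i=1}^{10} O_i \;\;=\;\; \sum_{i=1}^{10} O_i \;\;=\;\; 119 \]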

Our test statistic is now

\[ X^2 \;=\; \sum_{i=1}^{10} {\frac{\left(O_i - E_i\right)^2}{E_i}} \;=\; \frac{(6-11.9)^2}{11.9} + \frac{(11-11.9)^2}{11.9} + \cdots \;=\; 20.9\]
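The arithmetic behind this sum can be reproduced with a short Python sketch. The variable names are illustrative only. It lists the contribution of each week to the statistic, which also shows where the lack of fit comes from.

```python
observed = [6, 11, 13, 10, 21, 8, 16, 6, 9, 19]

# Every expected count equals the estimated Poisson mean (the sample mean).
expected = sum(observed) / len(observed)

# One term (O - E)^2 / E per week.
contributions = [(o - expected) ** 2 / expected for o in observed]
x2 = sum(contributions)

for week, (o, term) in enumerate(zip(observed, contributions), start=1):
    print(f"week {week:2d}: O = {o:2d}, contribution = {term:.3f}")
print(f"X^2 = {x2:.1f}")
```

The largest terms come from weeks 5 and 10, whose counts (21 and 19) are furthest above the estimated mean.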

and we should compare this to the \(\ChiSqrDistn(9 \text{ df})\) distribution (10 counts and 1 constraint). From this distribution, the p-value is

\[ P(X^2 \ge 20.9) \;\;=\;\; 0.0130\]
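This tail probability can be checked numerically. The sketch below uses only the Python standard library. The helper `chi2_upper_tail` is a hypothetical name written for this illustration, and it handles positive integer degrees of freedom only. Where SciPy is available, `scipy.stats.chi2.sf(x2, df)` is the more usual route. The statistic is recomputed from the counts and used unrounded.

```python
import math


def chi2_upper_tail(x, df):
    """P(X >= x) for a chi-squared variable with positive integer df.

    Uses the recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / gamma(a + 1)
    for the regularised upper incomplete gamma function, with z = x / 2.
    """
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)             # Q(1, z) = exp(-z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))  # Q(1/2, z) = erfc(sqrt(z))
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


observed = [6, 11, 13, 10, 21, 8, 16, 6, 9, 19]
lambda_hat = sum(observed) / len(observed)
x2 = sum((o - lambda_hat) ** 2 / lambda_hat for o in observed)

df = len(observed) - 1  # 10 counts and 1 constraint
p_value = chi2_upper_tail(x2, df)
print(f"X^2 = {x2:.1f}, df = {df}, p-value = {p_value:.4f}")
```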

With a p-value this small, we would conclude that there is moderately strong evidence that the Poisson model does not fit: the heart attacks do not seem to be occurring at random as a Poisson process with the same rate in each week.