Does a Poisson model hold?

The chi-squared distribution on the previous page was based on the assumption that the counts followed Poisson distributions and that their means had been correctly specified. It can therefore be used as the basis for a hypothesis test about whether these assumptions are correct.

Test statistic

The chi-squared statistic from the previous page

\[ X^2 \;\;=\;\; \sum_{i=1}^k {\frac{\left(O_i - E_i\right)^2}{E_i}} \]

can be used as a test statistic.
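The formula translates directly into code. The following is a minimal sketch in Python; the function name `chi_squared_statistic` and its argument names are assumptions made for illustration, not part of the original text.

```python
def chi_squared_statistic(observed, expected):
    """Return the sum of (O_i - E_i)^2 / E_i over all counts."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    # Each term measures how far an observed count is from its expected
    # value, scaled by the expected value.
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))
```

As a made-up illustration, observed counts of 12 and 8 with expected counts of 10 each give 0.4 + 0.4 = 0.8.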

P-value and conclusion

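Large values of the test statistic arise when the observed counts are far from their expected values, which suggests that the null hypothesis (Poisson counts with the specified means) does not hold. The p-value is therefore the probability that the test statistic, \(X^2\), is at least as large as the value recorded from the actual data, \(x^2\),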

\[ \text{p-value} \;\;=\;\; P(X^2 \ge x^2) \]

when the null hypothesis is true. This can be found from the upper tail of the \(\ChiSqrDistn(k \text{ df})\) distribution.
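The upper tail probability can be evaluated numerically. The sketch below uses only Python's standard library and assumes integer degrees of freedom; the function name `chi_squared_upper_tail` is hypothetical. It uses the fact that the chi-squared upper tail with \(k\) df is the regularised upper incomplete gamma function \(Q(k/2, x/2)\), together with the recurrence \(Q(a+1, z) = Q(a, z) + z^a e^{-z} / \Gamma(a+1)\). In practice a statistics library would normally be used instead.

```python
import math

def chi_squared_upper_tail(x, df):
    """Return P(X^2 >= x) for a chi-squared distribution with integer df >= 1."""
    if df < 1 or x < 0:
        raise ValueError("need df >= 1 and x >= 0")
    z = x / 2.0
    # Starting values: Q(1, z) = exp(-z) for even df,
    # Q(1/2, z) = erfc(sqrt(z)) for odd df.
    if df % 2 == 0:
        a, tail = 1.0, math.exp(-z)
    else:
        a, tail = 0.5, math.erfc(math.sqrt(z))
    # Apply Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1)
    # until a reaches df / 2.
    while a < df / 2.0:
        tail += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return tail
```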

Example

The following table gives the number of heart attacks in a city in each of ten weeks.

Week     1   2   3   4   5   6   7   8   9  10
Count    6  11  13  10  21   8  16   6   9  19

Test whether the heart attacks occurred at random with a rate of \(\lambda = 10\) per week.

If the heart attacks occurred at random with a constant rate of \(\lambda = 10\) per week, the counts would all have Poisson distributions,

\[ O_i \;\;\sim\;\; \PoissonDistn(10) \]

with expected counts \(E_i = 10\). The test statistic is

\[ X^2 \;=\; \sum_{i=1}^{10} {\frac{\left(O_i - E_i\right)^2}{E_i}} \;=\; \frac{(6-10)^2}{10} + \frac{(11-10)^2}{10} + \cdots \;=\; 28.5\]
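Writing out all ten squared differences from the table makes the arithmetic explicit:

\[ X^2 \;=\; \frac{16 + 1 + 9 + 0 + 121 + 4 + 36 + 16 + 1 + 81}{10} \;=\; \frac{285}{10} \;=\; 28.5 \]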

The p-value for the test is found from the upper tail probability of the \(\ChiSqrDistn(10 \text{ df})\) distribution,

\[ P(X^2 \ge 28.5) \;\;=\;\; 0.0015\]

Such a large test statistic would be very unlikely if the model were correct, so we conclude that there is very strong evidence that the model is incorrect.

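The problem with the model could be either an incorrect value for the rate (\(\lambda = 10\) per week) or counts that do not have Poisson distributions, for example because they are overdispersed.

The first of these problems would result in counts that were systematically high or low. Overdispersion would be indicated by high week-to-week variation in the counts. Further analysis is needed to decide the reason for the lack of fit of the model.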
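A rough first look at these two possibilities compares the sample mean of the counts with the hypothesised rate, and the sample variance with the sample mean, since a Poisson distribution has equal mean and variance. The sketch below is an informal check under that assumption, not a formal test; the function name `dispersion_summary` is hypothetical.

```python
import statistics

def dispersion_summary(counts):
    """Return (sample mean, sample variance, variance-to-mean ratio)."""
    mean = statistics.mean(counts)
    variance = statistics.variance(counts)  # uses the n - 1 divisor
    # A ratio well above 1 hints at more variation than a Poisson
    # distribution with this mean would produce.
    return mean, variance, variance / mean

# Weekly heart attack counts from the example above.
counts = [6, 11, 13, 10, 21, 8, 16, 6, 9, 19]
mean, variance, ratio = dispersion_summary(counts)
print(f"mean = {mean:.2f}, variance = {variance:.2f}, ratio = {ratio:.2f}")
```

For these data, a mean close to 10 together with a variance well above the mean would point towards overdispersion, whereas a mean far from 10 would point towards an incorrect rate.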