Counts with Poisson distributions

Suppose that we have \(k\) independent discrete random variables \(\{X_1, X_2,\dots, X_k\}\) that are counts of events. We will now consider whether they might be counts from Poisson processes in which the rate of events for \(X_i\) is \(\lambda_i\),

\[ X_i \;\;\sim\;\; \PoissonDistn(\lambda_i) \]

If this model holds then, from the properties of Poisson distributions,

\[ E[X_i] \;\;=\;\; \Var(X_i) \;\;=\;\; \lambda_i \]
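The equality of mean and variance can be checked numerically. The following is a minimal sketch using only the Python standard library; the helper names and the value \(\lambda = 3.5\) are illustrative assumptions, not part of the text above. It sums over the Poisson probability function directly, truncating the sum at a point beyond which the remaining probability is negligible.

```python
import math


def poisson_pmf(x, lam):
    """P(X = x) for a Poisson(lam) count, computed on the log scale for stability."""
    return math.exp(-lam + x * math.log(lam) - math.lgamma(x + 1))


def poisson_mean_and_variance(lam, upper=200):
    """Mean and variance from the probability function, truncated at `upper`.

    The truncation point is an assumption; it is ample for small or moderate lam.
    """
    xs = range(upper + 1)
    mean = sum(x * poisson_pmf(x, lam) for x in xs)
    second_moment = sum(x * x * poisson_pmf(x, lam) for x in xs)
    return mean, second_moment - mean ** 2


# Illustrative rate (hypothetical value)
mean, variance = poisson_mean_and_variance(3.5)
print(round(mean, 6), round(variance, 6))
```

Both printed values should agree with the chosen rate up to rounding.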

Standardised versions of these counts

\[ Z_i \;\;=\;\; \frac{X_i - \lambda_i}{\sqrt{\lambda_i}} \]

will have mean zero and standard deviation one. If the \(\{\lambda_i\}\) are large,

\[ Z_i \;\;=\;\; \frac{X_i - \lambda_i}{\sqrt{\lambda_i}} \;\; \underset{\text{approx}}{\sim} \;\; \NormalDistn(0,1) \]
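To see how the quality of the normal approximation depends on \(\lambda_i\), one option is to measure the largest gap between the exact cumulative distribution function of \(Z_i\) and that of the standard normal distribution. The sketch below does this with the Python standard library; the function names and the two rates compared (2 and 100) are assumptions chosen for illustration.

```python
import math


def poisson_pmf(x, lam):
    """P(X = x) for a Poisson(lam) count, computed on the log scale."""
    return math.exp(-lam + x * math.log(lam) - math.lgamma(x + 1))


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def max_cdf_gap(lam):
    """Largest absolute difference between the CDF of Z = (X - lam)/sqrt(lam)
    and the standard normal CDF, checked just before and at each jump of Z."""
    upper = int(lam + 10 * math.sqrt(lam)) + 10  # far enough into the upper tail
    gap = 0.0
    cdf_before = 0.0
    for x in range(upper + 1):
        z = (x - lam) / math.sqrt(lam)
        cdf_after = cdf_before + poisson_pmf(x, lam)
        target = normal_cdf(z)
        gap = max(gap, abs(cdf_before - target), abs(cdf_after - target))
        cdf_before = cdf_after
    return gap


# Hypothetical rates: one small, one large
for lam in (2.0, 100.0):
    print(lam, round(max_cdf_gap(lam), 4))
```

The gap for the larger rate should be markedly smaller than for the smaller rate, which is the sense in which the approximation improves as \(\lambda_i\) grows.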

Chi-squared statistic

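If the Poisson model holds and the \(\{\lambda_i\}\) are large, the \(\{Z_i\}\) are independent and approximately standard normal, so the sum of their squares is approximately chi-squared,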

\[ \sum_{i=1}^k {Z_i^2} \;\;=\;\; \sum_{i=1}^k {\frac{\left(X_i - \lambda_i\right)^2}{\lambda_i}} \;\; \underset{\text{approx}}{\sim} \;\; \ChiSqrDistn(k \text{ df}) \]

In the context of goodness-of-fit tests, this is often written as

\[ X^2 \;\;=\;\; \sum_{i=1}^k {\frac{\left(O_i - E_i\right)^2}{E_i}} \;\; \underset{\text{approx}}{\sim} \;\; \ChiSqrDistn(k \text{ df}) \]
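As a small worked illustration of this formula, the sketch below computes \(X^2\) for \(k = 4\) counts. The observed counts and hypothesised Poisson means are made-up values, and the function name is hypothetical. The result is compared with 9.488, the upper 5% point of the chi-squared distribution with 4 degrees of freedom.

```python
def chi_squared_statistic(observed, expected):
    """Sum of (O_i - E_i)^2 / E_i over all counts."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: observed counts O_i and hypothesised Poisson means E_i
observed = [12, 18, 9, 25]
expected = [10.0, 20.0, 15.0, 20.0]

x2 = chi_squared_statistic(observed, expected)

# Upper 5% point of the chi-squared distribution with k = 4 degrees of freedom
CRITICAL_5_PERCENT_4_DF = 9.488

print(f"X^2 = {x2:.3f}")
print("exceeds the 5% point:", x2 > CRITICAL_5_PERCENT_4_DF)
```

Here the four terms are 0.4, 0.2, 2.4 and 1.25, so the statistic falls well below the 5% point and these made-up counts would be regarded as consistent with the hypothesised means.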

In practice, this approximation is reasonable provided most of the \(\{E_i\}\) (the Poisson means) are reasonably large. A widely quoted rule of thumb is that no \(E_i\) should be less than 1 and at least 80% of them should be 5 or more.

If these guidelines are not met, the chi-squared distribution should not be used to find probabilities related to \(X^2\).
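The effect of small Poisson means can be explored by simulation. The sketch below is only an illustration under stated assumptions: it uses the Python standard library, a simple multiplication-based Poisson sampler (Knuth's method, suitable for small or moderate means), hypothetical values of \(\lambda_i\), and the cutoff 9.488 (the upper 5% point of the chi-squared distribution with 4 degrees of freedom). It estimates how often \(X^2\) exceeds the cutoff when the Poisson model is true, which can then be compared with the nominal 0.05.

```python
import math
import random


def poisson_draw(lam, rng):
    """One Poisson(lam) variate by Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulated_tail_proportion(lams, cutoff, reps=5000, seed=1):
    """Proportion of simulated X^2 values above `cutoff` when X_i ~ Poisson(lam_i)."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    exceed = 0
    for _ in range(reps):
        x2 = sum((poisson_draw(lam, rng) - lam) ** 2 / lam for lam in lams)
        if x2 > cutoff:
            exceed += 1
    return exceed / reps


CUTOFF = 9.488  # upper 5% point of chi-squared with 4 degrees of freedom

# Hypothetical settings: four small means versus four large means
small = simulated_tail_proportion([0.5] * 4, CUTOFF)
large = simulated_tail_proportion([30.0] * 4, CUTOFF)
print("small means:", small)
print("large means:", large)
```

With large means the simulated proportion should sit close to 0.05. With small means, \(X^2\) can take only a few distinct values, so its tail probabilities need not match those of the chi-squared distribution.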