Counts with Poisson distributions
Suppose that we have \(k\) independent discrete random variables \(\{X_1, X_2,\dots, X_k\}\) that are counts of events. We will now consider whether they might be counts from Poisson processes in which the rate of events for \(X_i\) is \(\lambda_i\),
\[ X_i \;\;\sim\;\; \PoissonDistn(\lambda_i) \]
If this model holds then, from the properties of Poisson distributions,
\[ E[X_i] \;\;=\;\; \Var(X_i) \;\;=\;\; \lambda_i \]
Standardised counts
If the counts have these Poisson distributions, we can define standardised versions
\[ Z_i \;\;=\;\; \frac{X_i - \lambda_i}{\sqrt{\lambda_i}} \]
that will have mean zero and standard deviation one. If the \(\{\lambda_i\}\) are large, the counts and their standardised versions will also be approximately normally distributed,
\[ Z_i \;\;=\;\; \frac{X_i - \lambda_i}{\sqrt{\lambda_i}} \;\; \underset{\text{approx}}{\sim} \;\; \NormalDistn(0,1) \]
Chi-squared statistic
Since the sum of the squares of \(k\) independent standard normal variables has a chi-squared distribution with \(k\) degrees of freedom, if the Poisson model holds,
\[ \sum_{i=1}^k {Z_i^2} \;\;=\;\; \sum_{i=1}^k {\frac{\left(X_i - \lambda_i\right)^2}{\lambda_i}} \;\; \underset{\text{approx}}{\sim} \;\; \ChiSqrDistn(k \text{ df}) \]
In the context of goodness-of-fit tests, we often denote the observed counts by \(\{O_i\}\) instead of \(\{X_i\}\) and their expected values by \(\{E_i\}\) instead of \(\{\lambda_i\}\), so the above approximation is denoted by
\[ X^2 \;\;=\;\; \sum_{i=1}^k {\frac{\left(O_i - E_i\right)^2}{E_i}} \;\; \underset{\text{approx}}{\sim} \;\; \ChiSqrDistn(k \text{ df}) \]
In practice, this approximation is reasonable provided most of the \(\{E_i\}\) (the Poisson means) are reasonably large. The usual guideline is that
none of the \(\{E_i\}\) should be less than 1, and
at most 20% of the \(\{E_i\}\) should be less than 5.
If these guidelines are not met, the chi-squared distribution should not be used to find probabilities related to \(X^2\).
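As an illustration of the statistic \(X^2\) and of the chi-squared approximation, the following is a minimal Python sketch, not a definitive implementation. The Poisson means in `rates`, the counts in `observed_example`, the function names and the random seed are all hypothetical values chosen for this example. Poisson counts are drawn with Knuth's multiplication method, which is only suitable for moderate means. If the approximation is reasonable, the simulated values of \(X^2\) should have a mean close to \(k\) and about 5% of them should exceed 9.488, the upper 5% point of the chi-squared distribution with 4 degrees of freedom.

```python
import math
import random


def chi_squared_statistic(observed, expected):
    """Return X^2 = sum((O_i - E_i)^2 / E_i) over all counts."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def poisson_sample(rate, rng):
    """Draw one Poisson(rate) value using Knuth's multiplication method.

    Assumption: rate is moderate, since exp(-rate) underflows for huge rates.
    """
    threshold = math.exp(-rate)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_statistics(rates, n_sims, seed=0):
    """Simulate X^2 repeatedly for independent Poisson counts with known means."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_sims):
        counts = [poisson_sample(rate, rng) for rate in rates]
        results.append(chi_squared_statistic(counts, rates))
    return results


# Hypothetical Poisson means (the E_i) and one hypothetical set of counts (the O_i).
rates = [20.0, 35.0, 50.0, 25.0]
observed_example = [18, 40, 47, 30]
x2_example = chi_squared_statistic(observed_example, rates)

# Simulation check of the approximation for k = 4 counts.
sims = simulate_statistics(rates, n_sims=5000)
mean_x2 = sum(sims) / len(sims)

# 9.488 is the upper 5% point of the chi-squared distribution with 4 df.
tail_fraction = sum(1 for x2 in sims if x2 > 9.488) / len(sims)

print(f"X^2 for the example counts: {x2_example:.4f}")
print(f"Mean of simulated X^2 (k = {len(rates)}): {mean_x2:.3f}")
print(f"Proportion of simulated X^2 above 9.488: {tail_fraction:.3f}")
```

Repeating the simulation with much smaller means in `rates` is one way to see the approximation deteriorate, since the simulated tail proportion then tends to drift away from 5%.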