Model for overdispersion
The Poisson distribution has only a single parameter and its variance always equals its mean, so its variance cannot be increased without simultaneously increasing the distribution's mean.
Definition
A random variable, \(X\), is said to have a generalised negative binomial distribution
\[ X \;\;\sim\;\; \NegBinDistn(\kappa, \pi) \]if its probability function is
\[ p(x) \;\;=\;\; \begin{cases} \displaystyle \frac{\Gamma(\kappa + x)}{x! \; \Gamma(\kappa)} \pi^{\kappa} (1-\pi)^x &\text{for }x = 0, 1, 2, \dots \\[0.2em] 0 & \text{otherwise} \end{cases} \]where \(\kappa \gt 0\) and \(0 \lt \pi \lt 1\). The two parameters \(\kappa\) and \(\pi\) are usually both unknown.
This is a generalisation of the negative binomial distribution that was described earlier for the number of failures before the \(\kappa\)'th success in a sequence of independent success/failure trials with probability \(\pi\) of success, but we now allow non-integer values of \(\kappa\). It is often simply called a negative binomial distribution.
(To see that the distribution is the same as the earlier negative binomial distribution when \(\kappa\) is an integer, notice that the gamma functions can then be replaced by factorials, since \(\Gamma(n) = (n-1)!\) for any positive integer \(n\).)
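The probability function above can be evaluated numerically as a check on the definition. The following is a minimal sketch using only Python's standard library; the function names `neg_bin_pmf` and `neg_bin_pmf_integer` and the parameter values are illustrative assumptions, not part of the original text. It compares the gamma-function form with the factorial form for an integer \(\kappa\), and confirms that the probabilities still sum to one when \(\kappa\) is not an integer.

```python
import math

def neg_bin_pmf(x, kappa, pi):
    """Generalised negative binomial probability p(x) (hypothetical helper)."""
    if x < 0 or x != int(x):
        return 0.0  # p(x) = 0 unless x is a non-negative integer
    x = int(x)
    # Work on the log scale so that large gamma values do not overflow.
    log_p = (math.lgamma(kappa + x) - math.lgamma(x + 1) - math.lgamma(kappa)
             + kappa * math.log(pi) + x * math.log(1 - pi))
    return math.exp(log_p)

def neg_bin_pmf_integer(x, k, pi):
    """Factorial (binomial coefficient) form, valid only for integer k."""
    return math.comb(k + x - 1, x) * pi**k * (1 - pi)**x

# Integer kappa: the gamma-function and factorial forms should agree.
integer_gap = max(abs(neg_bin_pmf(x, 3, 0.4) - neg_bin_pmf_integer(x, 3, 0.4))
                  for x in range(50))

# Non-integer kappa: the probabilities should still sum to one
# (the tail beyond x = 500 is negligible for these illustrative values).
total = sum(neg_bin_pmf(x, 2.5, 0.4) for x in range(500))

print(integer_gap, total)
```

Working with `math.lgamma` rather than `math.gamma` is a design choice to avoid overflow for large \(x\); it does not change the formula.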
Properties
We now describe some properties of the negative binomial distribution that explain why it works as a model for overdispersed count data.
Mean and variance
The mean and variance of the generalised negative binomial distribution are
\[ E[X] = \frac {\kappa(1-\pi)} \pi \spaced{and} \Var(X) = \frac {\kappa(1-\pi)} {\pi^2} \]These are the same formulae that were derived earlier for integer \(\kappa\). We will not give the more general proof here.
From these formulae,
\[ \Var(X) \;=\; E[X] \times \frac 1 {\pi} \]Since \(0 \lt \pi \lt 1\), the factor \(1/\pi\) is greater than 1, so the variance of the distribution is always greater than its mean, a characteristic of overdispersion.
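Although the general proof is not given, the formulae can be checked numerically for a non-integer \(\kappa\). The sketch below sums the probability function directly; the helper names and the values \(\kappa = 2.5\), \(\pi = 0.4\) are assumptions chosen only for illustration.

```python
import math

def neg_bin_pmf(x, kappa, pi):
    """Generalised negative binomial probability for integer x >= 0."""
    log_p = (math.lgamma(kappa + x) - math.lgamma(x + 1) - math.lgamma(kappa)
             + kappa * math.log(pi) + x * math.log(1 - pi))
    return math.exp(log_p)

def moments(kappa, pi, n_terms=2000):
    """Approximate mean and variance by truncating the infinite sums."""
    probs = [neg_bin_pmf(x, kappa, pi) for x in range(n_terms)]
    mean = sum(x * p for x, p in enumerate(probs))
    var = sum((x - mean) ** 2 * p for x, p in enumerate(probs))
    return mean, var

kappa, pi = 2.5, 0.4
mean, var = moments(kappa, pi)

# Values given by the formulae in the text.
formula_mean = kappa * (1 - pi) / pi
formula_var = kappa * (1 - pi) / pi**2

print(mean, formula_mean)
print(var, formula_var)
print(var / mean, 1 / pi)  # the variance is the mean multiplied by 1/pi
```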
The next result is also stated without proof.
Asymptotic distribution
If \(\kappa \to \infty\) and \(\pi \to 1\) simultaneously with \(\displaystyle \frac{\kappa(1-\pi)}{\pi} = \lambda\), the negative binomial distribution approaches a \(\PoissonDistn(\lambda)\) distribution.
This shows that the negative binomial distribution can be made arbitrarily close to a Poisson distribution with appropriate choice of \(\kappa\) and \(\pi\).
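The limit can be illustrated numerically. In the sketch below, \(\lambda\) is held fixed and \(\pi\) is set to \(\kappa / (\kappa + \lambda)\), which solves \(\kappa(1-\pi)/\pi = \lambda\); the largest difference between the negative binomial and Poisson probabilities is then computed for increasing \(\kappa\). The function names and the value \(\lambda = 3\) are illustrative assumptions.

```python
import math

def neg_bin_pmf(x, kappa, pi):
    """Generalised negative binomial probability for integer x >= 0."""
    log_p = (math.lgamma(kappa + x) - math.lgamma(x + 1) - math.lgamma(kappa)
             + kappa * math.log(pi) + x * math.log(1 - pi))
    return math.exp(log_p)

def poisson_pmf(x, lam):
    """Poisson probability for integer x >= 0."""
    return math.exp(-lam + x * math.log(lam) - math.lgamma(x + 1))

def max_gap(kappa, lam, n_terms=100):
    """Largest pointwise difference between the two probability functions."""
    pi = kappa / (kappa + lam)  # keeps the mean kappa(1-pi)/pi equal to lam
    return max(abs(neg_bin_pmf(x, kappa, pi) - poisson_pmf(x, lam))
               for x in range(n_terms))

lam = 3.0
gaps = [max_gap(kappa, lam) for kappa in (1, 10, 100, 1000)]

# The gaps should shrink as kappa grows and pi approaches 1.
for kappa, gap in zip((1, 10, 100, 1000), gaps):
    print(kappa, gap)
```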
Justification for the negative binomial distribution
We will mainly treat the negative binomial distribution simply as a more flexible model for counts that can be used to model overdispersion. There are however two ways that a negative binomial distribution can be theoretically derived. (The details are complex and we will simply give a flavour here.)
In practice, we can simply use the distribution as an empirical model with the flexibility to describe overdispersed count data.
Shape of generalised negative binomial distribution
The diagram below initially shows the probability function for a Poisson distribution; the top slider can be used to adjust its mean, \(\lambda\).
For any Poisson distribution, the mean \(E[X]\) is equal to the variance \(\Var(X)\). The generalised negative binomial has a second parameter that allows more flexibility. Drag the bottom slider to make the variance a greater multiple of the mean. Observe that whatever the distribution's mean,
Clicking Show zero-one axis to deselect it makes the bar charts fill up more of the vertical height in the diagram, making these features clearer.
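The diagram is described in terms of the mean and the multiple by which the variance exceeds it, rather than in terms of \(\kappa\) and \(\pi\). Assuming that parameterisation, the mean and variance formulae can be inverted: if \(\Var(X) = r \times E[X]\) with \(r \gt 1\), then \(\pi = 1/r\) and \(\kappa = E[X] / (r - 1)\). The sketch below applies this conversion; the function name `params_from_mean_and_ratio` and the example values are hypothetical, and this is not necessarily how the diagram itself is implemented.

```python
def params_from_mean_and_ratio(mean, ratio):
    """Convert (mean, variance/mean ratio) to (kappa, pi).

    Hypothetical helper: it inverts E[X] = kappa(1-pi)/pi and
    Var(X) = E[X]/pi, so it requires mean > 0 and ratio > 1.
    """
    if mean <= 0 or ratio <= 1:
        raise ValueError("need mean > 0 and variance/mean ratio > 1")
    pi = 1.0 / ratio
    kappa = mean / (ratio - 1.0)
    return kappa, pi

# Illustrative values: mean 4 and variance 2.5 times the mean.
kappa, pi = params_from_mean_and_ratio(4.0, 2.5)

# Recover the mean and variance from the formulae as a check.
recovered_mean = kappa * (1 - pi) / pi
recovered_var = kappa * (1 - pi) / pi**2

print(kappa, pi, recovered_mean, recovered_var)
```

Note that the resulting \(\kappa\) need not be an integer, and can be less than 1 when the mean is small relative to the excess variance.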