Model for overdispersion
The variance of a Poisson distribution always equals its mean. To allow the variance to be greater than the mean, a different distribution with two parameters is needed.
Definition
A random variable, \(X\), is said to have a generalised negative binomial distribution
\[ X \;\;\sim\;\; \NegBinDistn(\kappa, \pi) \]if its probability function is
\[ p(x) \;\;=\;\; \begin{cases} \displaystyle \frac{\Gamma(\kappa + x)}{x! \; \Gamma(\kappa)} \pi^{\kappa} (1-\pi)^x &\text{for }x = 0, 1, 2, \dots \\[0.2em] 0 & \text{otherwise} \end{cases} \]where \(\kappa \gt 0\) and \(0 \lt \pi \lt 1\). The two parameters \(\kappa\) and \(\pi\) are usually both unknown.
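The probability function can be evaluated directly. The following is a minimal Python sketch; the function name `negbin_pmf` is our own illustrative choice, and the ratio of Gamma functions is computed on the log scale with `math.lgamma` (using \(x! = \Gamma(x+1)\)) to avoid overflow when \(\kappa\) or \(x\) is large.

```python
import math

def negbin_pmf(x: int, kappa: float, pi: float) -> float:
    """Generalised negative binomial p(x), assuming kappa > 0 and 0 < pi < 1."""
    if x < 0:
        return 0.0
    # log of Gamma(kappa + x) / (x! Gamma(kappa)), with x! = Gamma(x + 1)
    log_coef = math.lgamma(kappa + x) - math.lgamma(x + 1) - math.lgamma(kappa)
    # add log(pi^kappa) + log((1 - pi)^x), then exponentiate
    return math.exp(log_coef + kappa * math.log(pi) + x * math.log1p(-pi))

# kappa need not be an integer; the probabilities still sum to 1
total = sum(negbin_pmf(x, 2.5, 0.4) for x in range(500))
print(round(total, 10))  # 1.0
```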
This is a generalisation of the negative binomial distribution for the number of failures before the \(\kappa\)'th success in a sequence of independent success/failure trials with probability \(\pi\) of success, but we now allow non-integer values of \(\kappa\).
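For integer \(\kappa\), this interpretation can be checked by simulation. The sketch below (function names are illustrative, not from any library) runs independent trials until the \(\kappa\)'th success, counts the failures, and compares the observed proportions with \(p(x)\); the values \(\kappa = 3\), \(\pi = 0.4\) and the random seed are arbitrary choices.

```python
import math
import random

def failures_before_kth_success(kappa: int, pi: float, rng: random.Random) -> int:
    """Run independent trials with success probability pi until the kappa'th success;
    return the number of failures observed."""
    successes = failures = 0
    while successes < kappa:
        if rng.random() < pi:
            successes += 1
        else:
            failures += 1
    return failures

def negbin_pmf(x: int, kappa: float, pi: float) -> float:
    """Generalised negative binomial p(x), computed on the log scale."""
    log_coef = math.lgamma(kappa + x) - math.lgamma(x + 1) - math.lgamma(kappa)
    return math.exp(log_coef + kappa * math.log(pi) + x * math.log1p(-pi))

rng = random.Random(1)
kappa, pi, n = 3, 0.4, 100_000
draws = [failures_before_kth_success(kappa, pi, rng) for _ in range(n)]

# Simulated proportion versus p(x) for the first few values of x
for x in range(5):
    print(x, round(draws.count(x) / n, 3), round(negbin_pmf(x, kappa, pi), 3))
```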
Mean and variance
The mean and variance of the generalised negative binomial distribution are
\[ E[X] = \frac {\kappa(1-\pi)} \pi \spaced{and} \Var(X) = \frac {\kappa(1-\pi)} {\pi^2} \](Proved in full version)
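These formulae can be checked numerically by summing \(x\,p(x)\) and \((x - \mu)^2 p(x)\) over a long range of \(x\). This Python sketch truncates the sums at an arbitrary cut-off (`x_max`), which is harmless here because the tail probabilities decay geometrically; the helper names are illustrative.

```python
import math

def negbin_pmf(x: int, kappa: float, pi: float) -> float:
    """Generalised negative binomial p(x), computed on the log scale."""
    log_coef = math.lgamma(kappa + x) - math.lgamma(x + 1) - math.lgamma(kappa)
    return math.exp(log_coef + kappa * math.log(pi) + x * math.log1p(-pi))

def moments(kappa: float, pi: float, x_max: int = 2000) -> tuple[float, float]:
    """Mean and variance from a truncated sum over x = 0, ..., x_max - 1."""
    probs = [negbin_pmf(x, kappa, pi) for x in range(x_max)]
    mean = sum(x * p for x, p in enumerate(probs))
    var = sum((x - mean) ** 2 * p for x, p in enumerate(probs))
    return mean, var

kappa, pi = 2.5, 0.4
mean, var = moments(kappa, pi)
print(mean, kappa * (1 - pi) / pi)      # both approximately 3.75
print(var, kappa * (1 - pi) / pi ** 2)  # both approximately 9.375
```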
From these formulae,
\[ \Var(X) \;=\; E[X] \times \frac 1 {\pi} \]Since \(\pi \lt 1\), \(\Var(X) \gt E[X]\), so the distribution can model data with overdispersion.
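Because \(\Var(X) / E[X] = 1/\pi\), the two formulae can be inverted to find the parameters that give any required mean \(\mu\) and variance \(\sigma^2 \gt \mu\): \(\pi = \mu / \sigma^2\) and \(\kappa = \mu^2 / (\sigma^2 - \mu)\). A small Python sketch of this inversion follows (the function name is hypothetical); it rejects \(\sigma^2 \le \mu\), since no negative binomial distribution has variance equal to or less than its mean.

```python
def negbin_params_from_moments(mean: float, var: float) -> tuple[float, float]:
    """Return (kappa, pi) giving a generalised negative binomial with this mean and variance."""
    if not 0 < mean < var:
        raise ValueError("need 0 < mean < variance (overdispersion)")
    pi = mean / var                   # from Var(X) = E[X] / pi
    kappa = mean ** 2 / (var - mean)  # from E[X] = kappa (1 - pi) / pi
    return kappa, pi

kappa, pi = negbin_params_from_moments(3.75, 9.375)
print(kappa, pi)  # approximately 2.5 and 0.4
```

Substituting the results back into the mean and variance formulae recovers the requested values.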
Asymptotic distribution
If \(\kappa \to \infty\) and \(\pi \to 1\) simultaneously in such a way that the mean stays fixed at \(\displaystyle \frac{\kappa(1-\pi)}{\pi} = \lambda\), the negative binomial distribution approaches a \(\PoissonDistn(\lambda)\) distribution.
(Not proved)
This shows that the negative binomial distribution can be made arbitrarily close to a Poisson distribution with appropriate choice of \(\kappa\) and \(\pi\).
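The limit can be illustrated numerically, although this is not a proof. Holding the mean at \(\lambda\) requires \(\pi = \kappa / (\kappa + \lambda)\); the sketch below increases \(\kappa\) and reports the largest absolute difference between the negative binomial and \(\PoissonDistn(\lambda)\) probabilities over \(x = 0, \dots, 49\). The choice \(\lambda = 3\) and the list of \(\kappa\) values are arbitrary.

```python
import math

def negbin_pmf(x: int, kappa: float, pi: float) -> float:
    """Generalised negative binomial p(x), computed on the log scale."""
    log_coef = math.lgamma(kappa + x) - math.lgamma(x + 1) - math.lgamma(kappa)
    return math.exp(log_coef + kappa * math.log(pi) + x * math.log1p(-pi))

def poisson_pmf(x: int, lam: float) -> float:
    """Poisson(lam) probability of x, computed on the log scale."""
    return math.exp(-lam + x * math.log(lam) - math.lgamma(x + 1))

lam = 3.0
diffs = []
for kappa in [1, 10, 100, 1000, 10000]:
    pi = kappa / (kappa + lam)  # keeps kappa * (1 - pi) / pi equal to lam (up to rounding)
    max_diff = max(abs(negbin_pmf(x, kappa, pi) - poisson_pmf(x, lam)) for x in range(50))
    diffs.append(max_diff)
    print(kappa, f"{max_diff:.2e}")
```

The maximum difference shrinks steadily as \(\kappa\) grows.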
Justification for the negative binomial distribution
The negative binomial distribution is often used simply as an empirical model with the flexibility to fit overdispersed count data. However, it can also be derived theoretically in two different ways:
Shape of distribution
We now illustrate the additional flexibility provided by the negative binomial.
The top diagram shows the distribution of the number of events in a homogeneous Poisson process in which \(E[X] = \Var(X) = 3\).
The two negative binomial distributions also have \(E[X] = 3\), but \(\Var(X) \gt 3\), so they might be used to fit overdispersed counts with more zeros and more high values than would be expected from a homogeneous Poisson process.
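The effect on zeros and high values can be quantified. The parameters of the distributions in the diagram are not given in the text, so the Python sketch below assumes two illustrative variances, 6 and 12, and chooses \(\pi = \mu/\sigma^2\) and \(\kappa = \mu^2/(\sigma^2 - \mu)\) so that each negative binomial has mean 3. It then compares \(P(X = 0)\) and \(P(X \ge 10)\) with the \(\PoissonDistn(3)\) values.

```python
import math

def negbin_pmf(x: int, kappa: float, pi: float) -> float:
    """Generalised negative binomial p(x), computed on the log scale."""
    log_coef = math.lgamma(kappa + x) - math.lgamma(x + 1) - math.lgamma(kappa)
    return math.exp(log_coef + kappa * math.log(pi) + x * math.log1p(-pi))

def poisson_pmf(x: int, lam: float) -> float:
    """Poisson(lam) probability of x, computed on the log scale."""
    return math.exp(-lam + x * math.log(lam) - math.lgamma(x + 1))

mean = 3.0
models = {"Poisson": lambda x: poisson_pmf(x, mean)}
for var in (6.0, 12.0):  # illustrative variances, not taken from the diagram
    pi = mean / var
    kappa = mean ** 2 / (var - mean)
    # default arguments freeze the current kappa and pi inside each lambda
    models[f"NegBin(var={var:g})"] = lambda x, k=kappa, p=pi: negbin_pmf(x, k, p)

for name, pmf in models.items():
    p_zero = pmf(0)
    p_high = 1 - sum(pmf(x) for x in range(10))  # P(X >= 10)
    print(f"{name:16s} P(X=0)={p_zero:.3f}  P(X>=10)={p_high:.4f}")
```

With these choices, both negative binomial distributions put more probability on zero and on values of 10 or more than the Poisson distribution does, and the effect is larger for the more overdispersed one.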