In practical situations involving multinomial distributions, the probabilities of the underlying categories are unknown and must be estimated from data.

We therefore consider estimating \(\pi_1, \dots, \pi_g\) from a single multinomial observation of

\[ (X_1,\dots,X_g) \;\;\sim\;\; \MultinomDistn(n, \pi_1, \dots, \pi_g) \]
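To make this concrete, here is a minimal sketch in Python (assuming NumPy is available) that draws one such observation; the values of \(n\) and the probabilities are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

pi = np.array([0.2, 0.5, 0.3])   # hypothetical category probabilities (g = 3)
n = 100                          # hypothetical number of trials

# One multinomial observation: counts (x_1, x_2, x_3)
x = rng.multinomial(n, pi)
print(x, x.sum())                # the counts always sum to n
```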

Likelihood function

As with univariate distributions, the likelihood function is the probability of observing the data, treated as a function of the unknown parameters. This is just the data's joint probability function,

\[ L(\pi_1, \dots, \pi_g \mid x_1,\dots,x_g) \;\;=\;\; \frac{n!}{x_1!\;x_2!\; \cdots\;x_g!} \pi_1^{x_1}\pi_2^{x_2}\cdots \pi_g^{x_g} \]
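Evaluating this likelihood numerically is straightforward; the following sketch (assuming SciPy, with illustrative counts) uses scipy.stats.multinomial.pmf, which computes exactly this joint probability.

```python
import numpy as np
from scipy.stats import multinomial

x = np.array([18, 52, 30])   # hypothetical observed counts
n = x.sum()

def likelihood(pi):
    """Joint probability of the observed counts, viewed as a function of pi."""
    return multinomial.pmf(x, n, pi)

# The likelihood is larger near the sample proportions than elsewhere
print(likelihood(np.array([0.18, 0.52, 0.30])))
print(likelihood(np.array([1/3, 1/3, 1/3])))
```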

We can eliminate one of the unknown parameters here. Since the \(g\) probabilities sum to one,

\[ \pi_g \;\;=\;\; 1 - \pi_1 - \pi_2 - \cdots - \pi_{g-1} \]

We will therefore rewrite the likelihood as

\[ L(\pi_1, \dots, \pi_{g-1}) \;\;=\;\; \frac{n!}{x_1!\;x_2!\; \cdots\;x_g!} \pi_1^{x_1}\pi_2^{x_2}\cdots \pi_{g-1}^{x_{g-1}} (1 - \pi_1 - \pi_2 - \cdots - \pi_{g-1})^{x_g} \]

The log-likelihood is

\[ \begin{aligned} \ell(\pi_1, \dots, \pi_{g-1}) \;\;=\;\; x_1 \log(\pi_1) + \cdots &+ x_{g-1} \log(\pi_{g-1})\\[0.4em] &+ x_g \log(1 - \pi_1 - \pi_2 - \cdots - \pi_{g-1}) + K \end{aligned} \]

where \(K\) is a constant whose value does not depend on the unknown parameters.
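In code, the constant \(K\) can be dropped since it does not affect where the maximum occurs. A sketch of this \((g-1)\)-parameter log-likelihood, using the same illustrative counts:

```python
import numpy as np

x = np.array([18, 52, 30])   # hypothetical counts; x_g is the last entry

def log_likelihood(pi_free):
    """ell(pi_1, ..., pi_{g-1}) up to the constant K; pi_g is implied."""
    pi_g = 1.0 - pi_free.sum()
    return x[:-1] @ np.log(pi_free) + x[-1] * np.log(pi_g)

print(log_likelihood(np.array([0.2, 0.5])))
```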

Maximum likelihood estimates

If \((x_1, x_2, \dots, x_g)\) is a single observation from a \(\MultinomDistn(n, \pi_1, \dots, \pi_g)\) distribution, the maximum likelihood estimates of \(\pi_1, \dots, \pi_g\) are

\[ \hat{\pi}_i \;\;=\;\; \frac{x_i}{n} \qquad\text{for } i=1,\dots,g \]

The maximum likelihood estimates are the parameter values that maximise the log-likelihood and are the solutions to

\[ \frac{\partial \ell}{\partial \pi_i} \;\;=\;\; \frac{x_i}{\pi_i} - \frac{x_g}{1 - \pi_1 - \pi_2 - \cdots - \pi_{g-1}} = 0 \qquad\text{for } i=1,\dots,g-1 \]

Replacing \((1 - \pi_1 - \pi_2 - \cdots - \pi_{g-1})\) by \(\pi_g\) turns each of these equations into

\[ \frac{x_i}{\pi_i} - \frac{x_g}{\pi_g} = 0 \]

Setting \(\hat{\pi}_i = \dfrac{x_i}{n}\) for every \(i\) makes both ratios equal to \(n\), so all \(g-1\) equations are satisfied simultaneously.
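A quick numerical check of this (a sketch with the same illustrative counts) confirms that the partial derivatives all vanish at \(\hat{\pi}_i = x_i/n\):

```python
import numpy as np

x = np.array([18, 52, 30])
n = x.sum()
pi_hat = x / n               # candidate solution: sample proportions

# Score equations for i = 1, ..., g-1: x_i/pi_i - x_g/pi_g
score = x[:-1] / pi_hat[:-1] - x[-1] / pi_hat[-1]
print(score)                 # each component is n - n = 0
```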

The maximum likelihood estimates of the \(g\) category probabilities are therefore simply the corresponding sample proportions.
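As a final sanity check, one could maximise the log-likelihood numerically and compare the result with the sample proportions; this sketch (assuming SciPy's optimizer, with the same illustrative counts) is not part of the derivation above.

```python
import numpy as np
from scipy.optimize import minimize

x = np.array([18, 52, 30])
n = x.sum()

def neg_log_likelihood(pi_free):
    pi_g = 1.0 - pi_free.sum()
    if np.any(pi_free <= 0) or pi_g <= 0:
        return np.inf        # outside the probability simplex
    return -(x[:-1] @ np.log(pi_free) + x[-1] * np.log(pi_g))

res = minimize(neg_log_likelihood, x0=np.array([1/3, 1/3]), method="Nelder-Mead")
print(res.x)                 # numerical maximiser (pi_1, pi_2)
print(x[:-1] / n)            # sample proportions -- should agree closely
```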