In practical situations involving multinomial distributions, the probabilities of the underlying categories are unknown and must be estimated from data.
We therefore consider estimating \(\pi_1, \dots, \pi_g\) from a single multinomial observation
\[ (X_1,\dots,X_g) \;\;\sim\;\; \MultinomDistn(n, \pi_1, \dots, \pi_g) \]

Likelihood function
As with univariate distributions, the likelihood function is the probability of observing the data, treated as a function of the unknown parameters. This is just the data's joint probability function,
\[ L(\pi_1, \dots, \pi_g \mid x_1,\dots,x_g) \;\;=\;\; \frac{n!}{x_1!\;x_2!\; \cdots\;x_g!} \pi_1^{x_1}\pi_2^{x_2}\cdots \pi_g^{x_g} \]
We can eliminate one of the unknown parameters here. Since the \(g\) probabilities sum to one,
\[ \pi_g \;\;=\;\; 1 - \pi_1 - \pi_2 - \cdots - \pi_{g-1} \]
We will therefore rewrite the likelihood as
\[ L(\pi_1, \dots, \pi_{g-1}) \;\;=\;\; \frac{n!}{x_1!\;x_2!\; \cdots\;x_g!} \pi_1^{x_1}\pi_2^{x_2}\cdots \pi_{g-1}^{x_{g-1}} (1 - \pi_1 - \pi_2 - \cdots - \pi_{g-1})^{x_g} \]
The log-likelihood is
\[ \begin{aligned} \ell(\pi_1, \dots, \pi_{g-1}) \;\;=\;\; x_1 \log(\pi_1) + \cdots &+ x_{g-1} \log(\pi_{g-1})\\[0.4em] &+ x_g \log(1 - \pi_1 - \pi_2 - \cdots - \pi_{g-1}) + K \end{aligned} \]
where \(K\) is a constant whose value does not depend on the unknown parameters.
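The log-likelihood above can be evaluated directly. The sketch below (with made-up counts) computes it in Python, using `math.lgamma` for the constant \(K = \log n! - \sum_i \log x_i!\); note that \(K\) shifts every value by the same amount, so it never affects which probability vector gives the larger log-likelihood.

```python
import math

def log_likelihood(counts, probs):
    """Multinomial log-likelihood: K + sum_i x_i * log(pi_i),
    where K = log(n!) - sum_i log(x_i!) does not involve the probabilities."""
    n = sum(counts)
    K = math.lgamma(n + 1) - sum(math.lgamma(x + 1) for x in counts)
    return K + sum(x * math.log(p) for x, p in zip(counts, probs))

# Hypothetical data: n = 100 observations over g = 3 categories.
counts = [30, 50, 20]
print(log_likelihood(counts, [0.3, 0.5, 0.2]))   # at the sample proportions
print(log_likelihood(counts, [1/3, 1/3, 1/3]))   # at equal probabilities
```

The first call evaluates the log-likelihood at the sample proportions and returns a larger value than the second, as the next section explains.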
Maximum likelihood estimates
If \((x_1, x_2, \dots, x_g)\) is a single observation from a \(\MultinomDistn(n, \pi_1, \dots, \pi_g)\) distribution, the maximum likelihood estimates of \(\pi_1, \dots, \pi_g\) are
\[ \hat{\pi}_i \;\;=\;\; \frac{x_i}{n} \]
The maximum likelihood estimates are the parameter values that maximise the log-likelihood and are the solutions to
\[ \frac{\partial \ell}{\partial \pi_i} \;\;=\;\; \frac{x_i}{\pi_i} - \frac{x_g}{1 - \pi_1 - \pi_2 - \cdots - \pi_{g-1}} = 0 \qquad\text{for } i=1,\dots,g-1 \]
If we replace \((1 - \pi_1 - \pi_2 - \cdots - \pi_{g-1})\) by \(\pi_g\) in this equation to get
\[ \frac{x_i}{\pi_i} - \frac{x_g}{\pi_g} = 0 \]
then each equation says that \(\pi_i\) is proportional to \(x_i\). Since the probabilities must sum to one and \(\sum_i x_i = n\), it follows that \(\displaystyle \hat{\pi}_i = \frac{x_i}{n}\) is the solution.
The best estimates of the probabilities of the \(g\) different values are simply their sample proportions.
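As a numerical sanity check (the counts below are invented for illustration), the sketch computes the sample proportions from a vector of counts and confirms that no randomly chosen probability vector achieves a higher log-likelihood than \(\hat{\pi}_i = x_i/n\).

```python
import math
import random

def log_likelihood(counts, probs):
    """Multinomial log-likelihood, including the constant term."""
    n = sum(counts)
    K = math.lgamma(n + 1) - sum(math.lgamma(x + 1) for x in counts)
    return K + sum(x * math.log(p) for x, p in zip(counts, probs))

counts = [12, 7, 31]            # hypothetical observed counts, n = 50
n = sum(counts)
mle = [x / n for x in counts]   # maximum likelihood estimates: x_i / n

# No random probability vector should beat the MLE.
random.seed(1)
best = log_likelihood(counts, mle)
for _ in range(1000):
    w = [random.random() for _ in counts]
    p = [wi / sum(w) for wi in w]
    assert log_likelihood(counts, p) <= best
```

The check is not a proof, of course, but it illustrates the result of the derivation above: the sample proportions maximise the log-likelihood.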