Problems evaluating maximum likelihood estimates

For some families of two-parameter distributions, it is difficult to find maximum likelihood estimates algebraically.

A numerical method must then be used to evaluate the maximum likelihood estimates.

Grid search

More efficient algorithms often exist, but a simple method is to evaluate the log-likelihood over a grid of values of the two parameters. The grid point with the highest log-likelihood gives approximate values of the parameters that maximise it.

The grid of parameter values can then be refined to focus on a narrower range of possible parameter values. The method is more easily explained in an example.

Beta distribution

The following data set contains proportions between zero and one:

0.078 0.713 0.668 0.621 0.069 0.378 0.735 0.255 0.220 0.220
0.136 0.413 0.516 0.183 0.724 0.377 0.409 0.403 0.042 0.692
0.486 0.421 0.358 0.236 0.654 0.717 0.520 0.266 0.520 0.641

A reasonable model for these data is a beta distribution, which has probability density function

\[ f(x) \;\;=\;\; \begin{cases} \dfrac {\Gamma(\alpha +\beta) }{\Gamma(\alpha)\Gamma(\beta)} x^{\alpha - 1} (1 - x)^{\beta - 1}& \text{if }0 \lt x \lt 1 \\ 0 & \text{otherwise} \end{cases} \]

We will estimate \(\alpha\) and \(\beta\) by maximum likelihood. For a random sample of \(n\) values, \(x_1, \dots, x_n\), the beta distribution's log-likelihood is

\[ \begin{align} \ell(\alpha, \beta) \;=\; n \log \Gamma(\alpha + \beta) &- n \log \Gamma(\alpha) - n \log \Gamma(\beta) \\ &+ (\alpha - 1) \sum \log(x_i) + (\beta - 1)\sum \log(1 - x_i) \end{align} \]
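
As a brief reminder of where this comes from, the log-likelihood of a random sample is the sum of the log-densities of the individual values. For the beta density this gives

\[ \ell(\alpha, \beta) \;=\; \sum_{i=1}^n \log f(x_i) \;=\; \sum_{i=1}^n \Big[ \log \Gamma(\alpha + \beta) - \log \Gamma(\alpha) - \log \Gamma(\beta) + (\alpha - 1) \log(x_i) + (\beta - 1) \log(1 - x_i) \Big] \]

The first three terms inside the brackets do not depend on \(x_i\), so summing them over the \(n\) values produces the factors of \(n\).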

Using the values in this data set, we therefore want to maximise

\[ \ell(\alpha, \beta) \;=\; 30 \log \Gamma(\alpha + \beta) - 30 \log \Gamma(\alpha) - 30 \log \Gamma(\beta) -31.89 (\alpha - 1) - 18.75 (\beta - 1) \]

with respect to \(\alpha\) and \(\beta\). Differentiating with respect to \(\alpha\) and \(\beta\) requires the derivative of the log-gamma function, and the resulting equations cannot be solved algebraically.
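
To make this concrete, write \(\psi(z) = \dfrac{d}{dz} \log \Gamma(z)\) for the derivative of the log-gamma function (usually called the digamma function; the symbol \(\psi\) is introduced here only for this illustration). Setting the two partial derivatives of the log-likelihood to zero gives

\[ \begin{align} \frac{\partial \ell}{\partial \alpha} \;&=\; n\,\psi(\alpha + \beta) - n\,\psi(\alpha) + \sum \log(x_i) \;=\; 0 \\ \frac{\partial \ell}{\partial \beta} \;&=\; n\,\psi(\alpha + \beta) - n\,\psi(\beta) + \sum \log(1 - x_i) \;=\; 0 \end{align} \]

Both equations involve \(\psi(\alpha + \beta)\), so neither parameter can be isolated, which is why a numerical method is needed.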

The following Excel spreadsheet shows how the log-likelihood can be evaluated for a grid of values of the two parameters. The formula in cell C7 evaluates the log-likelihood for \(\alpha = 1\) and \(\beta = 2\). When written in this way, the cell can be copied into the other cells in the table to evaluate the log-likelihood for all other combinations of parameter values in the grid.

From these log-likelihoods, the maximum is at \(\alpha \approx 1.8\) and \(\beta \approx 2.6\).

β \ α        1     1.2     1.4     1.6     1.8       2     2.2     2.4
2         2.05    4.00    4.86    4.88    4.26    3.12    1.53   -0.41
2.2       1.16    3.55    4.82    5.23    4.97    4.16    2.90    1.26
2.4       0.02    2.82    4.47    5.25    5.33    4.84    3.89    2.54
2.6      -1.33    1.86    3.87    4.98    5.39    5.21    4.55    3.47
2.8      -2.86    0.69    3.04    4.48    5.19    5.30    4.92    4.11
3        -4.54   -0.65    2.03    3.77    4.77    5.16    5.04    4.49
3.2      -6.35   -2.14    0.84    2.88    4.16    4.81    4.95    4.64
3.4      -8.28   -3.76   -0.49    1.82    3.37    4.28    4.66    4.58
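
The same grid can be evaluated outside a spreadsheet. The following is a minimal Python sketch, not the method used to produce the table above: the names `data`, `sum_log_x`, `sum_log_1mx` and `log_likelihood` are our own choices for illustration, and the standard library function `math.lgamma` is used for \(\log \Gamma\).

```python
import math

# Data set from the text (30 proportions between zero and one).
data = [
    0.078, 0.713, 0.668, 0.621, 0.069, 0.378, 0.735, 0.255, 0.220, 0.220,
    0.136, 0.413, 0.516, 0.183, 0.724, 0.377, 0.409, 0.403, 0.042, 0.692,
    0.486, 0.421, 0.358, 0.236, 0.654, 0.717, 0.520, 0.266, 0.520, 0.641,
]

n = len(data)
sum_log_x = sum(math.log(x) for x in data)        # about -31.89
sum_log_1mx = sum(math.log(1 - x) for x in data)  # about -18.75


def log_likelihood(alpha, beta):
    """Beta log-likelihood for the data, using math.lgamma for log Gamma."""
    return (
        n * (math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta))
        + (alpha - 1) * sum_log_x
        + (beta - 1) * sum_log_1mx
    )


# Same grid as the table: alpha = 1, 1.2, ..., 2.4 and beta = 2, 2.2, ..., 3.4.
alphas = [round(1.0 + 0.2 * i, 1) for i in range(8)]
betas = [round(2.0 + 0.2 * j, 1) for j in range(8)]

# Print one row per beta and one column per alpha.
print("beta\\alpha" + "".join(f"{a:8.1f}" for a in alphas))
for b in betas:
    print(f"{b:10.1f}" + "".join(f"{log_likelihood(a, b):8.2f}" for a in alphas))

# Grid point with the largest log-likelihood.
best_ll, best_alpha, best_beta = max(
    (log_likelihood(a, b), a, b) for a in alphas for b in betas
)
print(f"maximum {best_ll:.2f} at alpha = {best_alpha}, beta = {best_beta}")
```

With the data as listed, the printed values should be close to those in the table, with the largest at \(\alpha = 1.8\) and \(\beta = 2.6\).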

The grid can then be refined to a narrower range of values of the parameters:

β \ α      1.6    1.65     1.7    1.75     1.8    1.85     1.9    1.95
2.4      5.245   5.325   5.363   5.363   5.326   5.254   5.148   5.010
2.45     5.204   5.305   5.364   5.384   5.367   5.315   5.229   5.110
2.5      5.146   5.267   5.347   5.388   5.391   5.358   5.291   5.192
2.55     5.072   5.214   5.314   5.374   5.397   5.383   5.336   5.256
2.6      4.983   5.145   5.265   5.345   5.386   5.392   5.364   5.302
2.65     4.879   5.060   5.200   5.299   5.360   5.385   5.375   5.332
2.7      4.760   4.961   5.120   5.239   5.318   5.362   5.370   5.345
2.75     4.627   4.848   5.026   5.163   5.262   5.324   5.350   5.343

A still finer grid is shown below.

β \ α      1.77     1.78     1.79      1.8     1.81     1.82     1.83     1.84
2.52    5.39297  5.39512  5.39582  5.39507  5.39289  5.38929  5.38429  5.37790
2.53    5.39186  5.39480  5.39627  5.39631  5.39490  5.39208  5.38786  5.38224
2.54    5.39009  5.39381  5.39606  5.39687  5.39625  5.39420  5.39075  5.38590
2.55    5.38765  5.39215  5.39519  5.39677  5.39692  5.39565  5.39297  5.38889
2.56    5.38456  5.38984  5.39365  5.39601  5.39693  5.39643  5.39452  5.39121
2.57    5.38082  5.38687  5.39146  5.39459  5.39629  5.39656  5.39541  5.39287
2.58    5.37643  5.38326  5.38862  5.39252  5.39499  5.39602  5.39564  5.39386
2.59    5.37140  5.37900  5.38513  5.38981  5.39304  5.39484  5.39522  5.39420

From this, we can say that the maximum likelihood estimates are approximately

\[ \hat{\alpha} = 1.81 \quad\text{and}\quad \hat{\beta} = 2.56 \]
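
The repeated narrowing of the grid can also be automated. The sketch below is one possible way to do it in Python, not the procedure used to produce the tables above. The function name `refine_grid`, the 9 × 9 grid, the six rounds, the starting ranges and the rule for choosing the next grid (two old grid steps either side of the current best point) are all assumptions made for illustration.

```python
import math

# Data set from the text (30 proportions between zero and one).
data = [
    0.078, 0.713, 0.668, 0.621, 0.069, 0.378, 0.735, 0.255, 0.220, 0.220,
    0.136, 0.413, 0.516, 0.183, 0.724, 0.377, 0.409, 0.403, 0.042, 0.692,
    0.486, 0.421, 0.358, 0.236, 0.654, 0.717, 0.520, 0.266, 0.520, 0.641,
]

n = len(data)
sum_log_x = sum(math.log(x) for x in data)
sum_log_1mx = sum(math.log(1 - x) for x in data)


def log_likelihood(alpha, beta):
    """Beta log-likelihood for the data, using math.lgamma for log Gamma."""
    return (
        n * (math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta))
        + (alpha - 1) * sum_log_x
        + (beta - 1) * sum_log_1mx
    )


def refine_grid(a_lo, a_hi, b_lo, b_hi, points=9, rounds=6):
    """Evaluate a points-by-points grid, then zoom in on its best point; repeat."""
    for _ in range(rounds):
        a_step = (a_hi - a_lo) / (points - 1)
        b_step = (b_hi - b_lo) / (points - 1)
        best_ll, best_a, best_b = max(
            (
                log_likelihood(a_lo + i * a_step, b_lo + j * b_step),
                a_lo + i * a_step,
                b_lo + j * b_step,
            )
            for i in range(points)
            for j in range(points)
        )
        # Next grid: two old steps either side of the best point, which halves
        # the step size each round. Lower limits are kept positive because
        # both parameters must be greater than zero.
        a_lo, a_hi = max(best_a - 2 * a_step, 1e-6), best_a + 2 * a_step
        b_lo, b_hi = max(best_b - 2 * b_step, 1e-6), best_b + 2 * b_step
    return best_a, best_b, best_ll


# Start with a coarse grid whose step is 0.2, similar to the first table.
alpha_hat, beta_hat, max_ll = refine_grid(1.0, 2.6, 2.0, 3.6)
print(f"alpha = {alpha_hat:.2f}, beta = {beta_hat:.2f}, log-likelihood = {max_ll:.3f}")
```

The result should be close to the estimates above. It need not agree with them exactly in the second decimal place, because the log-likelihood is very flat near its maximum and the final grid here does not coincide with the one in the table. Narrowing the grid too aggressively risks excluding the maximum, so the zoom rule is a choice that may need adjusting for other data.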