Estimating unknown parameters from data

The only way to get information about unknown parameters is to collect data whose distribution depends on these parameters, then use a function of the data to estimate each of them.

We initially concentrate on models involving a single unknown parameter that we will call \(\theta\). The data will be assumed to be a random sample from a distribution whose shape depends on \(\theta\).

Definition

If \(\{X_1, X_2, \dots, X_n\}\) is a random sample from a distribution whose shape depends on an unknown parameter \(\theta\), then any function of the random sample,

\[ \hat{\theta}(X_1, X_2, \dots, X_n) \]

is a random variable and could be used as an estimator of \(\theta\).

A few examples of possible estimators are:

Sample mean
\[\hat{\theta} = \frac {\sum_{i=1}^n {X_i}} n\]
Sample median
\[\hat{\theta} = \operatorname{median}(X_1, X_2, \dots, X_n) \]
Sample maximum
\[\hat{\theta} = \max(X_1, X_2, \dots, X_n) \]
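
As a rough illustration, the three estimators above can be computed from a single simulated sample. This is only a sketch: the choice of a Uniform(0, \(\theta\)) distribution, the value \(\theta = 10\), the sample size of 25 and the function names are all assumptions made for the example and do not come from the text.

```python
import random
import statistics

def sample_mean(xs):
    """Sum of the values divided by the sample size."""
    return sum(xs) / len(xs)

def sample_median(xs):
    """Middle value of the sorted sample."""
    return statistics.median(xs)

def sample_maximum(xs):
    """Largest value in the sample."""
    return max(xs)

# Hypothetical setting: a random sample of 25 values from Uniform(0, theta).
random.seed(1)
theta = 10.0  # assumed "true" parameter, chosen only for this illustration
sample = [random.uniform(0, theta) for _ in range(25)]

# Each estimator is a function of the same data but gives a different value.
for name, estimator in [("mean", sample_mean),
                        ("median", sample_median),
                        ("maximum", sample_maximum)]:
    print(f"sample {name}: {estimator(sample):.3f}")
```

Each function returns a different number from the same data, and each would change if a new sample were drawn, which is why an estimator is itself a random variable.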

Since many different functions of the data could be used as estimators, we need to consider what makes an estimator of a parameter good. The rest of this section describes the characteristics of good estimators.

Sex ratio of Siberian tigers

Data were collected about the sexes of Siberian tigers in litters of different sizes at birth. The table below describes the number of males in each litter of size three.

Number of males     0    1    2    3
Frequency          33   66   80   28

If we assume that the probability of each birth being male is \(\pi\) and that the sex of each birth is independent of the sex of other births, the number of males in a litter, \(X\), will have a binomial distribution,

\[ X \;\; \sim \; \; \BinomDistn(n=3,\;\; \pi) \]
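
To make the model concrete, the following sketch evaluates the binomial probabilities for a litter of size three. The value \(\pi = 0.5\) is used purely for illustration (the parameter is unknown), and the function name `binomial_pmf` is a hypothetical helper, not something defined in the text.

```python
from math import comb

def binomial_pmf(k, n, pi):
    """P(X = k) when X has a binomial distribution with parameters n and pi."""
    return comb(n, k) * pi**k * (1 - pi)**(n - k)

litter_size = 3
pi = 0.5  # illustrative value only; the true probability is unknown

# Probability of 0, 1, 2 and 3 males in a litter of three.
probs = [binomial_pmf(k, litter_size, pi) for k in range(litter_size + 1)]
for k, p in enumerate(probs):
    print(f"P(X = {k}) = {p:.3f}")
```

Changing `pi` changes the shape of the distribution, which is what allows data from it to carry information about \(\pi\).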

The probability of any birth being male, \(\pi\), will be close to a half, but we should not assume that it is exactly 0.5: it is well recognised that the probability of a male birth in humans is slightly greater than 0.5. We will therefore also treat \(\pi\) as an unknown parameter for tiger births.

The above frequency table summarises a random sample of 207 values from this distribution. One possible estimator of \(\pi\) is the average number of males in these litters, divided by the litter size, 3.

\[ \hat{\pi}(X_1, X_2, \dots, X_n) = \frac {\overline {X}} 3 \]

From this data set, we get the estimate \(\hat{\pi} = 0.499\). But is this the best possible estimate of \(\pi\)?
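
The arithmetic behind this estimate can be checked with a short sketch that works directly from the frequency table. The variable names are illustrative only.

```python
litter_size = 3

# Frequency table: number of males in the litter -> number of litters.
frequencies = {0: 33, 1: 66, 2: 80, 3: 28}

n_litters = sum(frequencies.values())                      # 207 litters
total_males = sum(k * f for k, f in frequencies.items())   # 310 males

mean_males = total_males / n_litters   # average number of males per litter
pi_hat = mean_males / litter_size      # estimated probability of a male birth

print(f"litters: {n_litters}, males: {total_males}")
print(f"estimate of pi: {pi_hat:.3f}")  # 0.499
```

The estimate is equivalent to the overall proportion of males among all births, 310 out of \(3 \times 207 = 621\).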