Inference and random samples

The examples in the previous section involved a range of different types of model for the observed data. In the remainder of this chapter, we concentrate on one particular type of model — random sampling from a population.

We assume now that the observed data are a random sample from some population.

When the observed data are a random sample, inference asks questions about characteristics of the underlying population distribution — unknown population parameters.

For random samples, the null and alternative hypotheses specify values for the unknown population parameters.

Inference about categorical populations

When the population distribution is categorical, the unknowns are the population probabilities for the different categories. To simplify, we consider populations for which one category is of particular interest ('success') and we denote the unknown probability of success by π.

The null and alternative hypotheses are therefore specified in terms of π.

Scaring whales from fishing boats

Researchers want to assess whether the sound of a killer whale helps to scare whales away from fishing boats. A proportion 0.4 of whales normally leave as soon as the boat arrives, but 15 out of 30 leave when the sound is transmitted.

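We model the whales' reactions as a random sample of 30 categorical values from a population with probability π of success (leaving). If the sound has no effect, π is the usual proportion of 0.4, whereas an effective sound would make whales more likely to leave. The null hypothesis of interest is therefore...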

H0:   π = 0.4

The alternative hypothesis is

HA:   π > 0.4
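As a rough illustration of what these hypotheses imply for the whale data, the sketch below computes the sample proportion and the probability of seeing 15 or more whales leave out of 30 if π really were 0.4. It assumes the number leaving follows a binomial distribution under the random-sample model; the function name `binomial_upper_tail` and the variable names are chosen for this example only, and this is not necessarily the procedure developed later in the section.

```python
from math import comb


def binomial_upper_tail(n: int, x: int, pi0: float) -> float:
    """Return P(X >= x) when X ~ Binomial(n, pi0)."""
    return sum(
        comb(n, k) * pi0**k * (1 - pi0) ** (n - k)
        for k in range(x, n + 1)
    )


# Values taken from the whale example in the text.
n_whales = 30   # whales observed with the sound transmitted
n_left = 15     # whales that left
pi_null = 0.4   # probability of leaving under H0

sample_proportion = n_left / n_whales
tail_prob = binomial_upper_tail(n_whales, n_left, pi_null)

print(f"sample proportion = {sample_proportion:.2f}")
print(f"P(X >= {n_left} when pi = {pi_null}) = {tail_prob:.3f}")
```

A small tail probability would suggest that 15 or more departures is unusual when π = 0.4, which is the kind of evidence that favours HA over H0.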

Rat recognition of symbols

Can rats distinguish between three different symbols? In an experiment, three food containers marked with a circle, square and cross are placed in a cage. A card marked with one of these symbols is shown to the rat and the corresponding container is unlocked, allowing access to food.

A rat is initially trained with 100 randomly selected cards (and corresponding unlocked containers). The experiment is then repeated a further 90 times and the researcher notes the number of times the rat goes to the correct container. Out of 90 repetitions, the rat chooses the correct container 36 times.

This situation can be modelled as random sampling of 90 values (correct or wrong) from a categorical population in which the probability of choosing the correct container is π. The null hypothesis of interest is therefore...

H0:   π = 1/3       (guessing)

The alternative hypothesis is

HA:   π > 1/3       (learned)
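One informal way to see what the 'guessing' hypothesis means is to simulate it. The sketch below repeatedly generates 90 independent guesses, each correct with probability 1/3, and records how often a guessing rat would be correct at least 36 times. The function name `simulate_correct_counts`, the number of simulations and the seed are arbitrary choices for this illustration, not part of the original experiment.

```python
import random


def simulate_correct_counts(n_trials: int, pi0: float, n_sims: int, seed: int = 0) -> list[int]:
    """Simulate the number of correct choices in n_trials, repeated n_sims times."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [
        sum(rng.random() < pi0 for _ in range(n_trials))
        for _ in range(n_sims)
    ]


# Values taken from the rat example in the text.
n_trials = 90     # repetitions after training
n_correct = 36    # times the rat chose the correct container
pi_null = 1 / 3   # probability of a correct choice when guessing (H0)

counts = simulate_correct_counts(n_trials, pi_null, n_sims=10_000)
prop_at_least = sum(c >= n_correct for c in counts) / len(counts)

print(f"observed proportion correct = {n_correct / n_trials:.2f}")
print(f"simulated proportion of runs with >= {n_correct} correct = {prop_at_least:.3f}")
```

If only a small fraction of simulated guessing rats do as well as the real one, the data are hard to reconcile with H0. Simulation only approximates this fraction, so the figure varies slightly with the seed and the number of simulations.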

Tests about parameters of other populations

Other data sets arise as random samples from different kinds of population. For example, numerical data sets are often modelled as random samples from a normal distribution. Again, the hypotheses of interest are usually expressed in terms of the parameters of this distribution.

For example, to test whether the mean of a normal distribution is zero, the hypotheses would be...

H0:   µ = 0

HA:   µ ≠ 0

In the remainder of this section, we show how to test a population probability, and in the next section we will describe tests about a population mean.