A general framework
The examples in earlier pages of this section involved different types of data
and different analyses. Indeed, you may find it difficult to spot their common
theme!
All analyses were examples of hypothesis testing. We now describe
the general framework of hypothesis testing within which all of these examples
fit. This general framework is the basis for important applications in later sections
of CAST.
The concepts in this page are extremely important — make sure that
you understand them well before moving on.
Data, model and question
- Data (and model)
- Each example dealt with a data set that was assumed to arise from some random
mechanism. We may be able to specify some aspects of this random mechanism (the model),
but it also has unknown characteristics.
- Null hypothesis
- All models had unknown characteristics, and we want to know whether the model
has particular properties — the null hypothesis.
- Alternative hypothesis
- If the null hypothesis is not true, we say that the alternative hypothesis holds. (You can understand most of hypothesis testing without paying much attention
to the alternative hypothesis, however!)
Either the null hypothesis or the alternative hypothesis must be true.
Approach
We assess whether the null hypothesis is true by asking ...
Are the data consistent with the null hypothesis?
It is extremely important that you understand that hypothesis
testing addresses this question — make sure that you remember it well!!
Answering the question
- Test statistic
- This is some function of the data that throws light on whether the null or
alternative hypothesis holds.
- P-value
- Testing whether the data are consistent with the null hypothesis is based on the probability, if the null hypothesis holds, of obtaining a test statistic value at least as 'extreme'
as the one recorded. This is called the p-value for the test.
- Interpreting the p-value
- Although it may be regarded as an over-simplification, the table below can
be used as a guide to interpreting p-values.
p-value | Interpretation
over 0.1 | no evidence that the null hypothesis does not hold
between 0.05 and 0.1 | very weak evidence that the null hypothesis does not hold
between 0.01 and 0.05 | moderately strong evidence that the null hypothesis does not hold
under 0.01 | strong evidence that the null hypothesis does not hold
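The p-value idea and the rough guide in the table can be sketched in a few lines of Python. This is a minimal illustration, not CAST's own software: the helper names `simulated_p_value` and `interpret_p_value` are hypothetical, 'extreme' is assumed to mean 'at least as large as the observed value' (an upper-tail test), and a value lying exactly on a boundary of the table is assumed to go in the band with stronger evidence.

```python
import random
from typing import Callable


def simulated_p_value(
    observed: float,
    simulate_statistic: Callable[[random.Random], float],
    reps: int = 10_000,
    seed: int = 1,
) -> float:
    """Estimate P(statistic >= observed) when the data come from the null hypothesis.

    `simulate_statistic` (hypothetical) must generate one data set under the
    null hypothesis and return its test statistic.
    """
    rng = random.Random(seed)
    extreme = sum(1 for _ in range(reps) if simulate_statistic(rng) >= observed)
    return extreme / reps


def interpret_p_value(p: float) -> str:
    """Map a p-value to the rough wording of the interpretation table.

    Boundary values (e.g. exactly 0.05) are assumed to fall in the
    stronger-evidence band.
    """
    if p > 0.1:
        return "no evidence that the null hypothesis does not hold"
    if p > 0.05:
        return "very weak evidence that the null hypothesis does not hold"
    if p > 0.01:
        return "moderately strong evidence that the null hypothesis does not hold"
    return "strong evidence that the null hypothesis does not hold"
```

For example, `simulated_p_value(10, lambda rng: sum(rng.random() < 0.5 for _ in range(10)))` estimates the probability that 10 tosses of a fair coin all land heads, which is exactly 1/1024 (about 0.001), so `interpret_p_value` reports strong evidence against the null hypothesis of a fair coin.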
The summaries below show how each of the earlier examples in this section fits
into the hypothesis testing framework.
Soccer league in one season
- Data (and model)
- Some random mechanism underlies the actual results in the matches during a
season. The probabilities of winning may vary from team to team and there may
be a home-team advantage, so there are a lot of unknowns about this model! Our
data are a single set of results — the league table at the end of the season.
- Null hypothesis
- The null hypothesis is that all teams are equally matched — i.e. that they
all have the same probability of winning each match.
- Alternative hypothesis
- The alternative hypothesis is that not all teams have the
same probability of winning.
- Test statistic
- The standard deviation of the teams' final points is the test statistic. It will be low if the teams
have the same abilities (null hypothesis) and higher otherwise (alternative hypothesis).
- P-value
- We simulated the soccer league, assuming that all teams had the same probability
of winning. The p-value was the probability of getting a standard deviation of
final points as high as 19.3 (the actual data).
- Interpreting the p-value
- The p-value was essentially zero. Since there is virtually no chance of getting a standard
deviation of points as high as that in the actual league from equally matched teams, we conclude
that the teams are not equally matched, i.e. the null hypothesis is false.
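The soccer simulation can be sketched as follows, under several assumptions that are not stated in the original analysis: a 20-team league in which each pair of teams plays twice (home and away), 3 points for a win and 1 for a draw, and a hypothetical draw probability of 0.25. Under the null hypothesis the two teams in a match that is not drawn are equally likely to win, so home advantage is ignored. The observed value 19.3 is the one quoted above.

```python
import random
import statistics

N_TEAMS = 20          # assumption: 20-team league, each pair meets home and away
P_DRAW = 0.25         # hypothetical draw probability under the null hypothesis
OBSERVED_SD = 19.3    # standard deviation of final points in the actual league


def simulate_points_sd(rng: random.Random) -> float:
    """Play one season with equally matched teams; return the (sample) SD of final points."""
    points = [0] * N_TEAMS
    for home in range(N_TEAMS):
        for away in range(N_TEAMS):
            if home == away:
                continue
            u = rng.random()
            if u < P_DRAW:                          # draw: 1 point each
                points[home] += 1
                points[away] += 1
            elif u < P_DRAW + (1 - P_DRAW) / 2:     # home win: 3 points
                points[home] += 3
            else:                                   # away win: 3 points
                points[away] += 3
    return statistics.stdev(points)


def league_p_value(reps: int = 1_000, seed: int = 1) -> float:
    """Proportion of simulated seasons with an SD of points at least OBSERVED_SD."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(reps) if simulate_points_sd(rng) >= OBSERVED_SD)
    return hits / reps


if __name__ == "__main__":
    print(league_p_value())
```

Under these assumptions the simulated standard deviations cluster around 8 points, so the estimated p-value is essentially zero. Changing the hypothetical draw probability shifts that centre somewhat but does not bring it anywhere near 19.3.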
Weapon detection at LAX
- Data (and model)
- Each weapon has some probability of being detected — our model. We also assume
that detection of different weapons is independent, a reasonable assumption if different
FAA agents are used over a reasonable period of time, but not if the same agent repeatedly
tries to carry a weapon on board. A sample of 100 weapons was used, and our data are
the number of these that were detected.
- Null hypothesis
- The null hypothesis is that each weapon has probability 0.80 of being detected
— the national rate.
- Alternative hypothesis
- The alternative hypothesis is that the LAX detection rate is lower than 0.80.
- Test statistic
- The number of weapons detected is the test statistic. It will be near 80 if the
underlying probability of success is 0.80 (null hypothesis) and lower than 80 if
it is less (alternative hypothesis).
- P-value
- We simulated carrying 100 weapons onto planes, assuming that each had probability
0.80 of being detected. The p-value was the probability of 72/100 or
fewer being detected (the actual data).
- Interpreting the p-value
- The p-value was around 0.04. This means that getting as few as 72 weapons detected
would be unlikely if the LAX detection rate was 0.80, giving moderately strong evidence
that the LAX detection rate was lower than 0.80 — i.e. moderately strong evidence
that the null hypothesis is not true.
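Under the null hypothesis the number detected is a binomial count (100 independent trials, each with success probability 0.80), so the simulated p-value can be checked against the exact lower-tail probability P(X ≤ 72). A minimal Python sketch; the function names and the number of repetitions are illustrative choices, not part of the original analysis.

```python
import math
import random

N_WEAPONS = 100
P_DETECT = 0.80   # null hypothesis: the national detection rate
OBSERVED = 72     # weapons detected at LAX


def simulated_p_value(reps: int = 20_000, seed: int = 1) -> float:
    """Proportion of simulated samples with OBSERVED or fewer detections."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        detected = sum(rng.random() < P_DETECT for _ in range(N_WEAPONS))
        if detected <= OBSERVED:
            hits += 1
    return hits / reps


def exact_p_value() -> float:
    """Exact lower-tail binomial probability P(X <= OBSERVED)."""
    return sum(
        math.comb(N_WEAPONS, k) * P_DETECT**k * (1 - P_DETECT) ** (N_WEAPONS - k)
        for k in range(OBSERVED + 1)
    )


if __name__ == "__main__":
    print(simulated_p_value(), exact_p_value())
```

Both values come out at roughly 0.03 to 0.04, consistent with the p-value of around 0.04 reported above and with the 'moderately strong evidence' row of the table.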
Net weight of corn flake packets
- Data (and model)
- We are told that the net weight of corn flake packets is normally distributed
with a standard deviation of σ = 10 gm and unknown mean, µ. The data
are a random sample from this distribution.
- Null hypothesis
- The null hypothesis is that the weights of packets have a distribution with mean
µ = 520 gm.
- Alternative hypothesis
- The alternative hypothesis is that µ ≠ 520 gm.
- Test statistic
- The mean weight of our sample of 10 packets is the test statistic. It will be
close to 520 gm if the filling machine is working correctly (the null hypothesis)
and will be far from this if the mean filling weight has drifted from 520 gm
(the alternative hypothesis).
- P-value
- We simulated samples of n = 10 values from a normal (µ = 520, σ = 10)
distribution. The p-value was the probability of getting a sample mean as far from
520 as the value in our actual data (529).
- Interpreting the p-value
- A p-value of around 0.01 means that a sample mean weight as far from 520 gm as
the one we recorded would be very unlikely if the null hypothesis was true. There
is strong evidence that the mean weight is no longer 520 gm.
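Under the null hypothesis the mean of n = 10 packets has a normal distribution with mean 520 gm and standard deviation 10/√10 ≈ 3.16 gm, so the simulation can be checked against an exact two-sided calculation. A minimal sketch using Python's standard `random` and `statistics` modules; the function names and repetition count are illustrative assumptions.

```python
import math
import random
import statistics

MU0, SIGMA, N = 520.0, 10.0, 10   # null mean (gm), known SD (gm), sample size
OBSERVED_MEAN = 529.0             # sample mean in the actual data


def simulated_p_value(reps: int = 20_000, seed: int = 1) -> float:
    """Proportion of simulated sample means at least as far from MU0 as the observed one."""
    rng = random.Random(seed)
    observed_gap = abs(OBSERVED_MEAN - MU0)
    hits = 0
    for _ in range(reps):
        sample_mean = statistics.fmean(rng.gauss(MU0, SIGMA) for _ in range(N))
        if abs(sample_mean - MU0) >= observed_gap:
            hits += 1
    return hits / reps


def exact_p_value() -> float:
    """Two-sided p-value from the N(MU0, SIGMA / sqrt(N)) distribution of the sample mean."""
    z = abs(OBSERVED_MEAN - MU0) / (SIGMA / math.sqrt(N))
    return 2 * (1 - statistics.NormalDist().cdf(z))


if __name__ == "__main__":
    print(simulated_p_value(), exact_p_value())
```

With these inputs both calculations give a p-value below 0.01, matching the 'strong evidence' interpretation.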
Characteristics of failed companies
- Data (and model)
- We assume that our data (asset-to-liabilities ratios from companies) are random
samples of 68 and 33 from underlying populations of healthy and failed companies
(respectively) in Greece. Random sampling from these unknown populations is our model.
- Null hypothesis
- The null hypothesis is that the population distribution of asset-to-liabilities
ratio is the same for healthy and failed companies.
- Alternative hypothesis
- The alternative hypothesis is that the distributions are different for healthy
and failed companies.
- Test statistic
- The difference between the mean asset-to-liabilities ratios of the healthy and
failed companies was the test statistic. It should be close to zero if the null hypothesis
holds, and far from zero if the alternative hypothesis holds.
- P-value
- We randomised the allocation of the 101 asset-to-liabilities ratios between the two groups of companies,
keeping the group sizes at 68 and 33; each possible allocation is equally likely if the populations are the same.
The p-value was the probability that the difference in means is at least as far from zero as 0.902 (the actual data).
- Interpreting the p-value
- From a p-value of essentially zero, we conclude that the null hypothesis almost certainly
does not hold: asset-to-liabilities ratios are higher for healthy
companies.
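The randomisation test can be sketched in Python as below. The 68 healthy and 33 failed ratios are not listed on this page, so they are passed in as arguments rather than reproduced; the function name, the repetition count and the small floating-point tolerance are illustrative assumptions.

```python
import random
import statistics
from typing import Sequence


def randomisation_p_value(
    group_a: Sequence[float],
    group_b: Sequence[float],
    reps: int = 10_000,
    seed: int = 1,
) -> float:
    """Two-sided randomisation test for a difference in means.

    Under the null hypothesis every allocation of the pooled values to groups
    of the original sizes is equally likely, so we shuffle and re-split.
    """
    rng = random.Random(seed)
    observed = abs(statistics.fmean(group_a) - statistics.fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(reps):
        rng.shuffle(pooled)
        diff = statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:])
        # small tolerance so floating-point rounding does not hide exact ties
        if abs(diff) >= observed - 1e-12:
            hits += 1
    return hits / reps
```

Called with the healthy ratios as `group_a` and the failed ratios as `group_b`, the observed difference would be the 0.902 quoted above. Shuffling the pooled list and splitting it at the original group size is what makes every allocation with the same group sizes equally likely.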
Soccer leagues in two seasons
- Data (and model)
- Some random mechanism again generates results during each season — our model.
In this case, we have recorded the number of points for each team at the end of two
successive seasons — our data.
- Null hypothesis
- The null hypothesis is that all teams (other than Manchester United, Arsenal,
Liverpool and Leeds) have the same probabilities of winning against the same opposition
in both seasons.
- Alternative hypothesis
- The alternative hypothesis is that the chances of teams winning in 2013/14 are related to their
performance in 2012/13.
- Test statistic
- The correlation coefficient between the final points of teams in the two seasons
is the test statistic. It will be close to zero if the teams have the same chances
of winning (the null hypothesis) and will be positive if some teams have higher chances
of winning than others (the alternative hypothesis).
- P-value
- We randomised the second-season points among the teams; each possible allocation is equally likely if the
teams have the same probability of winning. The p-value was the probability that
the correlation coefficient is at least as far from zero as 0.798 (the actual data).
- Interpreting the p-value
- The p-value is close to zero, so a correlation coefficient as far
from zero as 0.798 would be very unlikely if the null hypothesis were true. There is
strong evidence that the teams are not evenly matched.
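The same randomisation idea applies to the correlation coefficient: shuffling the second-season points breaks any link between a team's two seasons, which is what the null hypothesis assumes. In this sketch the points for the teams present in both seasons are passed in as arguments (they are not reproduced on this page), and the function names and repetition count are illustrative assumptions.

```python
import math
import random
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample correlation coefficient between two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_randomisation_p_value(
    season1: Sequence[float],
    season2: Sequence[float],
    reps: int = 10_000,
    seed: int = 1,
) -> float:
    """Two-sided randomisation test: shuffle second-season points across teams."""
    rng = random.Random(seed)
    observed = abs(pearson_r(season1, season2))
    shuffled = list(season2)
    hits = 0
    for _ in range(reps):
        rng.shuffle(shuffled)
        # small tolerance so floating-point rounding does not hide exact ties
        if abs(pearson_r(season1, shuffled)) >= observed - 1e-12:
            hits += 1
    return hits / reps
```

Only the second-season list is shuffled while the first season stays fixed; with the actual points, the observed correlation would be the 0.798 quoted above.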