A general framework
The common theme in the examples in this section may be difficult to spot,
but they are all examples of hypothesis testing, and they all fit into a
framework that is common to all hypothesis tests.
Data, model and question
- Data (and model)
- The data are assumed to arise from a random
mechanism (model), some of whose characteristics are unknown.
- Null hypothesis
- This is a statement about the characteristics of the model. We are particularly
interested in whether the null hypothesis is true.
- Alternative hypothesis
- This is usually just the opposite of the null hypothesis.
We assess whether the null hypothesis is true by asking ...
Are the data consistent with the null hypothesis?
- Test statistic
- This is some function of the data that throws light on whether the null or alternative
hypothesis holds.
- P-value
- This is the probability of obtaining a test statistic value at least as 'extreme' as the
one recorded if the null hypothesis holds.
- Interpreting the p-value
- The following table can be used as a guide:
p-value | Interpretation |
over 0.1 | no evidence that the null hypothesis does not hold |
between 0.05 and 0.1 | very weak evidence that the null hypothesis does not hold |
between 0.01 and 0.05 | moderately strong evidence that the null hypothesis does not hold |
under 0.01 | strong evidence that the null hypothesis does not hold |
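The framework above can be sketched in code. This is a minimal illustration, not a definitive implementation: the function names `simulated_p_value` and `interpret` are hypothetical, the p-value is estimated by simulation under the null hypothesis, and the handling of values that fall exactly on a table boundary (0.01, 0.05, 0.1) is an assumption, since the table does not say which band they belong to.

```python
import random


def simulated_p_value(observed, simulate_statistic, as_extreme, n_sims=10_000, seed=0):
    """Estimate a p-value by simulation.

    simulate_statistic(rng) must return one test statistic generated
    under the null hypothesis; as_extreme(simulated, observed) must
    return True when the simulated value is at least as extreme as
    the observed one.
    """
    rng = random.Random(seed)
    hits = sum(as_extreme(simulate_statistic(rng), observed) for _ in range(n_sims))
    return hits / n_sims


def interpret(p_value):
    """Map a p-value to the wording of the guide table.

    Assumption: a value exactly on a boundary goes to the stronger band.
    """
    if p_value > 0.1:
        return "no evidence that the null hypothesis does not hold"
    if p_value > 0.05:
        return "very weak evidence that the null hypothesis does not hold"
    if p_value > 0.01:
        return "moderately strong evidence that the null hypothesis does not hold"
    return "strong evidence that the null hypothesis does not hold"
```

Each example that follows supplies its own test statistic and its own notion of 'extreme'; only those two ingredients change from one test to the next.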
Soccer league in one season
- Data (and model)
- End-of-season points for the teams in the league. The model
involves probabilities for wins, losses and draws in all matches, but we know little
about these probabilities.
- Null hypothesis
- All teams have the same probability of winning each match.
- Alternative hypothesis
- Not all teams have the
same probability of winning.
- Test statistic
- The standard deviation of final points. It will be low if the
teams have the same abilities (null hypothesis) and higher otherwise (alternative
hypothesis).
- P-value
- Probability that the standard deviation is at least as high as 16.7 (the value in the
actual data) if all teams are equally matched. None of the simulated leagues had a standard
deviation as high as 16.7, so the p-value is estimated to be zero.
- Interpreting the p-value
- There is extremely strong evidence that the teams do not have
the same probability of winning.
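A simulation of this kind might look like the sketch below. The league size (20 teams), the double round-robin schedule, the 3/1/0 points system and the draw probability of 0.25 are all assumptions made for illustration; the text does not state them. Only the observed standard deviation of 16.7 comes from the example.

```python
import random
import statistics
from itertools import permutations


def simulate_league_sd(rng, n_teams=20, p_draw=0.25):
    """SD of final points when both sides are equally likely to win every match."""
    p_win = (1 - p_draw) / 2  # same for both teams under the null hypothesis
    points = [0] * n_teams
    # Ordered pairs: every pair of teams meets twice (assumed schedule).
    for home, away in permutations(range(n_teams), 2):
        u = rng.random()
        if u < p_win:
            points[home] += 3
        elif u < 2 * p_win:
            points[away] += 3
        else:  # draw
            points[home] += 1
            points[away] += 1
    return statistics.stdev(points)


rng = random.Random(1)
observed_sd = 16.7  # value quoted in the text
sims = [simulate_league_sd(rng) for _ in range(1000)]
p_value = sum(sd >= observed_sd for sd in sims) / len(sims)
print(p_value)
```

Under these assumptions the simulated standard deviations sit well below 16.7, which is why no simulated league reaches the observed value.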
Proportion
- Data (and model)
- Number of successes in 100 values. Our model assumes that P(success)
is the same for all values and that they are independent.
- Null hypothesis
- The probability of success is 0.80.
- Alternative hypothesis
- The probability of success is less than 0.80.
- Test statistic
- The number of successes in the 100 values. It will be near 80
if the underlying probability of success is 0.80 (null hypothesis) and lower than
80 if it is less (alternative hypothesis).
- P-value
- Probability of 72/100 or fewer successes (the actual data) if
the underlying population proportion is 0.80. The p-value was 0.05.
- Interpreting the p-value
- Since 72 or fewer successes
would be unlikely if the population proportion were 0.80, we have moderately strong
evidence that it is lower than 0.80.
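For this test the p-value can also be computed directly from the binomial distribution rather than by simulation. The sketch below is one way to do so; the helper name `binomial_lower_tail` is hypothetical, and the exact result need not equal the rounded figure quoted in the example.

```python
from math import comb


def binomial_lower_tail(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# One-tailed p-value: probability of 72 or fewer successes out of 100
# when the probability of success is 0.80 (null hypothesis).
p_value = binomial_lower_tail(72, 100, 0.80)
print(round(p_value, 3))
```

Only the lower tail is summed because the alternative hypothesis is one-sided (probability of success less than 0.80).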
Process mean
- Data (and model)
- Sample of 10 values from a process. Our model is that they were
sampled from a normal distribution with σ = 10 and unknown µ.
- Null hypothesis
- The distribution of values has mean µ = 520 gm.
- Alternative hypothesis
- The mean is different from 520 gm, µ ≠ 520 gm.
- Test statistic
- The mean of our sample of 10 values. It will be close to 520
if the process is working correctly (the null hypothesis) and farther from this if
the process mean has drifted from 520 (alternative hypothesis).
- P-value
- Probability of a sample mean at least as far from 520 as the value in our actual data
(529) if the underlying population mean is 520. Since none of the
simulated samples had a mean that far from 520, the p-value is estimated to be 0.0.
- Interpreting the p-value
- Since a mean as far from 520 as our actual mean (529) is very unlikely, there
is strong evidence that the mean weight is no longer 520 gm.
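The two-tailed p-value for this example can be estimated by simulation and checked against normal theory, as in the sketch below. The number of simulations and the seed are arbitrary choices; the sample size, σ, the null mean and the observed mean are taken from the example. With only a modest number of simulated samples, the estimate can come out as exactly zero even though the true probability is small but positive.

```python
import math
import random
import statistics

N, SIGMA, MU0 = 10, 10.0, 520.0  # from the example
OBSERVED_MEAN = 529.0            # from the example


def simulate_sample_mean(rng):
    """Mean of one sample of N values drawn under the null hypothesis."""
    return statistics.fmean(rng.gauss(MU0, SIGMA) for _ in range(N))


rng = random.Random(2)
n_sims = 2000  # arbitrary choice
distance = abs(OBSERVED_MEAN - MU0)
# Two-tailed: count means at least as far from 520 in either direction.
hits = sum(abs(simulate_sample_mean(rng) - MU0) >= distance for _ in range(n_sims))
p_value = hits / n_sims

# Normal-theory check: the sample mean has standard error sigma / sqrt(n).
z = distance / (SIGMA / math.sqrt(N))
exact_p = math.erfc(z / math.sqrt(2))  # equals 2 * (1 - Phi(z))
print(p_value, round(exact_p, 4))
```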
Comparison of groups
- Data (and model)
- Samples of values from two groups. No assumptions are made about
the shape of the underlying distributions (model) but the data are assumed to be
random samples from them.
- Null hypothesis
- The population distributions are the same in both groups.
- Alternative hypothesis
- The groups have different distributions.
- Test statistic
- The difference between the means of the two groups. It should
be near zero if the two populations are the same (null hypothesis) and far
from zero only if the groups are different (alternative hypothesis).
- P-value
- Probability that the difference in means is at least as far from zero as 0.902 (the
value in the actual data) if both samples come from the same population. Since
this never happened in our randomised samples, the p-value is estimated to be zero.
- Interpreting the p-value
- We conclude that it is almost certain that
the null hypothesis does not hold: values are higher in Group A than in Group B.
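A randomisation test of this kind can be sketched as follows. The two data lists are hypothetical and are NOT the values behind the difference of 0.902 in the example, which the text does not give; the function name `permutation_p_value` and the number of reshuffles are likewise assumptions.

```python
import random
import statistics


def permutation_p_value(group_a, group_b, n_perms=5000, seed=3):
    """Two-tailed randomisation test on the difference in group means."""
    rng = random.Random(seed)
    observed = statistics.fmean(group_a) - statistics.fmean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perms):
        # Under the null hypothesis the group labels are arbitrary,
        # so reallocate the pooled values to the two groups at random.
        rng.shuffle(pooled)
        diff = statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            hits += 1
    return observed, hits / n_perms


# Hypothetical data for illustration only.
group_a = [5.1, 4.8, 5.6, 5.3, 4.9, 5.7, 5.2, 5.5]
group_b = [4.2, 4.5, 3.9, 4.4, 4.1, 4.6, 4.0, 4.3]
observed_diff, p_value = permutation_p_value(group_a, group_b)
print(round(observed_diff, 4), p_value)
```

Shuffling the pooled values makes no assumption about the shape of the underlying distributions, which matches the model described for this example.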
Correlation coefficient
- Data (and model)
- Pairs of values from two variables (Y and X). No
assumptions are made about a model underlying the data.
- Null hypothesis
- The correlation coefficient between Y and X in the population
is zero.
- Alternative hypothesis
- The correlation is non-zero.
- Test statistic
- The sample correlation coefficient between Y and X. It
will be close to zero if the variables are uncorrelated in the underlying population
(null hypothesis) and further from zero otherwise (alternative hypothesis).
- P-value
- Probability that the correlation coefficient is at least as far from zero as 0.537
(the value in the actual data) if the variables are uncorrelated in the underlying population.
Only 3.5% of the simulated samples had relationships as strong, so the p-value is 0.035.
- Interpreting the p-value
- A correlation coefficient as far
from zero as 0.537 would be unlikely if the null hypothesis were true, so there is moderately
strong evidence that the variables are correlated.
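One way to obtain simulated samples for a correlation test is to shuffle one variable against the other, which breaks any association while keeping both sets of values. The sketch below assumes that approach; the text does not say how its simulated samples were generated. The data are hypothetical and are NOT the values behind the correlation of 0.537 in the example.

```python
import random


def pearson_r(x, y):
    """Sample correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def correlation_p_value(x, y, n_perms=5000, seed=4):
    """Two-tailed randomisation test of zero correlation."""
    rng = random.Random(seed)
    observed = pearson_r(x, y)
    y_shuffled = list(y)
    hits = 0
    for _ in range(n_perms):
        rng.shuffle(y_shuffled)  # pair each x with a randomly chosen y
        if abs(pearson_r(x, y_shuffled)) >= abs(observed):
            hits += 1
    return observed, hits / n_perms


# Hypothetical data for illustration only.
x = list(range(1, 13))
y = [2.1, 1.8, 3.5, 2.9, 4.2, 3.1, 5.0, 4.4, 3.9, 5.8, 5.1, 6.3]
r, p_value = correlation_p_value(x, y)
print(round(r, 3), p_value)
```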