Estimation and hypothesis testing are different branches of inference about the parameters in statistical models.
Hypothesis tests compare two claims about the model parameters, the null and alternative hypotheses.
A test statistic is a function of the data whose value is likely to be more "extreme" in some sense when the alternative hypothesis is true than when the null hypothesis is true. Its distribution should also be fully known when the null hypothesis holds.
The p-value for a test is the probability of obtaining a test statistic at least as "extreme" as the one that was observed, when the null hypothesis holds.
When values of the test statistic at both ends of its distribution favour the alternative hypothesis, the test is called two-tailed. The p-value is double the smaller tail probability.
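A minimal sketch of this rule, assuming a test statistic with a standard normal null distribution (the observed value is hypothetical):

```python
from scipy import stats

z = 1.8                           # hypothetical observed test statistic
lower = stats.norm.cdf(z)         # P(Z <= z) under the null hypothesis
upper = stats.norm.sf(z)          # P(Z >= z) under the null hypothesis
p_value = 2 * min(lower, upper)   # double the smaller tail probability
print(p_value)                    # about 0.072
```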
When the null hypothesis holds, the p-value has a rectangular (uniform) distribution between 0 and 1. When the alternative hypothesis is true, the p-value's distribution places greater probability near 0.
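This is easy to check by simulation; a sketch assuming a two-sided z test of H₀: μ = 0 applied to standard normal samples, so that H₀ is true:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, reps = 20, 10_000
samples = rng.normal(0.0, 1.0, size=(reps, n))   # H0 is true here
z = samples.mean(axis=1) * np.sqrt(n)            # known sigma = 1
p = 2 * stats.norm.sf(np.abs(z))                 # two-tailed p-values

# Roughly 10% of the p-values should fall in each tenth of (0, 1).
print(np.histogram(p, bins=10, range=(0, 1))[0] / reps)
```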
A sum of squared standardised Poisson variables, Σ(xᵢ − λ)²/λ, has approximately a chi-squared distribution.
The chi-squared distribution can be used to test whether a random sample comes from a Poisson distribution with some pre-specified value of the parameter, λ.
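A sketch of the test with a pre-specified λ; the counts and λ₀ are hypothetical:

```python
import numpy as np
from scipy import stats

x = np.array([3, 5, 2, 4, 6, 3, 1, 4])    # hypothetical counts
lam0 = 4.0                                 # H0: lambda = 4

chi2 = np.sum((x - lam0) ** 2 / lam0)      # sum of squared standardised values
p_value = stats.chi2.sf(chi2, df=len(x))   # df = n when lambda is specified
print(chi2, p_value)                       # upper tail: large values suggest misfit
```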
If the value of the Poisson parameter, λ, is estimated from the data, the chi-squared test can still be conducted, but its degrees of freedom must be reduced by one.
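Continuing the sketch above, with λ estimated by the sample mean:

```python
import numpy as np
from scipy import stats

x = np.array([3, 5, 2, 4, 6, 3, 1, 4])        # same hypothetical counts
lam_hat = x.mean()                             # lambda estimated from the data
chi2 = np.sum((x - lam_hat) ** 2 / lam_hat)
p_value = stats.chi2.sf(chi2, df=len(x) - 1)   # df reduced from n to n - 1
print(chi2, p_value)
```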
If the Poisson counts are small, the earlier chi-squared approximation for the test statistic does not hold well. If the data are first summarised in a frequency table, the test can be applied to the cell counts instead.
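A sketch using scipy's chisquare on a frequency table; the observed frequencies and the hypothesised mean are made up for illustration:

```python
import numpy as np
from scipy import stats

observed = np.array([15, 28, 23, 14, 10])    # frequencies of 0, 1, 2, 3, 4+ events
n = observed.sum()
lam0 = 1.5                                   # hypothesised Poisson mean

probs = stats.poisson.pmf(np.arange(4), lam0)
probs = np.append(probs, 1 - probs.sum())    # lump counts of 4 or more into one cell
chi2, p_value = stats.chisquare(observed, n * probs)   # df = cells - 1
print(chi2, p_value)
```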
The chi-squared goodness-of-fit test can be applied to frequency tables to assess whether data are random samples from other discrete distributions.
The chi-squared goodness-of-fit test can also be used for continuous data if the data are first summarised in a frequency table.
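A sketch of the same idea for continuous data, assuming arbitrary cut points and a hypothesised Exponential(1) distribution:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = rng.exponential(1.0, size=200)               # simulated continuous data

edges = np.array([0.0, 0.5, 1.0, 2.0, np.inf])   # cells for the frequency table
observed = np.histogram(x, bins=edges)[0]
probs = np.diff(stats.expon.cdf(edges))          # cell probabilities under H0
chi2, p_value = stats.chisquare(observed, len(x) * probs)
print(chi2, p_value)
```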
When a normal distribution's variance is a known value, the sample mean (or a standardised version) can be used as a test statistic since it has a normal distribution whose parameters are known when the null hypothesis is true.
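A minimal sketch of this z test; μ₀, σ and the data are assumptions:

```python
import numpy as np
from scipy import stats

x = np.array([10.2, 9.8, 11.1, 10.5, 9.9, 10.7])   # hypothetical sample
mu0, sigma = 10.0, 0.5                              # H0: mu = 10, sigma known

z = (x.mean() - mu0) / (sigma / np.sqrt(len(x)))    # standardised sample mean
p_value = 2 * stats.norm.sf(abs(z))                 # two-tailed
print(z, p_value)
```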
If σ must be estimated from the data, the corresponding test statistic should be referred to a t distribution to get the p-value for tests about μ.
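The same hypothetical data with σ estimated; scipy's one-sample t test gives the result directly:

```python
import numpy as np
from scipy import stats

x = np.array([10.2, 9.8, 11.1, 10.5, 9.9, 10.7])
mu0 = 10.0

t = (x.mean() - mu0) / (x.std(ddof=1) / np.sqrt(len(x)))
p_value = 2 * stats.t.sf(abs(t), df=len(x) - 1)    # t distribution, n - 1 df
print(t, p_value)
print(stats.ttest_1samp(x, mu0))                   # equivalent shortcut
```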
Tests about σ² are based on the sample variance and a chi-squared distribution is used to get the p-value.
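A sketch with an assumed null value σ₀² = 0.25 and the same hypothetical sample:

```python
import numpy as np
from scipy import stats

x = np.array([10.2, 9.8, 11.1, 10.5, 9.9, 10.7])
sigma0_sq = 0.25                              # H0: sigma^2 = 0.25

n = len(x)
chi2 = (n - 1) * x.var(ddof=1) / sigma0_sq    # chi-squared(n - 1) under H0
upper = stats.chi2.sf(chi2, df=n - 1)
p_value = 2 * min(upper, 1 - upper)           # two-tailed
print(chi2, p_value)
```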
If two samples come from normal distributions with equal variances, the test for comparing their means is based on a pooled estimate of the variance. The test statistic has a t distribution.
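scipy's two-sample t test pools the variances by default (equal_var=True); the samples here are hypothetical:

```python
import numpy as np
from scipy import stats

x = np.array([5.1, 4.8, 5.6, 5.0, 4.9])
y = np.array([5.5, 5.9, 5.4, 6.0, 5.7, 5.8])
print(stats.ttest_ind(x, y, equal_var=True))   # pooled-variance t test
```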
The ratio of two sample variances has an F distribution when the two underlying normal distributions have equal variances. A test for equal variances can be based on this.
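scipy has no built-in variance-ratio test, so a sketch refers the ratio to an F distribution by hand (same hypothetical samples as above):

```python
import numpy as np
from scipy import stats

x = np.array([5.1, 4.8, 5.6, 5.0, 4.9])
y = np.array([5.5, 5.9, 5.4, 6.0, 5.7, 5.8])

f = x.var(ddof=1) / y.var(ddof=1)             # ratio of sample variances
tail = stats.f.sf(f, len(x) - 1, len(y) - 1)
p_value = 2 * min(tail, 1 - tail)             # two-tailed
print(f, p_value)
```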
Hypothesis tests are often based on interpreting a p-value. An alternative approach makes a formal decision about which of the two hypotheses to accept. The significance level is the probability of wrongly concluding that the null hypothesis is false.
Two types of error can be made from a test: deciding that H₀ is true when it is false (a Type II error), and deciding that it is false when it is true (a Type I error).
A test with significance level α can be based on the test's p-value. The null hypothesis is rejected if the p-value is less than α.
For discrete data, it is usually impossible to find a decision rule with significance level exactly 5% (or any other pre-specified value). A conservative test uses a significance level that is just under the required value.
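A sketch of how such a rule can be found, assuming X ~ Binomial(20, ½) as the test statistic:

```python
from scipy import stats

n = 20
# Find the symmetric rejection region {X <= c or X >= n - c} with the
# largest exact significance level not exceeding 5%.
for c in range(n // 2 + 1):
    level = stats.binom.cdf(c, n, 0.5) + stats.binom.sf(n - c - 1, n, 0.5)
    if level <= 0.05:
        best_c, best_level = c, level
print(best_c, best_level)   # reject if X <= 5 or X >= 15; level about 0.041
```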
The power of a test is the probability of rejecting H₀ when it is false. Since this usually depends on the actual parameter value, it is a function that can be graphed.
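A sketch of a power function for a two-sided 5% z test of H₀: μ = 0 with known σ = 1 and n = 25 (all values assumed):

```python
import numpy as np
from scipy import stats

n, alpha = 25, 0.05
z_crit = stats.norm.ppf(1 - alpha / 2)     # about 1.96

mu = np.linspace(-1, 1, 9)                 # alternative values of mu
shift = mu * np.sqrt(n)                    # mean of Z when mu is the true value
power = stats.norm.sf(z_crit - shift) + stats.norm.cdf(-z_crit - shift)
print(np.round(power, 3))                  # could be plotted against mu
```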
If a hypothesis test's significance level and its power at some value of the parameter are specified, the sample size can be determined to achieve this.
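For the z test above, the usual formula gives the sample size directly; the significance level, target power and alternative value of μ are all assumptions:

```python
import math
from scipy import stats

alpha, target_power, mu, sigma = 0.05, 0.90, 0.5, 1.0
z_a = stats.norm.ppf(1 - alpha / 2)      # about 1.960
z_b = stats.norm.ppf(target_power)       # about 1.282

# Ignoring the negligible far tail, power >= target when
# sqrt(n) * mu / sigma >= z_a + z_b.
n = math.ceil(((z_a + z_b) * sigma / mu) ** 2)
print(n)                                 # 43
```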
We often want to compare two models for data where one model (the small model) is a special case of the other (the big model).
The ratio of the maximum possible likelihoods under the big and small models gives information about whether the small model fits adequately.
The likelihood ratio test is based on twice the log of the likelihood ratio. An approximate p-value can be found from a chi-squared distribution.
This example uses the likelihood ratio test to assess whether the rate parameters of two exponential distributions are equal.
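A sketch of that calculation with hypothetical data. The big model has separate rates, the small model a common rate, so the approximate null distribution is chi-squared with one degree of freedom:

```python
import numpy as np
from scipy import stats

x = np.array([1.2, 0.4, 2.1, 0.9, 1.6])
y = np.array([0.3, 0.8, 0.5, 1.1, 0.2, 0.6])

def max_loglik(data):
    # exponential log-likelihood at the MLE, rate = n / sum(data)
    n = len(data)
    return n * np.log(n / data.sum()) - n

lr_stat = 2 * (max_loglik(x) + max_loglik(y)
               - max_loglik(np.concatenate([x, y])))
p_value = stats.chi2.sf(lr_stat, df=1)    # one extra parameter in the big model
print(lr_stat, p_value)
```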
The concepts of test statistics and pivots are closely related. Hypothesis tests and confidence intervals based on them are therefore also closely related.
An example shows how a hypothesis test about a binomial parameter with 5% significance level can be obtained from a 95% confidence interval.
Another example derives an exact confidence interval for a binomial probability by inverting a hypothesis test whose p-value is found exactly using the binomial distribution.
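A sketch of that inversion using scipy's exact binomial test (binomtest, available in recent scipy versions); the counts and the search grid are assumptions:

```python
import numpy as np
from scipy import stats

x, n = 7, 20                                 # 7 successes in 20 trials

grid = np.linspace(0.001, 0.999, 999)
keep = [p0 for p0 in grid
        if stats.binomtest(x, n, p0).pvalue >= 0.05]
print(min(keep), max(keep))                  # approximate 95% exact interval
```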
A confidence interval for the median of a continuous distribution can be found by inverting a hypothesis test about it.
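A sketch based on inverting the sign test: order statistics bound the interval, with k chosen from Binomial(n, ½) tail probabilities. The data are made up:

```python
import numpy as np
from scipy import stats

x = np.sort(np.array([2.1, 3.4, 1.8, 4.2, 2.9, 3.8, 2.5, 3.1, 4.0, 2.7]))
n = len(x)

# Largest k with P(X <= k) <= 0.025 for X ~ Binomial(n, 1/2); the interval
# from the (k+1)th to the (n-k)th order statistic then covers the median
# with probability at least 95%.
k = int(stats.binom.ppf(0.025, n, 0.5))
if stats.binom.cdf(k, n, 0.5) > 0.025:
    k -= 1
print(x[k], x[n - 1 - k])                    # interval from order statistics
print(1 - 2 * stats.binom.cdf(k, n, 0.5))    # exact coverage, about 0.979
```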
Finally, an example shows how inverting a likelihood ratio test can be used to find a confidence interval for an exponential parameter.
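A sketch of that final inversion: the interval keeps every rate λ₀ whose likelihood ratio statistic stays below the chi-squared(1) 95% point. The data and search grid are hypothetical:

```python
import numpy as np
from scipy import stats

x = np.array([1.2, 0.4, 2.1, 0.9, 1.6])
n, s = len(x), x.sum()

def lrt_stat(lam0):
    # 2 * (log-likelihood at the MLE minus log-likelihood at lam0)
    return 2 * ((n * np.log(n / s) - n) - (n * np.log(lam0) - lam0 * s))

grid = np.linspace(0.05, 3.0, 2000)
keep = grid[lrt_stat(grid) < stats.chi2.ppf(0.95, 1)]
print(keep.min(), keep.max())                # approximate 95% interval
```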