In observational studies such as surveys, data are sampled from populations to describe the characteristics of the populations.
Causal relationships between variables are particularly important since they predict how changing one variable will affect another.
Causal relationships cannot safely be inferred from observational data alone, since an observed relationship may result from the influence of further unrecorded variables.
In experiments, the values of one variable are under the control of the researcher.
A well-designed experiment gives the researcher information about how changes to the controlled variable affect the response.
The problem being addressed by a project and its specific objectives should be clearly stated before data are collected. The researcher should also consider how the project results will be used.
Depending on the objectives of a project, it may be appropriate to conduct an experiment, a survey or both.
In experiments and surveys, data are collected from discrete units. Decisions about which measurements should be made from each unit must be guided by the objectives of the study.
When experiments and surveys are conducted with people rather than other types of unit, various practical issues complicate the design.
Experiments are conducted to assess the effect of some categorical or numerical explanatory variable on a response. What characterises an experiment is that the researcher controls which values of the explanatory variable are used.
In many experiments, the units on which the experiment is conducted are not identical. The varying characteristics of the experimental units may affect the response.
If the treatments are badly allocated to experimental units, the experiment may over- or under-estimate their effect.
If the experimental units chosen to receive one treatment also differ from the other units in other ways, it is impossible to tell whether the treatment or those other characteristics of the units are responsible for differences in the response.
Random allocation of treatments to experimental units avoids systematic over- or under-estimation of treatment effects.
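As an illustration of random allocation, the following is a minimal Python sketch of a completely randomised design. The function name `randomly_allocate`, the plot labels and the treatment labels are hypothetical, chosen only for this example; the seed is fixed so that the allocation can be reproduced.

```python
import random


def randomly_allocate(units, treatments, seed=None):
    """Completely randomised allocation (illustrative sketch).

    The units are shuffled into a random order and the treatments are then
    dealt out in rotation, so each treatment is used equally often when the
    number of units is a multiple of the number of treatments.
    """
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    return {
        unit: treatments[i % len(treatments)]
        for i, unit in enumerate(shuffled)
    }


# Hypothetical example: 12 plots and 3 treatments, 4 plots per treatment.
units = [f"plot{i}" for i in range(1, 13)]
allocation = randomly_allocate(units, ["A", "B", "C"], seed=1)

for unit in units:
    print(unit, allocation[unit])
```

Because chance alone decides which units receive each treatment, no characteristic of the units can be systematically associated with a particular treatment.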
If each treatment is used in more than one experimental unit, unit-to-unit variability can be assessed. This gives information about whether differences between the treatments are more than chance differences.
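One way to see how replication exposes unit-to-unit variability is to pool the variation of the responses around their own treatment means. The sketch below assumes a single factor and uses made-up response values purely for illustration; the function name `pooled_within_variance` is hypothetical.

```python
from statistics import mean


def pooled_within_variance(groups):
    """Pooled estimate of unit-to-unit variance (illustrative sketch).

    `groups` maps each treatment to the list of responses from the units
    that received it. Every treatment needs at least two units, otherwise
    it contributes no information about unit-to-unit variability.
    """
    sum_of_squares = 0.0
    degrees_of_freedom = 0
    for values in groups.values():
        group_mean = mean(values)
        sum_of_squares += sum((y - group_mean) ** 2 for y in values)
        degrees_of_freedom += len(values) - 1
    return sum_of_squares / degrees_of_freedom


# Made-up responses, three replicate units per treatment.
responses = {"A": [10.0, 12.0, 11.0], "B": [14.0, 15.0, 16.0]}
s2 = pooled_within_variance(responses)
print(s2)
```

A difference between treatment means can then be judged against this estimate: if it is large relative to the unit-to-unit variability, it is unlikely to be a chance difference.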
By grouping similar experimental units into blocks and randomly allocating treatments within blocks, the treatment effects can be estimated more precisely.
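Randomisation within blocks can be sketched in Python as follows. This assumes the simplest case, in which each block contains exactly one unit per treatment; the function name `allocate_within_blocks` and the block and unit labels are hypothetical.

```python
import random


def allocate_within_blocks(blocks, treatments, seed=None):
    """Randomised block allocation (illustrative sketch).

    `blocks` maps a block name to the list of units in that block. Within
    each block the treatments are put into a fresh random order, so every
    treatment appears exactly once in every block.
    """
    rng = random.Random(seed)
    allocation = {}
    for block_name, block_units in blocks.items():
        if len(block_units) != len(treatments):
            raise ValueError(
                f"block {block_name!r} needs one unit per treatment"
            )
        order = list(treatments)
        rng.shuffle(order)
        for unit, treatment in zip(block_units, order):
            allocation[unit] = treatment
    return allocation


# Hypothetical example: two blocks of three similar units each.
blocks = {
    "north": ["n1", "n2", "n3"],
    "south": ["s1", "s2", "s3"],
}
block_allocation = allocate_within_blocks(blocks, ["A", "B", "C"], seed=2)
print(block_allocation)
```

Since each treatment is compared within groups of similar units, differences between blocks do not inflate the variability against which the treatments are compared.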
Varying the values of the controlled factors causes some variability in the response -- explained variation. In some kinds of experiment, there is known structure to the experimental units that also results in explained variation. Other variation remains unexplained.
Explained variation in the response is modelled with a function for the response mean that involves the levels of the factors (constants) and some unknown parameters. Unexplained variation is modelled with a probability distribution.
In normal models, the response is the sum of a component that depends on the experimental factors and the structure of the experimental units, and an error term that has a normal distribution with mean zero and constant standard deviation.
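For example, in the simplest case of a single factor with g levels and no further structure to the experimental units, a normal model could be written as follows. The notation is an assumption made for this illustration: y_ij is the response from the j-th unit given treatment i, mu_i is the unknown mean response for treatment i, and sigma is the constant error standard deviation.

```latex
y_{ij} = \mu_i + \varepsilon_{ij},
\qquad
\varepsilon_{ij} \sim \mathrm{N}(0,\ \sigma^2),
\qquad
i = 1, \dots, g
```

Here mu_i carries the explained variation and the error term carries the unexplained variation.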
The model for the explained variation usually involves unknown parameters. These can be estimated by least squares.
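As a small illustration of least squares, the sketch below fits a straight-line model for the response mean, intercept + slope * x, where x is a numerical factor controlled by the researcher. The straight-line form, the function name `least_squares_line` and the data values are all assumptions made for this example. The estimates are the values that minimise the sum of squared differences between the observed and fitted responses.

```python
def least_squares_line(x, y):
    """Least squares estimates for the model mean = intercept + slope * x.

    Uses the standard closed-form solution:
    slope = Sxy / Sxx and intercept = ybar - slope * xbar.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def residual_sum_of_squares(x, y, intercept, slope):
    """Sum of squared differences between observed and fitted responses."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


# Made-up data: four levels of a controlled numerical factor.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]

intercept, slope = least_squares_line(x, y)
rss = residual_sum_of_squares(x, y, intercept, slope)
print(intercept, slope, rss)
```

Any other choice of intercept and slope gives a larger residual sum of squares for these data, which is what makes these the least squares estimates.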