For most data sets, we are interested in understanding the relationships between the variables. However, relationships must be interpreted with care.
If the relationship between X and Y is causal, it is possible to predict the effect of changing the value of X.
Causality can only be deduced from how the data were collected — the data values themselves do not contain any information about causality.
In an observational study, values are passively recorded from individuals. Experiments are characterised by the experimenter's control over the values of one or more variables.
Causal relationships can only be deduced from well-designed experiments.
Experiments are conducted to assess the effect of some categorical or numerical explanatory variable on a response. Experiments are characterised by the fact that the researcher can control the values of the explanatory variable that are used.
In many experiments, the units on which the experiment is conducted are not identical. The varying characteristics of the experimental units may affect the response.
If the treatments are badly allocated to experimental units, the experiment may over- or under-estimate their effect.
If the experimental units that are chosen to get one treatment also differ in other ways, it will be impossible to tell whether it is the treatment or the other distinct characteristics of the units that affects the response.
Random allocation of treatments to experimental units avoids systematic over- or under-estimation of treatment effects.
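As a rough illustration of random allocation, the sketch below shuffles the experimental units and then deals them out to the treatments in equal numbers. The function name `randomise`, the plot labels and the treatment names are hypothetical, chosen only for this example.

```python
import random


def randomise(units, treatments, seed=None):
    """Randomly allocate treatments to units, equal numbers per treatment."""
    units = list(units)
    if len(units) % len(treatments) != 0:
        raise ValueError("number of units must be a multiple of the number of treatments")
    rng = random.Random(seed)   # a seed only makes the example reproducible
    rng.shuffle(units)          # the random order decides who gets what
    reps = len(units) // len(treatments)
    # The first `reps` shuffled units get the first treatment, and so on.
    return {unit: treatments[i // reps] for i, unit in enumerate(units)}


# Hypothetical example: 12 plots and 3 fertiliser treatments.
allocation = randomise([f"plot{k}" for k in range(1, 13)], ["A", "B", "C"], seed=1)
for unit in sorted(allocation, key=lambda u: int(u[4:])):
    print(unit, allocation[unit])
```

Because the allocation depends only on the shuffle, no characteristic of the units can systematically favour one treatment.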
If each treatment is used in more than one experimental unit, unit-to-unit variability can be assessed. This gives information about whether differences between the treatments are more than chance differences.
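One common way to summarise unit-to-unit variability from replicated treatments is a pooled within-treatment variance. The sketch below assumes that summary; the function name `pooled_variance` and the response values are made up for illustration.

```python
from statistics import variance


def pooled_variance(groups):
    """Pool the sample variances of the responses within each treatment.

    `groups` maps a treatment name to its list of responses; every
    treatment needs at least two units, otherwise there is no replication.
    """
    numerator = sum((len(g) - 1) * variance(g) for g in groups.values())
    denominator = sum(len(g) - 1 for g in groups.values())
    return numerator / denominator


# Made-up responses for two treatments with three units each.
responses = {"A": [1.0, 2.0, 3.0], "B": [2.0, 4.0, 6.0]}
print(pooled_variance(responses))
```

Differences between treatment means can then be judged against this variability rather than taken at face value.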
By grouping similar experimental units into blocks and randomly allocating treatments within blocks, the treatment effects can be estimated more accurately.
The effect of a factor is most accurately estimated in an experiment in which all experimental units are very similar. The more variability in the experimental units, the less accurate the estimates.
If the factor has two levels, it is sometimes possible to group the experimental units into pairs that are similar to each other. By allocating one experimental unit from each pair to each factor level, the factor effect is estimated more accurately than in a completely randomised experiment.
It is critically important that the two treatment levels are randomly allocated to the two experimental units in each pair.
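A minimal sketch of randomisation within matched pairs: for each pair, the order of the two factor levels is shuffled independently, so each unit in a pair is equally likely to receive either level. The function name `allocate_pairs`, the level names and the unit labels are hypothetical.

```python
import random


def allocate_pairs(pairs, levels=("control", "treatment"), seed=None):
    """Randomly assign the two factor levels within each matched pair."""
    rng = random.Random(seed)
    allocation = {}
    for first, second in pairs:
        order = list(levels)
        rng.shuffle(order)          # a separate random choice for every pair
        allocation[first] = order[0]
        allocation[second] = order[1]
    return allocation


# Hypothetical matched pairs, e.g. twins or adjacent plots.
pairs = [("a1", "a2"), ("b1", "b2"), ("c1", "c2")]
print(allocate_pairs(pairs, seed=7))
```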
The idea of matched pairs of experimental units can be generalised to matched groups of 3 or more if the factor has 3 or more levels. Randomly allocating the factor levels within each matched group results in more accurate estimates than in a completely randomised design.
In many experiments, the experimental units naturally occur in groups of similar units (called blocks) where the block size is a multiple of the number of factor levels. A randomised block experiment randomly allocates equal numbers of units to each factor level within each block.
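The allocation in a randomised block experiment might be sketched as follows, assuming each block is given as a list of unit labels. The function name `randomised_block`, the block names and the unit labels are illustrative only.

```python
import random


def randomised_block(blocks, levels, seed=None):
    """Allocate factor levels at random within each block, equal numbers per level."""
    rng = random.Random(seed)
    allocation = {}
    for name, units in blocks.items():
        if len(units) % len(levels) != 0:
            raise ValueError(f"size of block {name!r} is not a multiple of the number of levels")
        # One label per unit, with each level repeated equally often.
        labels = list(levels) * (len(units) // len(levels))
        rng.shuffle(labels)         # randomisation is done separately in each block
        for unit, label in zip(units, labels):
            allocation[unit] = label
    return allocation


# Hypothetical example: two fields (blocks) of four plots, two factor levels.
blocks = {"field1": ["p1", "p2", "p3", "p4"], "field2": ["p5", "p6", "p7", "p8"]}
print(randomised_block(blocks, ["A", "B"], seed=5))
```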
A second factor can also be varied in a randomised experiment designed for one factor, without affecting the accuracy of the estimated effect of the first factor.
In a factorial experiment, each combination of the levels of two or more factors is used on the same number of experimental units.
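As a sketch of how a factorial layout could be generated, the code below lists every combination of factor levels, repeats each combination the same number of times, and allocates the combinations to units at random. The function name `factorial_design` and the factors `fertiliser` and `irrigation` are hypothetical.

```python
import itertools
import random


def factorial_design(factors, units, seed=None):
    """Randomly allocate every combination of factor levels to equal numbers of units."""
    combos = list(itertools.product(*factors.values()))
    units = list(units)
    if len(units) % len(combos) != 0:
        raise ValueError("number of units must be a multiple of the number of combinations")
    labels = combos * (len(units) // len(combos))   # equal replication of each combination
    rng = random.Random(seed)
    rng.shuffle(labels)
    names = list(factors)
    return {unit: dict(zip(names, combo)) for unit, combo in zip(units, labels)}


# Hypothetical 3 x 2 factorial experiment on 12 units (2 units per combination).
factors = {"fertiliser": ["none", "low", "high"], "irrigation": ["no", "yes"]}
design = factorial_design(factors, range(12), seed=4)
for unit, treatment in design.items():
    print(unit, treatment)
```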
The simplest model for the effect of two factors assumes that the effect of each factor is the same, whatever the value of the other factor. The mean response can be written as the sum of the effects of the separate factors.
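In symbols, this additive model can be written as below. The notation is an assumption made for this illustration: \mu_{ij} is the mean response at level i of the first factor and level j of the second, \mu is an overall level, and \alpha_i and \beta_j are the effects of the two factors.

```latex
\mu_{ij} = \mu + \alpha_i + \beta_j
```

Under this model, changing the first factor from level i to level i' changes the mean response by \alpha_{i'} - \alpha_i, whatever the level j of the second factor.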
If the effect of one factor depends on the level of the other, there is said to be interaction between them. Interaction can be assessed from a factorial experiment.
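For two factors with two levels each, one simple way to look for interaction is to compare the effect of the first factor at each level of the second. The sketch below computes this difference of differences from a table of cell means; the function name `interaction_contrast` and the cell means are made up for illustration.

```python
def interaction_contrast(cell_means, a_levels, b_levels):
    """Difference between the effect of factor A at the two levels of factor B.

    `cell_means` maps (A level, B level) to the mean response in that cell.
    A value of zero means the effect of A is the same at both levels of B.
    """
    a_low, a_high = a_levels
    b_low, b_high = b_levels
    effect_at_b_low = cell_means[(a_high, b_low)] - cell_means[(a_low, b_low)]
    effect_at_b_high = cell_means[(a_high, b_high)] - cell_means[(a_low, b_high)]
    return effect_at_b_high - effect_at_b_low


levels = ("low", "high")

# Made-up cell means in which A adds 2 at both levels of B (no interaction).
additive = {("low", "low"): 10, ("low", "high"): 14,
            ("high", "low"): 12, ("high", "high"): 16}
print(interaction_contrast(additive, levels, levels))      # 0

# Made-up cell means in which A adds 2 at low B but 6 at high B.
interacting = {("low", "low"): 10, ("low", "high"): 14,
               ("high", "low"): 12, ("high", "high"): 20}
print(interaction_contrast(interacting, levels, levels))   # 4
```

In real data the contrast is rarely exactly zero, so it has to be judged against the unit-to-unit variability before concluding that there is interaction.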
This example helps to explain the concept of interaction.
The simplest model for three or more factors assumes that the effect of each does not depend on the levels of the others. The mean response is modelled as the sum of terms for the individual factors.
In factorial experiments with 3 or more factors, there may be interactions between two or more of the factors.
Why is the experiment being conducted and how will the results be used?
In an experiment, what experimental units will be used? What response variable will be recorded? Which variables will be controlled?
Other practical issues are involved when conducting experiments on human subjects.