Causal relationships
In many bivariate data sets, the relationship between the two variables is not symmetric. From the nature of the variables and the way that the data were collected, it may be clear that one variable, X, can potentially influence the other, Y, but that the opposite is impossible.
In such data, the variable X is called the explanatory variable and Y is called the response.
Experiments
An experiment is a type of data collection in which the experimenter controls the values of the explanatory variable, for example by setting the temperature for each run in the CO production example below. In a well-designed experiment, an observed relationship between the explanatory variable and the response can be interpreted as causal.
CO production and temperature
If we measure the amount of carbon monoxide produced (Y) when a reaction is conducted at a variety of temperatures (X), then CO production may depend on temperature, but temperature is determined only by the experimenter's choice.
Observational studies
Many data sets are not obtained from experiments. If the person collecting the data has no control over either variable and simply records a pair of values from each individual, the data are called observational. If the two variables in an observational study are ordered in time, with one measured earlier than the other, we may still be able to treat the relationship as causal, with the later variable as the response.
Even if the relationship is not causal, we are sometimes interested in predicting the value of one variable from the other. In this situation, we would analyse the data with the variable being predicted treated as the response.
Heights of fathers and sons
In a data set that contains the heights of fathers and their sons at age 18, we can argue that the fathers' heights, X, may influence the heights of their sons, Y, through genetic inheritance, but that the influence cannot be in the opposite direction.
Body fat and skinfold thickness
Accurate measurements of body fat, Y, are difficult to make, whereas measurements of skinfold thickness, X, are relatively easy to obtain. It would be useful to be able to predict body fat from skinfold thickness, based on a data set with both measurements from a group of people. We might then treat body fat as the response and skinfold thickness as the explanatory variable in an analysis of the data.
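A minimal sketch of how such a prediction might be set up is given here. It assumes a straight-line relationship fitted by least squares, which is only one possible method and is not specified in the text. The function names `fit_line` and `predict_body_fat` are hypothetical, and the numbers are made-up illustrative values, not real measurements.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for predicting y (response) from x (explanatory)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares about the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up illustrative values (assumption), not real measurements.
skinfold_mm = [10.0, 15.0, 20.0, 25.0, 30.0]   # explanatory variable, X
body_fat_pct = [12.0, 16.0, 19.0, 24.0, 29.0]  # response, Y

intercept, slope = fit_line(skinfold_mm, body_fat_pct)


def predict_body_fat(skinfold):
    """Predicted body fat for a new skinfold measurement, using the fitted line."""
    return intercept + slope * skinfold


print(predict_body_fat(22.0))
```

The variable being predicted, body fat, is passed as `y`, which is what treating it as the response means in practice.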
Before analysing bivariate data, always consider whether one variable should be treated as a response.
Some statistical methods can only be used when such a classification of the variables has been done.
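As an illustration of why the classification matters, the sketch below contrasts two common summaries. Least-squares regression and the correlation coefficient are chosen here as examples; the text does not name specific methods. The least-squares slope changes when the roles of the two variables are swapped, whereas the correlation coefficient does not. The helper names and the data values are hypothetical.

```python
from math import sqrt


def centred_sums(a, b):
    """Return (Saa, Sbb, Sab): sums of squares and cross-products about the means."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    saa = sum((ai - mean_a) ** 2 for ai in a)
    sbb = sum((bi - mean_b) ** 2 for bi in b)
    sab = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    return saa, sbb, sab


def ls_slope(explanatory, response):
    """Least-squares slope for predicting `response` from `explanatory`."""
    s_ee, _, s_er = centred_sums(explanatory, response)
    return s_er / s_ee


def correlation(a, b):
    """Correlation coefficient; the formula treats a and b symmetrically."""
    saa, sbb, sab = centred_sums(a, b)
    return sab / sqrt(saa * sbb)


# Made-up illustrative values (assumption).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 3.0, 2.0, 6.0, 8.0]

slope_y_on_x = ls_slope(x, y)  # Y treated as the response
slope_x_on_y = ls_slope(y, x)  # X treated as the response
r_xy = correlation(x, y)
r_yx = correlation(y, x)

print(slope_y_on_x, slope_x_on_y)  # the two fitted slopes differ
print(r_xy, r_yx)                  # the correlation is the same either way
```

Drawn on the same axes, the two fitted lines do not coincide unless the points lie exactly on a straight line, so the choice of response has to be made before the line is fitted.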