Causal relationships
In many bivariate data sets, the relationship between the two variables is not symmetric. From the nature of the variables and the way that the data were collected, it may be clear that one variable, X, can potentially influence the other, Y, but that the opposite is impossible.
In such data, the variable X is called the explanatory variable and Y is called the response.
Experiments
An experiment is a type of data collection in which the person conducting it controls the values of the explanatory variable, for example by adjusting the machine speed for each run in the boxed example below. In a well-designed experiment, any relationship found between the explanatory variable and the response can be interpreted as causal.
Packets of corn flakes
A quality control team may adjust the speed of a machine filling packets of corn flakes and make a numerical measurement of the quality of the output. Operating speed may affect quality but the quality of the output cannot influence the speed of the machine.
Observational studies
Many data sets are not obtained from experiments. If the person collecting the data has no control over either of the variables, and simply records a pair of values from each individual, then the data are called observational. If there is a time-ordering of the two variables in an observational study (one variable is measured earlier than the other), then we may also be able to treat the relationship as causal, with the later variable being the response, since a later measurement cannot influence an earlier one.
Even if the relationship is not causal, we are sometimes interested in predicting the value of one variable from the other. In this situation, we would analyse the data with the variable being predicted treated as the response.
Temperature and productivity
A researcher who is investigating how the office environment affects productivity may measure the temperature in a large accounts office each day for a month (X) and record the number of invoices that the office processes each day (Y). Temperature may affect productivity, but productivity cannot affect the temperature.
Body fat and skinfold thickness
Accurate measurements of body fat, Y, are difficult to make, whereas measurements of skinfold thickness, X, are relatively easy to obtain. It would be useful to be able to predict body fat from skinfold thickness, based on a data set with both measurements from a group of people. We might then treat body fat as the response and skinfold thickness as the explanatory variable in an analysis of the data.
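As a rough illustration of what such a prediction might look like, the sketch below fits a least-squares line with body fat as the response and skinfold thickness as the explanatory variable. The five pairs of values, the variable names and the helper fit_line are all made up for the illustration; they are not real measurements and not a method prescribed by this section.

```python
# Hypothetical illustrative values only, NOT real measurements.
skinfold_mm = [10.0, 15.0, 20.0, 25.0, 30.0]    # explanatory variable X
body_fat_pct = [12.0, 16.0, 19.0, 25.0, 28.0]   # response Y


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line predicting y from x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares about the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Body fat is the response, so it is passed as y.
intercept, slope = fit_line(skinfold_mm, body_fat_pct)

# Predicted body fat for a new person with a skinfold thickness of 22 mm.
predicted = intercept + slope * 22.0
print(f"predicted body fat = {intercept:.2f} + {slope:.2f} * 22 = {predicted:.2f}")
```

The choice of response decides which variable goes on the left-hand side of the fitted equation, here body fat, because that is the quantity we want to predict.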
Before analysing bivariate data, always consider whether one variable should be treated as a response.
Some statistical methods can only be used when such a classification of the variables has been done.
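One way to see why the classification matters is to compare a symmetric summary with an asymmetric one. In the sketch below, the correlation coefficient is the same whichever variable is listed first, whereas the least-squares slope depends on which variable is treated as the response. The data values and the function names are hypothetical, chosen only to illustrate the point.

```python
import math

# Hypothetical illustrative values only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 3.0, 7.0, 8.0, 10.0]


def sum_cross(a, b):
    """Sum of products of deviations of a and b from their means."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))


def correlation(a, b):
    """Correlation coefficient; treats the two variables symmetrically."""
    return sum_cross(a, b) / math.sqrt(sum_cross(a, a) * sum_cross(b, b))


def slope(explanatory, response):
    """Least-squares slope for predicting the response from the explanatory variable."""
    return sum_cross(explanatory, response) / sum_cross(explanatory, explanatory)


r_xy = correlation(x, y)
r_yx = correlation(y, x)       # same value: no response needs to be chosen

slope_y_on_x = slope(x, y)     # y treated as the response
slope_x_on_y = slope(y, x)     # x treated as the response

print(f"correlation: {r_xy:.3f} and {r_yx:.3f}")
print(f"slope of y on x: {slope_y_on_x:.3f}")
print(f"slope of x on y: {slope_x_on_y:.3f} (not 1 / {slope_y_on_x:.3f})")
```

The two fitted lines are not simply inverses of each other, so a least-squares analysis cannot be carried out until one variable has been chosen as the response.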