Association

As noted on the previous page, researchers are usually interested in relationships between variables. When two variables are related, we say that there is an association between them.

For example, consider the height, X, and weight, Y, of a sample of school children. Tall children tend to be heavier, so high values of X are associated with high values of Y. The correlation coefficient measures the strength and direction of the linear association between two such numerical variables.
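As a minimal sketch of what the correlation coefficient computes, the Python function below applies the usual Pearson formula, r = Sxy / sqrt(Sxx · Syy), where Sxy is the sum of cross-products of deviations from the means and Sxx, Syy are the sums of squared deviations. The function name `pearson_r` and the height and weight values are hypothetical, made up for illustration rather than taken from a real sample.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 values each")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations, and sums of squared deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical heights (cm) and weights (kg), invented for illustration only
heights = [120, 125, 130, 135, 140, 145, 150]
weights = [23, 26, 27, 31, 33, 38, 40]
r = pearson_r(heights, weights)
print(f"r = {r:.3f}")
```

Because taller children in this made-up sample are consistently heavier, r comes out close to +1; a value near 0 would indicate little linear association, and a value near -1 a strong negative one.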

Causal relationships

In some data sets, it is possible to conclude that one variable has a direct influence on the other. This is called a causal relationship.

For example, ...

If two variables are causally related, it is possible to conclude that changes to the explanatory variable, X, will have a direct impact on Y.

Non-causal relationships

Not all relationships are causal. In a non-causal relationship, the association observed between the two variables is not entirely the result of one variable directly affecting the other. In the most extreme case, ...

Two variables can be related to each other without either variable directly affecting the values of the other.

The two diagrams below illustrate mechanisms that result in non-causal relationships between X and Y.

If the relationship between two variables is not causal, the observed association cannot tell us whether changes to one variable, X, will result in changes to the other variable, Y.

For example, the scatterplot below shows data from a sample of towns in a region.

The positive correlation between the number of churches and the number of deaths from cancer is an example of a non-causal relationship. The size of the towns is a lurking variable, since larger towns have more churches and also more deaths. Clearly, decreasing the number of churches in a town will not reduce the number of deaths from cancer!

Researchers usually want to detect causal relationships