Relationship between variables

In most data sets, we are interested in the relationship between the variables. This is most obviously the case when the data consist of two numerical variables (correlation and least squares give information about the relationship) or two categorical variables (a contingency table and conditional proportions help show the relationship).
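As a minimal sketch of the summaries named above, the following Python computes a correlation coefficient and least squares line for two numerical variables, and conditional proportions for two categorical variables. The function names and the small data sets are made up for illustration and do not come from any real study.

```python
from collections import Counter
from math import sqrt


def correlation_and_line(x, y):
    """Return (r, slope, intercept) for paired numerical data."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    r = sxy / sqrt(sxx * syy)           # correlation coefficient
    slope = sxy / sxx                   # least squares slope
    intercept = mean_y - slope * mean_x
    return r, slope, intercept


def conditional_proportions(pairs):
    """For (row, column) category pairs, return the proportion of each
    column category within each row category."""
    counts = Counter(pairs)                       # cells of the contingency table
    row_totals = Counter(row for row, _ in pairs)
    return {
        row: {col: counts[(r, col)] / row_totals[row]
              for (r, col) in counts if r == row}
        for row in row_totals
    }


# Made-up numerical data (hypothetical values)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r, slope, intercept = correlation_and_line(x, y)
print(round(r, 3), round(slope, 3), round(intercept, 3))

# Made-up categorical data: (group, outcome) for eight hypothetical individuals
pairs = ([("treated", "pass")] * 3 + [("treated", "fail")]
         + [("control", "pass")] + [("control", "fail")] * 3)
print(conditional_proportions(pairs))
```

A correlation near zero, or conditional proportions that are similar in every row, would suggest little relationship in the sample; neither summary says anything by itself about why a relationship exists.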

Groups and relationships

Questions about differences between two or more groups can also be expressed in terms of relationships between variables if group membership is represented by a categorical variable. For example, if we have collected data about incomes from a sample of males and a sample of females, we would be interested in comparing these two groups, that is, in the relationship between income and gender.

Each of the following questions about groups is followed by the same question restated as a relationship between two variables.

Does application of a surface coating affect the hardness of a plastic?
Is there a relationship between the coating and hardness?

Are children from large families more or less likely to go to university?
Is there a relationship between number of siblings and attendance at university?

Which of three different varieties of corn has the greatest yield?
Is there a relationship between corn variety and yield?

Do boys aged 14 perform better than girls at maths?
Is there a relationship between gender and mark in a maths test?
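
To illustrate how a comparison of groups becomes a relationship between variables, the sketch below stores group membership as a categorical variable alongside a numerical response and summarises the response within each category. The function name, the variety labels and the yields are all hypothetical, chosen only to mirror the corn example.

```python
from collections import defaultdict


def group_means(records):
    """Mean of the numerical response within each category.

    `records` is a list of (category, value) pairs, so group membership
    is just another variable recorded for each individual.
    """
    values = defaultdict(list)
    for group, value in records:
        values[group].append(value)
    return {group: sum(v) / len(v) for group, v in values.items()}


# Made-up yields for three hypothetical corn varieties
records = [("A", 5.0), ("A", 6.0),
           ("B", 7.0), ("B", 9.0),
           ("C", 4.0), ("C", 5.0)]

means = group_means(records)
best = max(means, key=means.get)  # variety with the highest sample mean
print(means, best)
```

If the group means differ, yield is related to variety in this sample; whether such a difference reflects more than sampling variability is a separate question.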

Interpreting relationships

Relationships can be much harder to interpret than you might think

In some situations, the relationship between two variables, such as the relationship evident in a scatterplot, may not describe a meaningful 'real' relationship.