Groups and explanatory variables
It was explained earlier that data from different groups can be combined in a single data matrix with a categorical variable that gives group membership. In a similar way, a categorical variable can be used to split a data set into groups.
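As a minimal sketch of this idea, the Python below stacks two groups into a single list of rows, each carrying a group label, and then splits the rows again using that label. The data values and the function names `stack_groups` and `split_by` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

def stack_groups(groups):
    """Combine {group_name: [values]} into rows of (group_name, value)."""
    return [(name, value) for name, values in groups.items() for value in values]

def split_by(rows, key_index=0):
    """Split rows into groups using the categorical column at key_index."""
    out = defaultdict(list)
    for row in rows:
        out[row[key_index]].append(row)
    return dict(out)

# Hypothetical measurements from two groups (invented values)
groups = {"A": [5.1, 4.8], "B": [6.0, 6.3, 5.9]}

rows = stack_groups(groups)   # one data matrix with a group-membership column
by_group = split_by(rows)     # the same rows, split back into groups
```

Both representations hold the same information; the single-matrix form simply records group membership as one more variable.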
In some data sets, one categorical variable can be treated as a response whose values may depend on a second categorical variable, the explanatory variable. We can then regard the explanatory variable as defining different groups and ask how the distribution of the response differs between those groups.
The groups should always be defined by the explanatory variable, never by the response variable.
If one categorical variable is a response and the other is an explanatory variable, the methods in the previous section can be used to see how the explanatory variable affects the response.
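A minimal sketch of such a comparison, assuming the data arrive as (explanatory, response) pairs: the function below finds the proportion of each response category within each group defined by the explanatory variable. The records, the category labels and the name `response_proportions` are hypothetical; they are not taken from any data set in this text.

```python
from collections import Counter, defaultdict

def response_proportions(records):
    """records: iterable of (explanatory, response) pairs.

    Returns {explanatory_value: {response_value: proportion within that group}}.
    """
    counts = defaultdict(Counter)
    for explanatory, response in records:
        counts[explanatory][response] += 1
    return {
        group: {resp: n / sum(c.values()) for resp, n in c.items()}
        for group, c in counts.items()
    }

# Hypothetical records: four individuals in each of two groups (invented data)
records = (
    [("group 1", "yes")] * 3 + [("group 1", "no")] * 1
    + [("group 2", "yes")] * 1 + [("group 2", "no")] * 3
)

props = response_proportions(records)
for group, dist in props.items():
    print(group, dist)
```

The proportions add to one within each explanatory group, so the response distributions can be compared directly even when the groups have different sizes.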
Bipolar disorder and family history
Bivariate data without an explanatory variable
Not all data sets have variables that can be categorised as a response and an explanatory variable. Sometimes the relationship between the variables is more symmetrical but we still want to discover whether particular values of one variable are associated with values of the other.
For two numerical variables, a correlation coefficient describes the strength of such a symmetrical relationship, whereas least squares is used when one variable can be classified as a response and the other as an explanatory variable. When both variables are categorical, different methods are needed to describe the association between them.
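The contrast between correlation and least squares for numerical variables can be sketched numerically. In the Python below, the correlation coefficient is unchanged when the two variables swap roles, but the least squares slope depends on which variable is treated as the response. The data values and the helper names `correlation` and `ls_slope` are hypothetical and used only for illustration.

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def correlation(x, y):
    """Correlation coefficient; treats x and y symmetrically."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ls_slope(explanatory, response):
    """Slope of the least squares line predicting response from explanatory."""
    mx, my = mean(explanatory), mean(response)
    sxy = sum((a - mx) * (b - my) for a, b in zip(explanatory, response))
    sxx = sum((a - mx) ** 2 for a in explanatory)
    return sxy / sxx

# Hypothetical numerical data (invented values)
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 1.0, 6.0, 5.0]

r_xy = correlation(x, y)
r_yx = correlation(y, x)      # same value: the roles of x and y do not matter
slope_y_on_x = ls_slope(x, y)
slope_x_on_y = ls_slope(y, x) # different value: the roles do matter
```

Swapping the arguments leaves the correlation unchanged but alters the slope, which is why least squares needs one variable to be identified as the response.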
The remainder of this section describes some methods of analysing data of this form.
Alcohol and nicotine intake