A third numerical variable can be represented in a scatterplot by use of different symbols or colours.
Three numerical variables can be displayed in a 3-dimensional scatterplot; this may be rotated to help understand the relationships in the data.
An array of scatterplots of all pairs of variables is often informative, especially if the scatterplots are dynamically linked.
'Brushing' refers to dynamic highlighting of the same individuals in multiple linked displays.
Slicing is another dynamic technique. Only observations within a range of values of one variable (a slice) are displayed in linked displays.
Correlation and least squares are used to describe the relationship between two numerical variables. Additional measurements from each individual can potentially help to refine our understanding of the relationship.
Different symbols or colours can be used to represent a categorical third variable in a scatterplot.
The relationship between Y and X can be separately described by a least squares line within each group. This should lead to improved prediction of the response if the relationship is different in different groups.
If regression lines for the different groups are parallel, it is easy to summarise the group differences numerically and interpret these differences.
Transformations may linearise the relationship between the response and explanatory variables in each group and also give parallel regression lines.
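As a minimal sketch of fitting a separate least squares line within each group, and then a single model with parallel lines, the following uses NumPy on a small made-up data set. The data values, the group labels "A" and "B", and the helper name fit_lines_by_group are all hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical data: response y, explanatory x and a group label per individual.
x = np.array([1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 5.0, 6.1, 6.9, 8.0])
group = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])


def fit_lines_by_group(x, y, group):
    """Return {label: (intercept, slope)} from a separate least squares fit per group."""
    lines = {}
    for label in np.unique(group):
        mask = group == label
        # polyfit with deg=1 returns the coefficients as (slope, intercept).
        slope, intercept = np.polyfit(x[mask], y[mask], deg=1)
        lines[str(label)] = (float(intercept), float(slope))
    return lines


lines = fit_lines_by_group(x, y, group)
for label, (intercept, slope) in lines.items():
    print(f"group {label}: y = {intercept:.2f} + {slope:.2f} x")

# Parallel-lines model: one common slope plus a constant shift for group B.
# Design matrix columns: intercept, x, indicator of membership in group B.
is_b = (group == "B").astype(float)
design = np.column_stack([np.ones_like(x), x, is_b])
parallel_coef, *_ = np.linalg.lstsq(design, y, rcond=None)
b0, common_slope, shift_b = parallel_coef
print(f"common slope = {common_slope:.2f}, group B is higher by {shift_b:.2f}")
```

With parallel lines, the single number shift_b summarises the group difference at every value of x, which is what makes this case easy to interpret.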
A numerical variable can be used to split the individuals into groups.
Groups can also be represented with different symbols or colours on a scatterplot matrix that describes the relationships between 3 or more other variables.
In many data sets, two or more explanatory variables could potentially affect the response. Using two or more explanatory variables may give more accurate predictions.
A simple linear model with a single explanatory variable can be extended with extra terms to explain the additional effect of other explanatory variables.
The slope coefficient associated with an explanatory variable describes its effect if all other variables are held constant. It may have a different sign from the correlation coefficient between the variable and the response.
The relationship between a response variable and two explanatory variables can be effectively displayed in a rotating 3-dimensional scatterplot.
The equation of a linear model for Y in terms of X and Z can be displayed as a plane in 3 dimensions.
The residuals are the vertical distances from the data points (drawn as crosses) on a 3-dimensional scatterplot to the plane representing the model.
An objective estimation method is to minimise the sum of squared residuals -- the principle of least squares.
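The least squares fit of a plane, and the possibility that a slope has a different sign from the corresponding correlation, can be sketched as follows. The data are hypothetical: y is built as 2 - x + 3z plus small made-up errors, and x and z are deliberately chosen to increase together.

```python
import numpy as np

# Hypothetical data in which x and z tend to increase together.
x = np.array([1.0, 2.0, 2.0, 3.0, 4.0, 4.0, 5.0, 6.0])
z = np.array([1.0, 1.0, 2.0, 2.0, 3.0, 4.0, 4.0, 5.0])
errors = np.array([0.1, -0.2, 0.1, 0.0, -0.1, 0.2, -0.1, 0.0])
y = 2.0 - 1.0 * x + 3.0 * z + errors

# Design matrix for the model y = b0 + b1*x + b2*z.
design = np.column_stack([np.ones_like(x), x, z])

# Least squares: choose (b0, b1, b2) to minimise the sum of squared residuals.
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
b0, b1, b2 = coef

# Residuals are the vertical distances from each point to the fitted plane.
residuals = y - design @ coef
sse = float(np.sum(residuals ** 2))

# Correlation between x and y, ignoring z.
r_xy = float(np.corrcoef(x, y)[0, 1])

print(f"fitted plane: y = {b0:.2f} + {b1:.2f} x + {b2:.2f} z")
print(f"sum of squared residuals = {sse:.4f}")
print(f"correlation of x and y = {r_xy:.2f}, slope for x = {b1:.2f}")
```

In this made-up example the correlation between x and y is positive, because large x goes with large z, while the slope for x is negative once z is held constant.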
Variables Y and X may be positively correlated overall, but have zero or even negative correlation at each value of a categorical variable, Z. The variable Z is called a lurking (or hidden) variable.
A lurking variable can also distort the difference between the means of Y in two groups (i.e. for two values of a categorical variable, X).
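A small numerical illustration of a lurking variable reversing a correlation is sketched below. The values are invented for the purpose: within each value of the categorical variable z, y falls as x rises, but the group with larger x values also has larger y values.

```python
import numpy as np

# Hypothetical data: two groups defined by a categorical variable z.
x = np.array([1.0, 2.0, 3.0, 6.0, 7.0, 8.0])
y = np.array([3.0, 2.0, 1.0, 8.0, 7.0, 6.0])
z = np.array([0, 0, 0, 1, 1, 1])

# Marginal correlation, ignoring z.
overall_r = float(np.corrcoef(x, y)[0, 1])

# Conditional correlations, computed separately at each value of z.
within_r = {
    int(value): float(np.corrcoef(x[z == value], y[z == value])[0, 1])
    for value in np.unique(z)
}

print(f"overall correlation: {overall_r:.3f}")
for value, r in within_r.items():
    print(f"correlation when z = {value}: {r:.3f}")
```

The overall correlation is strongly positive even though the correlation is negative in both groups, so a conclusion drawn without z would be misleading.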
If X, Y and Z are all categorical, the marginal relationship between X and Y can be in the opposite direction to their conditional relationship at each value of Z. This reversal is called Simpson's paradox.
A few extra examples are shown where a hidden variable, Z, can result in a misleading conclusion from the marginal relationship. A full analysis using Z is always more complex but is essential to understand the relationship.
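Simpson's paradox can be illustrated with a small table of counts. The counts below are hypothetical and the labels ("treatment A", "treatment B", "mild", "severe") are assumptions made only for this sketch: X is the treatment, Y is success or failure, and Z is the severity of the case.

```python
# Hypothetical counts: (successes, total) for each treatment within each severity.
counts = {
    "mild": {"A": (9, 10), "B": (80, 100)},
    "severe": {"A": (30, 100), "B": (2, 10)},
}


def success_rate(successes, total):
    """Proportion of successes."""
    return successes / total


# Conditional relationship: success rates for each treatment at each value of Z.
conditional = {
    severity: {t: success_rate(*cell) for t, cell in by_treatment.items()}
    for severity, by_treatment in counts.items()
}

# Marginal relationship: pool the counts over Z before computing the rates.
marginal = {}
for treatment in ("A", "B"):
    successes = sum(counts[s][treatment][0] for s in counts)
    total = sum(counts[s][treatment][1] for s in counts)
    marginal[treatment] = success_rate(successes, total)

for severity, rates in conditional.items():
    print(f"{severity}: A = {rates['A']:.2f}, B = {rates['B']:.2f}")
print(f"overall: A = {marginal['A']:.2f}, B = {marginal['B']:.2f}")
```

Treatment A has the higher success rate for both mild and severe cases, yet the lower rate overall, because in these invented counts A was mostly given to severe cases and B mostly to mild ones.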