Long page descriptions

Chapter 2   Multiple Regression

2.1   Least squares for Y vs (X and Z)

2.1.1   More than one explanatory variable

In many data sets, two or more explanatory variables could potentially affect the response. Using two or more explanatory variables may give more accurate predictions.

2.1.2   Three-dimensional scatterplots

Data sets with two explanatory variables and a response can be effectively displayed in a rotating 3-dimensional scatterplot.

2.1.3   Linear equation and least squares plane

The simple linear model can be extended by adding another linear term involving a second explanatory variable. The resulting equation represents a plane in three dimensions.
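
A minimal statement of the equation, using \beta_0 for the intercept and \beta_1, \beta_2 for the two slopes (assumed notation; the page's own symbols may differ):

    y = \beta_0 + \beta_1 x + \beta_2 z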

2.1.4   Understanding the parameters

The intercept is the predicted y-value when x and z are both zero. Each slope parameter gives the predicted change in y when its explanatory variable increases by one and the other is held constant.

2.1.5   Fitted values and residuals

The linear model provides predictions at all x- and z-values. The prediction at the x- and z-values of the i'th data point is its fitted value, and the difference between this and the recorded y-value is its residual.
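
In symbols (assumed notation, with b_0, b_1 and b_2 the estimated coefficients):

    \hat{y}_i = b_0 + b_1 x_i + b_2 z_i, \qquad e_i = y_i - \hat{y}_i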

2.1.6   Estimating the parameters

Parameter estimates that result in small residuals describe the data well, so the estimates should be chosen to make the residuals small.

2.1.7   Least squares estimation

An objective estimation method is to minimise the sum of squared residuals -- the principle of least squares.
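
A minimal sketch of least squares in Python, using numpy on made-up data (the numbers and variable names are illustrative only, not taken from the e-book):

    import numpy as np

    # Illustrative data: response y with two explanatory variables x and z
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
    z = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
    y = np.array([3.1, 3.9, 7.2, 7.8, 11.1, 10.9])

    # Design matrix: a column of ones for the intercept, then x and z
    X = np.column_stack([np.ones_like(x), x, z])

    # Least squares chooses the coefficients that minimise the sum of squared residuals
    b, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ b
    print("estimates:", b)
    print("sum of squared residuals:", np.sum(residuals ** 2))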

2.1.8   Interpreting the coefficients

The slope coefficients give the predicted effect of changes to one variable, but only when the other variable remains the same.

2.2   Normal linear model and inference

2.2.1   Normal linear model

Randomness can be modelled by assuming that the response has a normal distribution whose mean is a linear function of the explanatory variables and whose standard deviation is constant.
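
Written out (assumed notation), the model for a response with two explanatory variables is

    y_i = \beta_0 + \beta_1 x_i + \beta_2 z_i + \varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2) \text{ independently}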

2.2.2   Sampling variability of least squares plane

The normal linear model also implies that the least squares plane varies from sample to sample.

2.2.3   Distribution of estimated coefficients

The least squares estimate of each coefficient has a normal distribution whose mean is the underlying population parameter.

2.2.4   Estimate of error standard deviation

The error variance is estimated by the sum of squared residuals divided by (n-3). The best estimate of the error standard deviation is the square root of this.
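
In symbols (assumed notation, with e_i the i'th residual and n the number of data points):

    s^2 = \frac{\sum_{i=1}^{n} e_i^2}{n - 3}, \qquad s = \sqrt{s^2}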

2.2.5   95% confidence intervals for coefficients

The standard deviation of each parameter estimate depends on the error standard deviation. Replacing this with an estimate allows us to find a 95% confidence interval.
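
The resulting interval for a coefficient b_j has the usual form (assumed notation):

    b_j \pm t_{n-3,\;0.975} \times \operatorname{se}(b_j)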

2.2.6   Hypothesis tests for coefficients

A t test statistic can be found by dividing a parameter estimate by its standard error. The p-value for testing whether the parameter is zero is the tail area of a t distribution with (n-3) degrees of freedom.
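
A small Python sketch of the calculation with scipy (the estimate, standard error and sample size are made-up numbers for illustration):

    from scipy import stats

    b1, se_b1, n = 1.37, 0.42, 25                # illustrative values, not from the e-book

    t_stat = b1 / se_b1                          # test statistic for H0: parameter = 0
    df = n - 3                                   # degrees of freedom with two explanatory variables
    p_value = 2 * stats.t.sf(abs(t_stat), df)    # two-tailed tail area
    print(round(t_stat, 2), round(p_value, 4))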

2.3   The general linear model

2.3.1   General linear model

The linear models with one and two explanatory variables can be generalised to include p explanatory variables. The parameters can be estimated by least squares.

2.3.2   Describing the simple linear model with matrices

A normal linear model with a single explanatory variable can be expressed in a matrix equation.

2.3.3   General linear model with matrices

When the linear model is generalised to allow any number of explanatory variables, a similar matrix equation describes the model.
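
In matrix form (assumed notation, with y the vector of responses, X the design matrix whose first column is all ones, \beta the vector of parameters and \varepsilon the vector of errors):

    \mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}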

2.3.4   Least squares with matrices (advanced)

A simple matrix equation provides the least squares estimates of all parameters of the general linear model.
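
The equation referred to is presumably the standard least squares solution (assumed here, since the page itself is only summarised):

    \hat{\boldsymbol{\beta}} = (\mathbf{X}^{\mathsf T}\mathbf{X})^{-1}\mathbf{X}^{\mathsf T}\mathbf{y}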

2.3.5   Interpreting coefficients

The slope coefficient associated with an explanatory variable describes its effect if all other variables are held constant. It may have a different sign from the correlation coefficient between the variable and the response.

2.3.6   Standard errors

The error standard deviation can be estimated from the residual sum of squares. A simple matrix equation uses this estimate to find the standard errors of the least squares estimates.
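
Presumably the matrix equation is the estimated variance matrix of the coefficients, s^2 (X^T X)^{-1}, whose diagonal entries have the squared standard errors. A minimal numpy sketch under that assumption (the function name is illustrative):

    import numpy as np

    def coef_standard_errors(X, y):
        """Least squares estimates and their standard errors for a design matrix X."""
        n, k = X.shape                            # k counts the intercept column as well
        b = np.linalg.solve(X.T @ X, X.T @ y)     # least squares estimates
        e = y - X @ b                             # residuals
        s2 = e @ e / (n - k)                      # estimated error variance
        cov_b = s2 * np.linalg.inv(X.T @ X)       # estimated covariance matrix of the estimates
        return b, np.sqrt(np.diag(cov_b))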

2.3.7   Inference for general linear models

95% confidence intervals can be found from the parameter estimates and their standard errors. The significance of the individual parameters can also be tested, but each such test assumes that all other variables are retained in the model.

2.4   Nonlinear relationships

2.4.1   Linear models for curvature

A general linear model is linear in its parameters, but not necessarily in the explanatory variables. Models with transformed variables and with quadratic terms are all general linear models.

2.4.2   Linearity of quadratic models (optional)

A model with a linear term and a quadratic term in x is still linear in the parameters and is a general linear model.
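
For example (assumed notation), the quadratic model

    y = \beta_0 + \beta_1 x + \beta_2 x^2 + \varepsilon

describes a curve in x but is linear in the parameters \beta_0, \beta_1 and \beta_2, so it can be fitted by least squares in the same way.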

2.4.3   Polynomial models

Polynomial models have terms involving various powers of x and are flexible ways to model curvature. As the order of the polynomial increases, the fitted curve becomes more flexible but can be less smooth. Polynomials are usually poor for extrapolation.

2.4.4   Residual plots to detect nonlinearity

For detecting curvature when there is more than one explanatory variable, it is better to plot residuals rather than the raw data.

2.4.5   Partial residual plots

If a plot of residuals against X shows curvature, a partial residual plot can give an indication of which nonlinear function of X to use in the model.
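
A common definition (assumed here, since the page is only summarised) adds the fitted X-term back onto the ordinary residual and plots the result against x:

    \text{partial residual}_i = e_i + b_X x_i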

2.4.6   Model with quadratic in X, linear in Z

If the response is related linearly to Z but nonlinearly to X, a quadratic term in X can be added to the model to explain the curvature. The resulting model corresponds to a curved surface in a 3-dimensional scatterplot.

2.4.7   Model with quadratic terms in X and Z

Quadratic terms in both X and Z can be added, resulting in a surface that is curved in both X and Z directions.

2.4.8   Visualising least squares

The residuals from a quadratic model can be represented as vertical lines from data points to the quadratic surface. If squares are drawn for each residual, least squares means minimising the total area of these squares.

2.4.9   Tests for curvature in X and Z

Curvature can be assessed with t-tests of whether the two quadratic parameters are zero.

2.5   Interaction

2.5.1   Additivity of effects of X and Z

In the models in previous sections, the effect of X on Y was the same for all values of Z and similarly the effect of Z on Y was the same, whatever the value of X.

2.5.2   Interaction between X and Z

Interaction between X and Z occurs when the effect on Y of increasing X is different for different values of Z. Adding a term in XZ to the model may explain the interaction.
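
In assumed notation, the model with an interaction term is

    y = \beta_0 + \beta_1 x + \beta_2 z + \beta_3 xz + \varepsilon

so the predicted change in y when x increases by one is \beta_1 + \beta_3 z, which depends on the value of z.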

2.5.3   Inference for models with interaction

A t-test for whether the coefficient of XZ is zero provides a simple test for interaction.

2.5.4   Transformations and interaction

The existence and amount of interaction is affected by nonlinear transformations of the response. Sometimes analysing the log response can remove interaction, making the results easier to interpret.

2.5.5   Example (nonlinearity and interaction)

On this page, a data set that has both curvature and interaction is analysed.

2.6   Diagnostics with 2 explanatory variables

2.6.1   Problem points

Problems with the multiple regression model may relate to all data points, but sometimes only one or two data points cause problems.

2.6.2   Leverage

Data points have high leverage if their values for the explanatory variables are 'unusual'.

2.6.3   Problems with high leverage

Because high leverage points pull the least squares plane strongly, their residuals are rarely large, even if they are outliers.

2.6.4   Standardised residuals

Standardising the residuals adjusts for the lower residual standard deviation of high leverage points.

2.6.5   Externally studentised residuals

Ordinary standardised residuals often fail to highlight outliers that are high leverage points. Standardising with a deleted estimate of the error variance is best for detecting outliers.
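
A hedged numpy sketch of these diagnostics for a design matrix X (the deleted variance estimate uses the standard leave-one-out identity; the function name is illustrative):

    import numpy as np

    def residual_diagnostics(X, y):
        """Leverages, standardised and externally studentised residuals."""
        n, k = X.shape
        H = X @ np.linalg.inv(X.T @ X) @ X.T       # hat matrix
        h = np.diag(H)                             # leverages
        e = y - H @ y                              # ordinary residuals
        s2 = e @ e / (n - k)                       # usual error variance estimate
        std_res = e / np.sqrt(s2 * (1 - h))        # internally standardised residuals
        # error variance re-estimated with each point deleted (leave-one-out identity)
        s2_del = ((n - k) * s2 - e ** 2 / (1 - h)) / (n - k - 1)
        stud_res = e / np.sqrt(s2_del * (1 - h))   # externally studentised residuals
        return h, std_res, stud_res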

2.6.6   Influence

Leverage depends only on the explanatory variables and describes the potential of a point to influence the results. DFITS and Cook's D describe the actual influence of each point.

2.6.7   Examples

The externally studentised residuals, leverages and DFITS provide a good guide to problems with individual points. Several examples are given.