In many data sets, two or more explanatory variables may affect the response. Using two or more explanatory variables can give more accurate predictions than using any one alone.
Data sets with two explanatory variables and a response can be effectively displayed in a rotating 3-dimensional scatterplot.
The simple linear model can be extended by adding a linear term in a second explanatory variable, giving y = b0 + b1x + b2z. This equation represents a plane in three dimensions.
The intercept, b0, is the predicted y-value when x and z are both zero. Each slope parameter gives the predicted change in y when its own variable increases by one and the other variable is held constant.
The linear model provides predictions at all x- and z-values. The prediction at the x- and z-values of the i-th data point is its fitted value, and the recorded y-value minus the fitted value is its residual.
Good parameter estimates are ones that make the residuals small.
An objective way to achieve this is to minimise the sum of squared residuals; this is the principle of least squares.
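As a minimal sketch of this calculation, the least squares estimates for a model with two explanatory variables can be found with numpy; the data values and variable names below are purely illustrative, not from any data set in the text.

```python
import numpy as np

# Hypothetical data: response y with two explanatory variables x and z
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
z = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = np.array([3.1, 3.9, 7.2, 7.8, 11.1, 11.0])

# Design matrix: a column of ones (intercept) plus the two variables
X = np.column_stack([np.ones_like(x), x, z])

# Least squares: the estimates that minimise the sum of squared residuals
b, *_ = np.linalg.lstsq(X, y, rcond=None)

fitted = X @ b                # predictions at each data point
residuals = y - fitted        # recorded value minus fitted value
print("estimates (b0, b1, b2):", b)
print("residual sum of squares:", residuals @ residuals)
```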
Each slope coefficient gives the predicted effect of changes to its own variable, but only when the other variable remains the same.
Randomness can be modelled by assuming that the response has a normal distribution whose mean is a linear function of the explanatory variables and whose standard deviation is constant.
The normal linear model also implies that the least squares plane varies from sample to sample.
The least squares estimate of each coefficient has a normal distribution whose mean is the underlying population parameter.
The error variance is estimated by the sum of squared residuals divided by (n - 3), since three parameters have been estimated. The best estimate of the error standard deviation is the square root of this.
The standard deviation of each parameter estimate depends on the error standard deviation. Replacing this with an estimate allows us to find a 95% confidence interval.
A t test statistic can be found by dividing a parameter estimate by its standard error. The p-value for testing whether the parameter is zero is the tail area of a t distribution with (n - 3) degrees of freedom.
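These quantities can be sketched with statsmodels on made-up data of the same form as the earlier sketch; the printed values correspond to the error standard deviation, standard errors, t statistics, p-values and 95% confidence intervals described above.

```python
import numpy as np
import statsmodels.api as sm

# Same hypothetical data as in the earlier sketch
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
z = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = np.array([3.1, 3.9, 7.2, 7.8, 11.1, 11.0])

fit = sm.OLS(y, sm.add_constant(np.column_stack([x, z]))).fit()

print(np.sqrt(fit.scale))  # error sd estimate: sqrt(residual SS / (n - 3))
print(fit.bse)             # standard errors of the three estimates
print(fit.tvalues)         # t statistics: estimate / standard error
print(fit.pvalues)         # two-tailed p-values, t distribution, n - 3 df
print(fit.conf_int())      # 95% confidence intervals
```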
The linear models with one and two explanatory variables can be generalised to include p explanatory variables. The parameters can be estimated by least squares.
A normal linear model with a single explanatory variable can be expressed in a matrix equation.
When the linear model is generalised to allow any number of explanatory variables, a similar matrix equation describes the model.
A simple matrix equation provides the least squares estimates of all parameters of the general linear model.
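For illustration, that matrix equation, b = (X'X)^(-1) X'y, can be evaluated directly in numpy; the X and y below are hypothetical.

```python
import numpy as np

# General linear model in matrix form: y = X b + error,
# where X has a column of ones plus one column per explanatory variable
X = np.array([[1.0, 1.0, 2.0],
              [1.0, 2.0, 1.0],
              [1.0, 3.0, 4.0],
              [1.0, 4.0, 3.0],
              [1.0, 5.0, 6.0],
              [1.0, 6.0, 5.0]])
y = np.array([3.1, 3.9, 7.2, 7.8, 11.1, 11.0])

# Least squares estimates of all parameters: b = (X'X)^(-1) X'y
b = np.linalg.solve(X.T @ X, X.T @ y)
print(b)
```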
The slope coefficient associated with an explanatory variable describes its effect if all other variables are held constant. It may have a different sign from the correlation coefficient between the variable and the response.
The error standard deviation can be estimated from the residual sum of squares. A simple matrix equation uses this estimate to find the standard errors of the least squares estimates.
95% confidence intervals can be found from the parameter estimates and their standard errors. The significance of the individual parameters can also be tested, but each such test assumes that all other variables are retained in the model.
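Continuing the matrix sketch above (still with hypothetical data), the standard errors come from the diagonal of s^2 (X'X)^(-1), and the confidence intervals and t-tests follow from them:

```python
import numpy as np
from scipy import stats

# Same hypothetical X and y as in the previous sketch
X = np.array([[1.0, 1.0, 2.0], [1.0, 2.0, 1.0], [1.0, 3.0, 4.0],
              [1.0, 4.0, 3.0], [1.0, 5.0, 6.0], [1.0, 6.0, 5.0]])
y = np.array([3.1, 3.9, 7.2, 7.8, 11.1, 11.0])
b = np.linalg.solve(X.T @ X, X.T @ y)

n, k = X.shape                 # k parameters have been estimated
e = y - X @ b
s2 = (e @ e) / (n - k)         # error variance from the residual sum of squares

# Standard errors: square roots of the diagonal of s^2 (X'X)^(-1)
se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))

# 95% confidence intervals and t-tests for the individual parameters
t_crit = stats.t.ppf(0.975, df=n - k)
ci = np.column_stack([b - t_crit * se, b + t_crit * se])
p = 2 * stats.t.sf(np.abs(b / se), df=n - k)
print(ci)
print(p)
```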
A general linear model is linear in its parameters, but not necessarily in the explanatory variables. Models with transformed variables and with quadratic terms are all general linear models.
A model with a linear term and a quadratic term in x is still linear in the parameters and is a general linear model.
Polynomial models have terms involving various powers of x and are a flexible way to model curvature. As the order of the polynomial increases, the fitted curve becomes more flexible and can fluctuate erratically between data points. Polynomials are usually poor for extrapolation.
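A small numpy sketch of this behaviour with made-up data: a quintic through six points fits them exactly, yet its predictions outside the data range can differ wildly from those of a lower-order fit.

```python
import numpy as np

# Hypothetical curved data
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 2.1, 2.6, 2.4, 1.8, 0.7])

# Least squares polynomials of increasing order
quadratic = np.polynomial.Polynomial.fit(x, y, deg=2)
quintic = np.polynomial.Polynomial.fit(x, y, deg=5)  # passes through all 6 points

# Both may describe the data well, but their extrapolations diverge;
# polynomials are usually poor outside the range of the data
print(quadratic(7.0), quintic(7.0))
```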
For detecting curvature when there is more than one explanatory variable, it is better to plot residuals rather than the raw data.
If a plot of residuals against X shows curvature, a partial residual plot can indicate which nonlinear function of X to use in the model.
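One way to compute partial residuals, sketched with simulated data (the model and its coefficients are invented for illustration): the partial residual for X adds X's fitted linear term back to the ordinary residual, so a plot of it against X displays the shape of X's effect.

```python
import numpy as np

# Hypothetical data: y is linear in z but curved in x
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 4.0, 40)
z = rng.uniform(0.0, 4.0, 40)
y = 1.0 + 2.0 * x - 0.4 * x**2 + 1.5 * z + rng.normal(0.0, 0.2, 40)

# Fit the (misspecified) model with only linear terms
X = np.column_stack([np.ones_like(x), x, z])
b = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ b

# Partial residuals for x: residual plus x's linear term, b1 * x.
# Plotted against x, these show the quadratic shape of x's effect,
# suggesting which nonlinear function of x to add to the model.
partial_resid = resid + b[1] * x
```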
If the response is related linearly to Z but nonlinearly to X, a quadratic term in X can be added to the model to explain the curvature. The resulting model corresponds to a curved surface in a 3-dimensional scatterplot.
Quadratic terms in both X and Z can be added, resulting in a surface that is curved in both X and Z directions.
The residuals from a quadratic model can be represented as vertical lines from data points to the quadratic surface. If squares are drawn for each residual, least squares means minimising the total area of these squares.
Curvature can be assessed with t-tests of whether the two quadratic parameters are zero.
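These tests can be sketched with the statsmodels formula interface on simulated data (the coefficients below are invented); the p-values for the two I(...) terms assess whether each quadratic coefficient is zero.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data with genuine curvature in x but not in z
rng = np.random.default_rng(1)
df = pd.DataFrame({"x": rng.uniform(0.0, 4.0, 40),
                   "z": rng.uniform(0.0, 4.0, 40)})
df["y"] = 1.0 + 2.0*df.x - 0.4*df.x**2 + 1.5*df.z + rng.normal(0.0, 0.2, 40)

# Quadratic terms in both x and z; the t-tests on the two quadratic
# coefficients assess curvature in each direction
fit = smf.ols("y ~ x + I(x ** 2) + z + I(z ** 2)", data=df).fit()
print(fit.pvalues)
```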
In the models of previous sections, the effect of X on Y was the same for all values of Z, and similarly the effect of Z on Y was the same whatever the value of X.
Interaction between X and Z occurs when the effect on Y of increasing X is different for different values of Z. Adding a term in XZ to the model may explain the interaction.
A t-test for whether the coefficient of XZ is zero provides a simple test for interaction.
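A sketch of this test with simulated data (the model and coefficients are made up): in a statsmodels formula, "x:z" adds the product term, and its t-test is the simple test for interaction.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data where the effect of x on y grows with z
rng = np.random.default_rng(2)
df = pd.DataFrame({"x": rng.uniform(0.0, 4.0, 40),
                   "z": rng.uniform(0.0, 4.0, 40)})
df["y"] = 1.0 + 0.5*df.x + 0.8*df.z + 0.6*df.x*df.z + rng.normal(0.0, 0.3, 40)

# "x:z" adds the product term XZ; its t-test is the test for interaction
fit = smf.ols("y ~ x + z + x:z", data=df).fit()
print(fit.tvalues["x:z"], fit.pvalues["x:z"])
```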
The existence and amount of interaction is affected by nonlinear transformations of the response. Sometimes analysing the log response can remove interaction, making the results easier to interpret.
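A sketch of this effect under an assumed multiplicative model (the data are simulated for illustration): when the effects of X and Z multiply, the raw response shows interaction, but the log response is additive in X and Z, so the interaction term should become negligible after logging.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical multiplicative data: the effects of x and z multiply,
# so y shows interaction but log(y) is additive in x and z
rng = np.random.default_rng(3)
df = pd.DataFrame({"x": rng.uniform(0.0, 3.0, 40),
                   "z": rng.uniform(0.0, 3.0, 40)})
df["y"] = 2.0 * 1.5**df.x * 1.3**df.z * np.exp(rng.normal(0.0, 0.05, 40))

raw = smf.ols("y ~ x + z + x:z", data=df).fit()
logged = smf.ols("np.log(y) ~ x + z + x:z", data=df).fit()

# The interaction should be clear on the raw scale but negligible after logging
print(raw.pvalues["x:z"], logged.pvalues["x:z"])
```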
On this page, a data set that has both curvature and interaction is analysed.
Problems with the multiple regression model may relate to all data points, but sometimes only one or two individual points are responsible.
Data points have high leverage if their values for the explanatory variables are 'unusual'.
Because high leverage points pull the least squares plane strongly towards themselves, their residuals are rarely large, even if they are outliers.
Standardising the residuals adjusts for the lower residual standard deviation of high leverage points.
Ordinary standardised residuals often fail to highlight outliers that are high leverage points. Standardising with a deleted estimate of the error variance is best for detecting outliers.
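A sketch of these calculations with made-up data, using the standard formulas for leverage and the two kinds of standardised residual; the high leverage outlier is planted deliberately so the contrast is visible.

```python
import numpy as np

# Hypothetical data with one high leverage point that is also an outlier
rng = np.random.default_rng(4)
x = rng.uniform(0.0, 4.0, 30)
z = rng.uniform(0.0, 4.0, 30)
y = 1.0 + 2.0*x + 1.5*z + rng.normal(0.0, 0.5, 30)
x[0], z[0] = 10.0, 10.0   # unusual explanatory values: high leverage
y[0] = 30.0               # well off the plane: an outlier

X = np.column_stack([np.ones_like(x), x, z])
h = np.diag(X @ np.linalg.solve(X.T @ X, X.T))   # leverages (hat values)

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b
n, k = X.shape
s2 = (e @ e) / (n - k)

# Internally standardised residual: adjusts for the smaller standard
# deviation, s * sqrt(1 - h), of a high leverage point's residual
internal = e / np.sqrt(s2 * (1 - h))

# Externally studentised residual: the error variance is re-estimated
# with the point deleted, so a high leverage outlier is not masked
s2_del = (e @ e - e**2 / (1 - h)) / (n - k - 1)
external = e / np.sqrt(s2_del * (1 - h))
print(h[0], internal[0], external[0])
```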
Leverage depends only on the explanatory variables and describes the potential of a point to influence the results. DFITS and Cook's D describe the actual influence of each point.
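These diagnostics are all available from statsmodels' OLSInfluence, sketched here on the same kind of made-up data (statsmodels spells DFITS as "dffits"):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import OLSInfluence

# Hypothetical data, as in the previous sketch
rng = np.random.default_rng(4)
x = rng.uniform(0.0, 4.0, 30)
z = rng.uniform(0.0, 4.0, 30)
y = 1.0 + 2.0*x + 1.5*z + rng.normal(0.0, 0.5, 30)
x[0], z[0], y[0] = 10.0, 10.0, 30.0

infl = OLSInfluence(sm.OLS(y, sm.add_constant(np.column_stack([x, z]))).fit())
print(infl.hat_matrix_diag[0])             # leverage of the unusual point
print(infl.resid_studentized_external[0])  # externally studentised residual
print(infl.dffits[0][0])                   # DFITS: actual influence on its fit
print(infl.cooks_distance[0][0])           # Cook's D: influence on all fits
```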
The externally studentised residuals, leverages and DFITS provide a good guide to problems with individual points. Several examples are given.