Interpretation of regression lines

The problem with fitting separate least squares regression lines to the different groups is that the difference between the groups is hard to explain concisely: the difference between the predicted responses in the groups depends on the value of the explanatory variable.

In the two-group example below, the difference between the predicted values of Y in Groups 1 and 2 is much greater when X is large than when X is small.
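To see why, suppose the separate least squares lines are ŷ = a + c₁X for Group 1 and ŷ = b + c₂X for Group 2 (the symbols c₁ and c₂ are introduced here only for illustration). The gap between the two predictions at any value of X is:

$$
\hat{y}_1 - \hat{y}_2 = (a + c_1 X) - (b + c_2 X) = (a - b) + (c_1 - c_2)\,X
$$

The gap therefore changes by (c₁ - c₂) for each unit increase in X, and it reduces to the constant (a - b) only when the two slopes are equal.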

Parallel regression lines

Interpretation is considerably simplified if we constrain the regression lines for the different groups to have the same slope. Select Parallel lines from the pop-up menu above. Both regression lines now have the same slope, c. If the Group 1 line is ŷ = a + cX and the Group 2 line is ŷ = b + cX, then Group 1 is predicted to be (a - b) higher than Group 2, irrespective of the value of X.

Parallel lines are not appropriate for every data set, so always check a scatterplot first. When they are appropriate, the resulting least squares lines are much easier to interpret.

Least squares

The principle behind fitting parallel lines to two or more groups is the same as in ordinary simple regression: we choose the parameters (one intercept for each group and a single common slope) to minimise the sum of squared residuals. The formulae for the resulting lines are harder to express algebraically and will therefore not be described here. However, most statistical software will do the calculations for you, so the precise details are of little importance.
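The algebra is not needed in practice, but a minimal Python/NumPy sketch may help show what the software is doing. It relies on a standard result, stated here without derivation: the least squares common slope pools the within-group sums of squares and cross-products across all groups, and each group's intercept is that group's mean of y minus the common slope times its mean of x. The function names `fit_parallel_lines` and `fit_parallel_lines_lstsq` are hypothetical. The second function fits the same model through a design matrix with one indicator column per group plus a column for x, as an independent cross-check.

```python
import numpy as np

def fit_parallel_lines(x, y, group):
    """Least squares parallel lines: one intercept per group, one common slope."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    group = np.asarray(group)
    labels = np.unique(group)

    # Pool the within-group cross-products and squared deviations of x.
    sxy = 0.0
    sxx = 0.0
    for g in labels:
        mask = group == g
        dx = x[mask] - x[mask].mean()
        dy = y[mask] - y[mask].mean()
        sxy += np.sum(dx * dy)
        sxx += np.sum(dx * dx)
    slope = sxy / sxx

    # Each line passes through its own group's (mean x, mean y).
    intercepts = {g: y[group == g].mean() - slope * x[group == g].mean()
                  for g in labels}
    return intercepts, slope


def fit_parallel_lines_lstsq(x, y, group):
    """Same model via a design matrix: group indicator columns plus x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    group = np.asarray(group)
    labels = np.unique(group)
    X = np.column_stack([(group == g).astype(float) for g in labels] + [x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return dict(zip(labels, coef[:-1])), coef[-1]
```

With made-up data in which group A lies exactly on y = 5 + 2x and group B on y = 3 + 2x (x = 0, 1, 2, 3 in each group), `fit_parallel_lines` returns a common slope of 2 and intercepts of 5 and 3, so group A is predicted to be 2 higher than group B at every x. On noisy data both functions should agree to rounding error.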

Gas consumption and insulation

In the 1960s, weekly gas consumption (in 1000 cubic feet) and the average outside temperature (in degrees Celsius) were recorded at a house in south-east England for 26 weeks before and 18 weeks after cavity-wall insulation was installed. The house thermostat was set at 20°C throughout.

The diagram initially shows a single regression line that is fitted by least squares to all the data.

There is clearly a difference between gas consumption before and after insulation was installed, so select Separate lines from the pop-up menu. These lines fit the data much more closely, but the effect of the insulation is much harder to summarise because it differs depending on the temperature.

The separate lines seem to be close to parallel, so finally select Equal slopes from the pop-up menu to fit two parallel lines to the data by least squares. From these parallel lines, we can summarise the effect of the insulation as follows.

Gas consumption is (6.717 - 4.922) = 1.795 thousand cubic feet per week lower after insulation, regardless of the outside temperature.
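The subtraction can be written out explicitly. Let T be the average outside temperature and c the common slope of the two parallel lines (its value is not quoted here, so it is left as a symbol), and take 6.717 as the intercept before insulation and 4.922 as the intercept after, as the subtraction implies:

$$
\widehat{\text{Gas}}_{\text{before}} = 6.717 + cT, \qquad \widehat{\text{Gas}}_{\text{after}} = 4.922 + cT
$$

$$
\widehat{\text{Gas}}_{\text{before}} - \widehat{\text{Gas}}_{\text{after}} = (6.717 + cT) - (4.922 + cT) = 1.795
$$

The cT terms cancel, which is why the estimated saving does not depend on the temperature.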

Are parallel lines reasonable?

Although the parallel lines seem to fit these data reasonably well and allow the effect of insulation to be summarised concisely, it is important to consider whether they match our knowledge of the physical system underlying the data. Surely we should expect that insulation would have more effect on gas consumption when the temperature is very low than when it is high? Perhaps parallel lines are an over-simplification and separate least squares lines should be used to describe the effect of insulation, despite the extra difficulty of interpretation.