Description of the model in terms of a response distribution

The normal linear model describes the distribution of Y for any value of X. It can be expressed in the form...

Y  ~  normal (μy , σy)

where

μy  =  β0  +  β1x

σy  =  σ
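As a minimal sketch of what this form of the model says, the Python below draws many y-values at a single x and checks that their mean and standard deviation are close to μy and σ. The parameter values, the choice x = 2 and the helper names mean_response and sample_response are assumptions made only for illustration.

```python
import random
import statistics

# Hypothetical parameter values, chosen only for illustration.
BETA0, BETA1, SIGMA = 2.5, 1.5, 0.8

def mean_response(x, beta0=BETA0, beta1=BETA1):
    """Mean of the response distribution at x: mu_y = beta0 + beta1 * x."""
    return beta0 + beta1 * x

def sample_response(x, rng, sigma=SIGMA):
    """Draw one y from normal(mu_y, sigma) at the given x."""
    return rng.gauss(mean_response(x), sigma)

rng = random.Random(1)  # fixed seed so the run is repeatable
ys = [sample_response(2.0, rng) for _ in range(10_000)]

sample_mean = statistics.mean(ys)
sample_sd = statistics.stdev(ys)
print(f"mu_y at x=2: {mean_response(2.0):.2f}")
print(f"sample mean: {sample_mean:.2f}, sample sd: {sample_sd:.2f}")
```

The spread of the simulated values is the same whatever x is used, since σy does not depend on x in this model.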

Description of the model in terms of 'errors'

An equivalent way to write the same model is...

y  =  β0  +  β1x  +  ε

where ε is called the model error and has a distribution

ε  ~  normal (0 , σ)

It is helpful to observe that the error, ε , for a data point is

ε  =  y  −  ( β0  +  β1x )

which is the vertical distance from the regression line to the data point's cross on a scatterplot (positive when the cross is above the line).
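The two descriptions can be compared in a short simulation. The sketch below generates y-values as line plus error and then recovers each error as y − (β0 + β1x). The parameter values and the range of x are assumptions for illustration, and the recovery only works here because the simulation knows β0 and β1.

```python
import random

# Hypothetical parameter values, treated as known inside this simulation.
BETA0, BETA1, SIGMA = 2.5, 1.5, 0.8

rng = random.Random(2)  # fixed seed so the run is repeatable
xs = [rng.uniform(0.0, 4.0) for _ in range(5)]

# epsilon ~ normal(0, sigma), then y = beta0 + beta1 * x + epsilon
errors = [rng.gauss(0.0, SIGMA) for _ in xs]
ys = [BETA0 + BETA1 * x + e for x, e in zip(xs, errors)]

# Recover each error as the vertical distance of the point from the line.
recovered = [y - (BETA0 + BETA1 * x) for x, y in zip(xs, ys)]

for x, y, e in zip(xs, ys, recovered):
    print(f"x={x:.2f}  y={y:.2f}  error={e:+.2f}")
```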

It is worth stressing here that in practical situations,

The intercept and slope of the regression line, β0 and β1, are unknown parameters. The errors, ε, therefore cannot be determined exactly.

In the next section, we will see how these quantities can be estimated.

Band containing about 95% of values

The 70-95-100 rule states that approximately 95% of values from a reasonably symmetric, bell-shaped distribution are within 2 standard deviations of the mean. In the context of a normal linear model, approximately 95% of the errors will therefore be within 2 standard deviations of zero, i.e. between −2σ and +2σ.

Since the errors are the vertical distances of points from the regression line, this means that...

Approximately 95% of the crosses will be within  ±2σ of the regression line (vertically).

There is therefore a band extending 2σ on each side of the regression line that contains approximately 95% of the crosses on a scatterplot of the data.
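The 95% figure can be checked numerically. The sketch below simulates points from a normal linear model and counts the fraction within 2σ of the line, alongside the exact normal probability P(|Z| ≤ 2) computed with math.erf. The parameter values, the range of x and the number of points are assumptions for illustration.

```python
import math
import random

# Hypothetical parameter values, chosen only for illustration.
BETA0, BETA1, SIGMA = 2.5, 1.5, 0.8

rng = random.Random(3)  # fixed seed so the run is repeatable
n = 20_000
inside = 0
for _ in range(n):
    x = rng.uniform(0.0, 4.0)
    mu_y = BETA0 + BETA1 * x
    y = rng.gauss(mu_y, SIGMA)
    # Count the point if it is within 2 * sigma of the line, vertically.
    if abs(y - mu_y) <= 2 * SIGMA:
        inside += 1

fraction_inside = inside / n
exact = math.erf(2 / math.sqrt(2))  # P(|Z| <= 2) for a standard normal
print(f"simulated: {fraction_inside:.3f}, exact: {exact:.4f}")
```

The exact probability is slightly above 95%, which is why the rule is stated as an approximation.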

Example

The diagram below shows a normal linear model with parameters

μy  =  2.5  +  1.5x

σy  =  0.8

The blue regions in the tails of the normal probability density function are more than 2σ (i.e. 1.6 for this model) from μy on each side. Together they contain approximately 5% of the normal distribution's area, so about 95% of y-values sampled from this distribution will be within the bounds. This is true for each value of x (drag the slider to verify), so approximately 95% of sampled values will lie in the grey band on the x-y plane.

Click Take sample a few times to verify that approximately 95% of values are within the grey band.
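The same experiment can be imitated away from the diagram. The sketch below computes the edges of the band for this example's parameters and then takes a few small samples, counting how many points fall inside the band each time. The range of x, the sample size of 40 and the function names band_limits and take_sample are assumptions, not properties of the diagram.

```python
import random

BETA0, BETA1, SIGMA = 2.5, 1.5, 0.8  # parameters of the example model

def band_limits(x):
    """Lower and upper edges of the band at x: mu_y -/+ 2 * sigma."""
    mu_y = BETA0 + BETA1 * x
    return mu_y - 2 * SIGMA, mu_y + 2 * SIGMA

def take_sample(n, rng, x_min=0.0, x_max=4.0):
    """Simulate n points (x range is assumed) and count those inside the band."""
    inside = 0
    for _ in range(n):
        x = rng.uniform(x_min, x_max)
        y = rng.gauss(BETA0 + BETA1 * x, SIGMA)
        lower, upper = band_limits(x)
        if lower <= y <= upper:
            inside += 1
    return inside

for x in (0.0, 1.0, 2.0):
    lower, upper = band_limits(x)
    print(f"x={x:.1f}: band from {lower:.1f} to {upper:.1f}")

rng = random.Random(4)  # fixed seed so the run is repeatable
counts = [take_sample(40, rng) for _ in range(5)]
print("points inside the band, out of 40 per sample:", counts)
```

With samples this small the count inside the band varies from sample to sample, so individual samples need not show exactly 95%.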

Finally, click the button at the top right of the diagram to look down on the x-y plane. In later pages, we will represent a normal linear model with a 2-dimensional diagram of this form.