Normal linear model for the response

A regression model for the response in a bivariate data set describes how the response distribution depends on X. The most commonly used regression model is a normal linear model. This model involves:

Normality
At each value of X, Y has a normal distribution.
Constant variance
The standard deviation of Y is the same for all values of X.
Linearity
The mean of Y is linearly related to X.

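The last two properties of the normal linear model can be expressed as

σ_y  =  σ

μ_y  =  β_0  +  β_1 x

where μ_y and σ_y are the mean and standard deviation of Y at X = x, and β_0, β_1 and σ are the parameters of the model.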
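The three properties can be illustrated with a short simulation. The sketch below is only an illustration: the parameter values and the function names `mean_response` and `draw_response` are assumptions made for this example, not part of the original text. It draws many values of Y at each of several values of X and prints the sample mean and standard deviation, which should follow the line for the mean and stay roughly constant for the spread.

```python
import random
import statistics

# Hypothetical parameter values, chosen only for illustration.
BETA0 = 1.0   # intercept of the line for the mean of Y
BETA1 = 0.5   # slope of the line for the mean of Y
SIGMA = 0.4   # standard deviation of Y, the same at every x


def mean_response(x, beta0=BETA0, beta1=BETA1):
    """Linearity: the mean of Y at x is beta0 + beta1 * x."""
    return beta0 + beta1 * x


def draw_response(x, rng, beta0=BETA0, beta1=BETA1, sigma=SIGMA):
    """Normality and constant variance: Y at x is normal with
    mean beta0 + beta1 * x and standard deviation sigma."""
    return rng.gauss(mean_response(x, beta0, beta1), sigma)


rng = random.Random(1)  # fixed seed so the run is repeatable
for x in (1, 2, 3, 4):
    ys = [draw_response(x, rng) for _ in range(10_000)]
    # Sample mean should be near mean_response(x); sample SD near SIGMA.
    print(x, round(statistics.mean(ys), 2), round(statistics.stdev(ys), 2))
```

Only Y is simulated here; each x is simply passed in as a fixed number.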

Note: only the response is modelled

A normal linear model does not try to explain the distribution of the values of X.

Experimental data
In experimental data, the values of X are fixed by the experimenter, so their distribution is of no interest.
Observational data
In observational data, the values of X are usually random. Although it is also possible to model them with a univariate distribution, this is rarely done. Instead, the relationship between X and Y is analysed with a regression model that treats the values of X as constants. The regression model only describes the conditional distribution of Y at each value of X.

Example of a normal linear model

A typical normal linear model is shown below.

Drag the slider to see how the distribution of Y depends on the value of X. Observe that...

Click Take sample a few times to observe typical data from this model when 5 response measurements are made at each of X = 1, 2, 3 and 4.

The model can also be used in situations where the values of X are not repeated. Select the option Regular X then take a few more samples to see typical data if the values of X are chosen to be 0.6, 0.8, 1.0, ..., 4.4.

Select the option Random X and take a few more samples to see typical data if the values of X are irregularly spaced.
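The three sampling designs described above can be sketched in code. This is a rough illustration under stated assumptions: the parameter values, the helper `take_sample`, the use of 20 irregular values, and their uniform distribution between 0.5 and 4.5 are all hypothetical choices, not taken from the original model. In every design the same rule generates Y once the values of X are given.

```python
import random

# Hypothetical parameter values, chosen only for illustration.
BETA0, BETA1, SIGMA = 1.0, 0.5, 0.4


def take_sample(xs, rng):
    """Return (x, y) pairs, with y drawn from the normal linear model
    at each x. The x-values are treated as fixed constants."""
    return [(x, rng.gauss(BETA0 + BETA1 * x, SIGMA)) for x in xs]


rng = random.Random(7)  # fixed seed so the run is repeatable

# Design 1: 5 response measurements at each of X = 1, 2, 3 and 4.
repeated_x = [x for x in (1, 2, 3, 4) for _ in range(5)]

# Design 2: regularly spaced values 0.6, 0.8, 1.0, ..., 4.4.
regular_x = [round(0.6 + 0.2 * i, 1) for i in range(20)]

# Design 3: irregularly spaced values.
# Assumption: 20 values drawn uniformly between 0.5 and 4.5.
random_x = sorted(rng.uniform(0.5, 4.5) for _ in range(20))

samples = {
    "repeated": take_sample(repeated_x, rng),
    "regular": take_sample(regular_x, rng),
    "random": take_sample(random_x, rng),
}

for name, pairs in samples.items():
    print(name, [(round(x, 2), round(y, 2)) for x, y in pairs[:3]])
```

Even in the third design, where the values of X are themselves random, `take_sample` receives them as given numbers, which mirrors how the regression model describes only the conditional distribution of Y.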