Response distribution at each X

In experiments, the values of the explanatory variable, X, are controlled by the experimenter. Several response measurements are often made at each distinct value of X. Experimental data are rare in business, but similar data arise when X is discrete — there are several response values corresponding to each distinct x-value.

At any single value of X, the repeated response measurements can be considered as a univariate data set and can be modelled as a random sample from some distribution — commonly a normal distribution. The characteristics of the distribution will often depend on the value of X.
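As a rough illustration of treating the responses at each x-value as separate univariate data sets, the sketch below groups (x, y) pairs by x and summarises each group by its count, mean and standard deviation. The function name `summarise_by_x` and the small data set are hypothetical and are not taken from the text.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarise_by_x(pairs):
    """Group (x, y) pairs by x and return {x: (n, mean, sd)} for each distinct x."""
    groups = defaultdict(list)
    for x, y in pairs:
        groups[x].append(y)

    summary = {}
    for x in sorted(groups):
        ys = groups[x]
        # The sample standard deviation needs at least two values.
        sd = stdev(ys) if len(ys) > 1 else float("nan")
        summary[x] = (len(ys), mean(ys), sd)
    return summary


# Hypothetical illustrative data: three responses at x = 1, two at x = 2.
data = [(1, 10.0), (1, 12.0), (1, 14.0), (2, 20.0), (2, 24.0)]
summary = summarise_by_x(data)

for x, (n, m, s) in summary.items():
    print(f"x = {x}: n = {n}, mean = {m:.2f}, sd = {s:.2f}")
```

If the means or standard deviations differ noticeably between groups, that suggests the response distribution depends on the value of X.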

The collection of distributions of Y at the different values of X comprises a model for the complete bivariate data set. This is called a regression model.
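One way to write such a model down, assuming the distribution at each x-value is normal, is shown below. The symbols \mu_x and \sigma_x (the mean and standard deviation of the response when X = x) are notation introduced here for illustration and do not appear in the text.

```latex
Y \mid (X = x) \;\sim\; \text{normal}(\mu_x,\ \sigma_x),
\qquad \text{for each distinct value } x \text{ of } X
```

Under this notation, the statement that the distribution depends on X means that \mu_x, \sigma_x, or both can change from one x-value to another.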

House prices and bathrooms

The sale prices of all houses sold in an area were collected. How does the sale price relate to the number of bathrooms in the houses?

The diagram below shows the resulting data. The crosses have been jittered a little (randomly moved) to separate them in the scatterplot.

This diagram is 3-dimensional. Position the mouse in the middle of the diagram and drag towards the top left of the screen to rotate the plot (or click the 3D rotation button). The histogram at each x-value describes the distribution of prices for houses with that number of bathrooms.

Possible model for house prices

The next diagram shows a possible model for the data above: a separate normal distribution of sale price for each number of bathrooms (X).

You may use the mouse (or the buttons at the top right) to rotate the 3-dimensional diagram. Click Take sample to show a random sample of values from each of these normal distributions. The model asserts that the observed data arose as a sample of this kind.
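The idea of taking a sample from such a model can be sketched in a few lines of code. The means, standard deviations and group sizes below are hypothetical values chosen only for illustration (prices in thousands of dollars); they are not estimates from the house price data, and `take_sample` is a name invented for this sketch.

```python
import random

# Hypothetical model: {number of bathrooms: (mean price, sd of price)}, in $1000s.
MODEL = {1: (150.0, 30.0), 2: (220.0, 40.0), 3: (300.0, 55.0)}

# Hypothetical number of houses observed with each number of bathrooms.
N_PER_GROUP = {1: 20, 2: 15, 3: 8}


def take_sample(model, n_per_group, seed=None):
    """Draw a random sample of prices from the normal distribution at each x-value."""
    rng = random.Random(seed)
    sample = []
    for x, (mu, sigma) in model.items():
        for _ in range(n_per_group[x]):
            sample.append((x, rng.gauss(mu, sigma)))
    return sample


sample = take_sample(MODEL, N_PER_GROUP, seed=1)

# Show the first few simulated (bathrooms, price) pairs.
for x, price in sample[:5]:
    print(f"bathrooms = {x}, price = {price:.1f}")
```

Calling `take_sample` with a different seed gives a different data set from the same model, which mirrors clicking the button repeatedly.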