Least squares
In practical situations, the three parameters of the normal linear model, β0, β1 and σ, are unknown values — all that we have available is a single data set that we believe comes from a model of this form. Although we cannot hope to determine the values of these unknown parameters exactly, we can obtain estimates of them from the data.
We previously examined bivariate data of this form and fitted a line by least squares. The slope and intercept of the least squares line are estimates of the slope and intercept of the regression line.
The best estimates of β0 and β1 are the intercept and slope of the least squares line, b0 and b1.
Since b0 and b1 are functions of a data set that we assume to be a random sample from the normal linear model, b0 and b1 are themselves random quantities: they would be different if a different data set were collected.
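As a rough illustration of how b0 and b1 are computed from a single data set, the sketch below applies the standard closed-form least squares formulas, b1 = Sxy / Sxx and b0 = ȳ − b1 x̄. The function name least_squares and the five data values are made up for this example; they are not the zinc measurements.

```python
def least_squares(x, y):
    """Return (b0, b1), the intercept and slope of the least squares line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of squares of x about its mean, and sum of cross-products of x and y
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx           # estimate of the slope, beta1
    b0 = y_bar - b1 * x_bar  # estimate of the intercept, beta0
    return b0, b1


# Hypothetical illustrative values (not taken from any real data set)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = least_squares(x, y)
print(f"least squares line: y = {b0:.2f} + {b1:.2f} x")
```

A different set of y-values would give different values of b0 and b1, which is the sense in which the estimates are random quantities.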
Zinc in aquatic plants and lake sediment
In practice, only a single data set is available. The scatterplot below shows zinc concentrations in the aquatic plant Eriocaulon septangulare (micrograms per gram dry weight) and zinc concentrations in sediment (micrograms per gram) from several lakes in Ontario.
Our 'best guesses' for β0 and β1 are the least squares estimates shown in the blue equation.
Variability of the least squares slope and intercept
The diagram below represents a normal linear model. (The band is 2σy above and below the regression line that shows how µy depends on X.)
Click 'Take sample' a few times to generate different data from the model. Observe the variability of the least squares lines fitted to these data sets.
The two parameter estimates (the values in the blue equation) are usually close to the model values (in the top equation), but they vary from sample to sample.
The sample-to-sample variability of the least squares estimates means that the least squares slope and intercept in the aquatic zinc data are unlikely to be exactly equal to the underlying β0 and β1.
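The sample-to-sample variability described above can also be explored with a small simulation. The sketch below repeatedly generates data from a normal linear model and refits the least squares line each time. The parameter values (β0 = 10, β1 = 2, σ = 3), the x-values and the function names are all assumptions chosen for illustration; they are not the values used in the diagram or estimated from the zinc data.

```python
import random
import statistics


def least_squares(x, y):
    """Return (b0, b1), the intercept and slope of the least squares line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def simulate_fits(beta0, beta1, sigma, x, n_samples, seed=0):
    """Fit a least squares line to each of n_samples simulated data sets."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    fits = []
    for _ in range(n_samples):
        # Each response is normal with mean beta0 + beta1 * x and st. dev. sigma
        y = [beta0 + beta1 * xi + rng.gauss(0.0, sigma) for xi in x]
        fits.append(least_squares(x, y))
    return fits


# Hypothetical model parameters and x-values, chosen only for illustration
beta0, beta1, sigma = 10.0, 2.0, 3.0
x = [float(i) for i in range(1, 21)]

fits = simulate_fits(beta0, beta1, sigma, x, n_samples=1000)
intercepts = [b0 for b0, _ in fits]
slopes = [b1 for _, b1 in fits]

print("mean and st. dev. of b0:", statistics.mean(intercepts), statistics.stdev(intercepts))
print("mean and st. dev. of b1:", statistics.mean(slopes), statistics.stdev(slopes))
```

In a run like this, the estimates should centre near the model values while individual samples differ from them, which mirrors what repeated clicks of the button show.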