A linear model for proportions?
When we modelled how a numerical explanatory variable affected a numerical response variable, a linear equation was used,
$\hat{y} = b_0 + b_1 x$
When the response variable is categorical, it is tempting to try a similar linear equation to explain how the proportion in one response category is affected by the explanatory variable,
predicted proportion, $\hat{p} = b_0 + b_1 x$
Unfortunately, a linear equation is not appropriate for a proportion, since it may give predicted proportions greater than 1.0 or less than 0.0.
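The problem can be made concrete with a little algebra. For any non-zero slope $b_1$, solving $b_0 + b_1 x = 0$ and $b_0 + b_1 x = 1$ gives the only interval of x over which the straight line predicts a valid proportion. The minimal Python sketch below does this; the function names and the example coefficients are hypothetical and are not taken from any data set.

```python
def linear_proportion(x, b0, b1):
    """Predicted proportion from a straight line: p_hat = b0 + b1 * x."""
    return b0 + b1 * x

def valid_x_range(b0, b1):
    """Return the interval of x for which 0 <= b0 + b1 * x <= 1.

    Only meaningful for a non-zero slope; outside this interval the
    line predicts a 'proportion' below 0 or above 1.
    """
    if b1 == 0:
        raise ValueError("slope b1 must be non-zero")
    x_at_0 = -b0 / b1        # x where the line crosses 0
    x_at_1 = (1 - b0) / b1   # x where the line crosses 1
    return (min(x_at_0, x_at_1), max(x_at_0, x_at_1))

# Hypothetical line with b0 = 0.5 and b1 = 0.25
print(valid_x_range(0.5, 0.25))          # interval of x giving valid proportions
print(linear_proportion(4, 0.5, 0.25))   # above 1
print(linear_proportion(-4, 0.5, 0.25))  # below 0
```

Whatever values $b_0$ and $b_1 \ne 0$ take, the interval is finite, so some values of x always give impossible predictions.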
Nonlinear models
To model how a proportion depends on a numerical explanatory variable, X, an equation should give values between 0 and 1 for all possible values of X. This means that the equation must be nonlinear in X.
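One widely used equation with this property is the logistic function, $p = \dfrac{e^{b_0 + b_1 x}}{1 + e^{b_0 + b_1 x}}$, which rises (or falls) in an S-shape and stays between 0 and 1 for every x. This section does not name the curve it uses, so treat the logistic form here as an illustrative choice. The Python sketch below evaluates it; the coefficients are hypothetical.

```python
import math

def logistic(x, b0, b1):
    """Logistic curve p = exp(b0 + b1*x) / (1 + exp(b0 + b1*x)).

    The two branches are algebraically equal; each avoids overflow in
    math.exp. For very large |b0 + b1*x|, floating-point rounding can
    return exactly 0.0 or 1.0, the curve's limiting values.
    """
    z = b0 + b1 * x
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Hypothetical coefficients b0 = 0, b1 = 1: the curve passes 0.5 at x = 0
for x in (-10, -2, 0, 2, 10):
    print(x, round(logistic(x, 0, 1), 5))
```

Unlike a straight line, no choice of x pushes this curve below 0 or above 1.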
Menstruation and age
A straight line has been drawn on the menstruation data below to model how the proportion menstruating might depend on age.
Drag the vertical red line on the axis to obtain the predicted proportion menstruating at different ages.
The linear model fits the data reasonably closely between ages 11½ and 14½. Its slope is approximately 0.25, so the proportion menstruating rises by about 0.25 per year of age: roughly a quarter of girls start menstruating in each year over this range.
However, the linear model predicts that more than 100% of girls have started menstruating at ages over 15, and that a negative proportion are menstruating at ages under 11. Any linear model with a non-zero slope will predict proportions outside the range 0 to 1 for extreme enough values of X.
Now select the option Nonlinear model from the pop-up menu. This curve is better than the previous straight line since it remains between 0 and 1 for all ages.
Again drag the vertical red line on the axis to obtain the predicted proportion menstruating at different ages. Observe that this nonlinear model can provide reasonable predictions at all ages.
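The contrast between the two models can be sketched numerically. The fitted coefficients are not given in this section, so both models below are hypothetical. The line uses the slope of 0.25 quoted above, placed so that it crosses 0 at age 11 and 1 at age 15, which matches the description of where it goes wrong. The logistic curve is an assumed stand-in for the nonlinear model, centred at age 13 (where the line gives 0.5); a steepness of 1 per year makes its slope at age 13 equal to 1/4 = 0.25, the same as the line.

```python
import math

def linear_model(age):
    # Hypothetical line: slope 0.25 per year, crossing 0 at age 11 and 1 at age 15
    return 0.25 * (age - 11)

def logistic_model(age):
    # Hypothetical logistic curve centred at age 13; a steepness of 1 per year
    # gives it a slope of 1/4 = 0.25 at age 13, matching the line there
    return 1.0 / (1.0 + math.exp(-(age - 13)))

for age in range(9, 18):
    print(f"age {age:2d}: linear {linear_model(age):5.2f}   logistic {logistic_model(age):5.3f}")
```

Near age 13 the two models give similar predictions. Below 11 the line goes negative and above 15 it exceeds 1, while the logistic predictions level off towards 0 and 1.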