A linear model for proportions?

When we tried to model how a numerical explanatory variable affected a numerical response variable, we used a linear equation to model the relationship,

  $\hat{y} = b_0 + b_1 x$

When the response variable is categorical, it is tempting to try a similar linear equation to explain how the proportion in one response category is affected by the explanatory variable,

predicted proportion,  $\hat{p} = b_0 + b_1 x$

Unfortunately, a linear equation is not appropriate for a proportion, since it may result in predicted proportions greater than 1.0 or less than 0.0.

Nonlinear models

To model how a proportion depends on a numerical explanatory variable, X, an equation should give values between 0 and 1 for all possible values of X. This means that the equation must be nonlinear in X.
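One family of curves with this property is the logistic curve. The text does not name a specific equation, so the sketch below is only an illustration; the function name and the coefficients b0 and b1 are hypothetical. It prints the predicted proportion over a wide range of X so that the bounded behaviour can be seen.

```python
import math

def logistic_proportion(x, b0, b1):
    """Logistic curve: one possible nonlinear model for a proportion."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

# Hypothetical coefficients, chosen only for illustration.
b0, b1 = -2.0, 0.5

for x in (-30, -10, 0, 4, 10, 30):
    p = logistic_proportion(x, b0, b1)
    print(f"x = {x:4d}  predicted proportion = {p:.6f}")
```

The curve approaches 0 and 1 at the extremes but never crosses them, which a straight line with a nonzero slope cannot do.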

Fruit flies on mangoes

In an experiment to assess the effectiveness of heat-treatment of mangoes as a method of killing fruit fly eggs and larvae, several infested fruit were heat-treated at temperatures ranging from 39 to 46 degrees Celsius. The numbers of fruit fly eggs surviving at each temperature are shown in the table below.

Temp          Alive   Dead   Total
39 degrees      117    222     339
41 degrees      132    366     498
43 degrees       64    526     590
44 degrees       30    542     572
45 degrees        1    588     589
46 degrees        0    607     607
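The proportion dying at each temperature follows directly from these counts. A minimal sketch of the arithmetic, with the counts copied from the table (the variable names are illustrative only):

```python
# Counts copied from the table: temperature -> (alive, dead).
counts = {
    39: (117, 222),
    41: (132, 366),
    43: (64, 526),
    44: (30, 542),
    45: (1, 588),
    46: (0, 607),
}

proportion_dead = {}
for temp, (alive, dead) in counts.items():
    total = alive + dead
    proportion_dead[temp] = dead / total
    print(f"{temp} degrees: {dead}/{total} dead = {proportion_dead[temp]:.3f}")
```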

The proportions surviving are shown in the following stacked bar chart. A straight line has been drawn on the diagram to model how the proportion dying might depend on temperature.

Drag the vertical red line on the axis to obtain the predicted proportion dying at different temperatures.

The linear model is a reasonably close fit to the data between 39 and 45 degrees. From the slope of the line (approximately 0.056), we can tell that roughly an extra 5 to 6 percent of eggs are killed for each extra degree in temperature.

However, the linear model predicts that more than 100% of eggs will be killed at temperatures greater than 46 degrees. Any linear model with a nonzero slope will predict proportions outside the range 0 to 1 for extreme enough values of X.
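The problem can be checked numerically. The sketch below fits an unweighted least-squares line to the six observed proportions dying. This fitting method is an assumption, since the text does not say how the line in the diagram was obtained, so the coefficients need not match it exactly. It then evaluates the line inside and outside the observed temperature range.

```python
# Observed data from the table: temperature, number dead, total eggs.
temps = [39, 41, 43, 44, 45, 46]
dead = [222, 366, 526, 542, 588, 607]
total = [339, 498, 590, 572, 589, 607]
props = [d / n for d, n in zip(dead, total)]

# Ordinary (unweighted) least-squares line: an assumed fitting method.
n_points = len(temps)
mean_x = sum(temps) / n_points
mean_y = sum(props) / n_points
sxx = sum((x - mean_x) ** 2 for x in temps)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(temps, props))
b1 = sxy / sxx
b0 = mean_y - b1 * mean_x

def linear_prediction(temp):
    """Predicted proportion dying from the fitted straight line."""
    return b0 + b1 * temp

print(f"slope = {b1:.4f}, intercept = {b0:.4f}")
for t in (20, 39, 43, 46, 47, 50):
    print(f"{t} degrees: predicted proportion dying = {linear_prediction(t):.3f}")
```

Predictions at temperatures well below or above the observed range fall outside 0 to 1, which is impossible for a proportion.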

Now select the option Nonlinear model from the pop-up menu. This curve is better than the previous straight line since it remains between 0.0 and 1.0 for all temperatures.

Again drag the vertical red line on the axis to obtain the predicted proportion dying at different temperatures. A nonlinear model can provide reasonable predictions at all temperatures.
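The text does not say which nonlinear curve the diagram uses. As one possibility, the sketch below fits a logistic curve to the mango data by maximum likelihood, using Newton-Raphson iterations. Both the choice of curve and the fitting method are assumptions made for illustration, and the fitted values need not match the curve in the diagram.

```python
import math

# Observed data from the table: temperature, number dead, total eggs.
temps = [39, 41, 43, 44, 45, 46]
dead = [222, 366, 526, 542, 588, 607]
total = [339, 498, 590, 572, 589, 607]

# Centre temperature to keep the arithmetic well conditioned.
centre = sum(temps) / len(temps)
xc = [t - centre for t in temps]

def logistic(eta):
    return 1.0 / (1.0 + math.exp(-eta))

# Newton-Raphson for the binomial log-likelihood (assumed fitting method).
a, b = 0.0, 0.0  # intercept at the centre temperature, and slope
for _ in range(50):
    p = [logistic(a + b * x) for x in xc]
    # Score vector: first derivatives of the log-likelihood.
    u0 = sum(d - n * pi for d, n, pi in zip(dead, total, p))
    u1 = sum(x * (d - n * pi) for x, d, n, pi in zip(xc, dead, total, p))
    # Information matrix entries, using weights n * p * (1 - p).
    w = [n * pi * (1 - pi) for n, pi in zip(total, p)]
    i00 = sum(w)
    i01 = sum(wi * x for wi, x in zip(w, xc))
    i11 = sum(wi * x * x for wi, x in zip(w, xc))
    det = i00 * i11 - i01 * i01
    # Solve the 2x2 system (information matrix) * step = score.
    step_a = (i11 * u0 - i01 * u1) / det
    step_b = (i00 * u1 - i01 * u0) / det
    a += step_a
    b += step_b
    if abs(step_a) + abs(step_b) < 1e-10:
        break

def logistic_prediction(temp):
    """Predicted proportion dying at a given temperature."""
    return logistic(a + b * (temp - centre))

for t in (30, 39, 43, 46, 50, 60):
    print(f"{t} degrees: predicted proportion dying = {logistic_prediction(t):.4f}")
```

Whatever temperature is supplied, the prediction stays between 0 and 1, so the curve can be read sensibly beyond the observed range of 39 to 46 degrees.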