Focus on the factor levels used in the experiment.

It is important to think of models in terms of their flexibility for modelling the response mean at the factor levels used in the experiment. Although a linear or quadratic model allows predictions at intermediate values of the factor, only its fitted values at the levels actually used affect the residuals.

Since least squares works with the residuals, the only thing that matters when assessing a model's fit is how far the model can reduce the residual sum of squares.
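The argument can be made precise with a short derivation. The notation is introduced here and is not from the original: y_{ij} is the j-th of n_i responses at the i-th factor level x_i, \bar y_i is their mean, and \hat\mu(x_i) is a model's fitted value at that level. The cross term in the expansion vanishes because the deviations from \bar y_i sum to zero within each level.

```latex
\mathrm{RSS}
  = \sum_{i}\sum_{j=1}^{n_i}\bigl(y_{ij}-\hat\mu(x_i)\bigr)^2
  = \underbrace{\sum_{i}\sum_{j=1}^{n_i}\bigl(y_{ij}-\bar y_i\bigr)^2}_{\text{same for every model}}
  + \sum_{i} n_i\bigl(\bar y_i-\hat\mu(x_i)\bigr)^2
```

The first sum does not involve the model, so the residual sum of squares is smallest when \hat\mu(x_i) = \bar y_i at every level. Any model flexible enough to achieve this, whatever shape it takes between the levels, attains exactly the same minimum.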

Explanatory variables with two levels

If the explanatory factor is numerical but only two x-values are used in the experiment, the linear model allows complete flexibility in the values for the response mean at these two x-values. (The model has two degrees of freedom.)

Treating the factor as categorical with two levels has identical flexibility and is equivalent with respect to the fit of the model.

A numerical explanatory variable with 2 levels can be equivalently modelled as categorical.

Conversely,

A categorical variable with 2 levels could equivalently be coded as 0/1 and treated as numerical.
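The 0/1 coding can be checked with a minimal pure-Python sketch. The labels, response values and the `linear_fit` helper below are hypothetical, made up only for illustration. Fitting a straight line to the 0/1 codes reproduces the two group means: the intercept is the mean of the group coded 0 and the slope is the difference between the two group means.

```python
# Hypothetical two-level categorical factor with made-up responses (illustration only).
labels = ["low", "low", "low", "high", "high", "high"]
y = [4.1, 3.8, 4.4, 6.0, 5.7, 6.3]

# Code the categorical variable as 0/1 and treat it as numerical.
x = [0.0 if lab == "low" else 1.0 for lab in labels]


def linear_fit(x, y):
    """Ordinary least-squares intercept and slope for y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


a, b = linear_fit(x, y)

# Group means computed directly from the categorical labels.
mean_low = sum(v for lab, v in zip(labels, y) if lab == "low") / labels.count("low")
mean_high = sum(v for lab, v in zip(labels, y) if lab == "high") / labels.count("high")

print("intercept:", a, "  mean of 'low':", mean_low)
print("slope:", b, "  difference of means:", mean_high - mean_low)
```

Because the fitted values at x = 0 and x = 1 are exactly the two group means, the residuals, and hence the fit, are the same as for the categorical model.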

Explanatory variables with three levels

In a similar way, if an experiment involves a numerical factor that takes only three distinct values, a quadratic model has complete flexibility to give any mean responses at these three x-values. (The model has three degrees of freedom.) Modelling the factor as categorical with three levels gives identical flexibility and an identical fit.
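One way to see that a quadratic can match any three means is to construct it explicitly. The sketch below (the function name `quadratic_through` and the example levels and targets are hypothetical) builds the quadratic through three arbitrary points from Lagrange basis polynomials and expands it into coefficients a, b, c of a + bx + cx².

```python
def quadratic_through(levels, means):
    """Coefficients (a, b, c) of a + b*x + c*x**2 passing through every
    (levels[i], means[i]) point, built from Lagrange basis polynomials."""
    if len(levels) != 3 or len(set(levels)) != 3:
        raise ValueError("need exactly three distinct levels")
    a = b = c = 0.0
    for i, (xi, target) in enumerate(zip(levels, means)):
        # The other two levels p and q define the basis polynomial
        # (x - p)(x - q), scaled so that it equals 1 at xi and 0 at p and q.
        p, q = [xj for j, xj in enumerate(levels) if j != i]
        w = target / ((xi - p) * (xi - q))
        # Expand w*(x - p)*(x - q) = w*x**2 - w*(p + q)*x + w*p*q.
        c += w
        b -= w * (p + q)
        a += w * p * q
    return a, b, c


# Hypothetical levels and arbitrary target means, chosen only for illustration.
levels = [1.0, 2.0, 3.0]
targets = [5.0, -2.0, 7.0]
a, b, c = quadratic_through(levels, targets)
for x, t in zip(levels, targets):
    print(x, t, a + b * x + c * x ** 2)
```

Whatever three targets are supplied, the quadratic passes through them, so at the three levels used it can reproduce the three group means exactly. A straight line, with only two parameters, generally cannot.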

Silicon wafer manufacture

One of the initial steps in fabricating integrated circuits is to grow an epitaxial layer on polished silicon wafers. The wafers are mounted on a six-faceted cylinder that is spun inside a metal bell jar. The jar is injected with chemical vapours through nozzles at the top of the jar and heated. The process continues until the epitaxial layer grows to a desired thickness. As part of an experiment to assess the effect of different factors on the thickness of the resulting layer, the nozzle position was varied between two levels (2 and 6) and other factors were kept constant (a low deposition time at 1210°C and continuous rotation of the cylinder). The resulting data are shown below:

Nozzle position   Wafer thickness
2                 13.768   13.778   13.870   13.896   13.932   13.914
6                 14.182   14.172   14.126   14.274   14.154   14.082

The diagram below shows the data.

Treating the nozzle position as a categorical factor with two levels (low and high) and fitting a linear model give the same fitted values at positions 2 and 6: both models have the same flexibility.
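The equivalence can be checked numerically on the wafer data. This is a minimal pure-Python sketch (the helper names `linear_fit` and `rss` are illustrative, not from the original) that fits a least-squares line to thickness against nozzle position and compares it with the categorical model, whose fitted value at each position is simply that position's mean.

```python
# Wafer thickness data from the table above, keyed by nozzle position.
thickness = {
    2: [13.768, 13.778, 13.870, 13.896, 13.932, 13.914],
    6: [14.182, 14.172, 14.126, 14.274, 14.154, 14.082],
}

# Flatten into paired (nozzle position, thickness) lists.
x = [pos for pos, values in thickness.items() for _ in values]
y = [v for values in thickness.values() for v in values]


def linear_fit(x, y):
    """Least-squares intercept and slope for y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def rss(y, fitted):
    """Residual sum of squares."""
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


# Linear model: nozzle position treated as numerical.
intercept, slope = linear_fit(x, y)
linear_fitted = [intercept + slope * xi for xi in x]

# Categorical model: each nozzle position gets its own mean.
group_means = {pos: sum(v) / len(v) for pos, v in thickness.items()}
categorical_fitted = [group_means[xi] for xi in x]

print("group means:", group_means)
print("linear fitted at 2 and 6:", intercept + 2 * slope, intercept + 6 * slope)
print("RSS linear:", rss(y, linear_fitted))
print("RSS categorical:", rss(y, categorical_fitted))
```

The line passes through the two group means, so both models leave identical residuals and the same residual sum of squares.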

Brightness of dyed fabric

An engineer in a textile mill studied the effect of temperature on the brightness of a synthetic fabric in a process involving dye. Small, randomly selected fabric specimens, three at each temperature, were dyed at 350, 375 and 400°F for 40 cycles. The brightness of each specimen was measured on a 50-point scale:

Temperature (°F)   Brightness
350                38   32   30
375                37   35   40
400                36   39   43

The data and a model that treats the factor as categorical are shown below.

Modelling the factor as categorical and fitting a quadratic model both give complete flexibility: either can take any values for the mean brightness at temperatures 350, 375 and 400°F, so the two models have identical fitted values and residuals.
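A pure-Python sketch can confirm this on the brightness data. The helper names (`poly_fitted`, `group_mean_fitted`, `rss`) are illustrative, and the polynomial fit solves the least-squares normal equations by Gaussian elimination rather than calling a statistics library. A straight-line fit is included for contrast.

```python
# Compare categorical, quadratic and linear fits for the dyed-fabric data.
temps = [350] * 3 + [375] * 3 + [400] * 3
brightness = [38, 32, 30, 37, 35, 40, 36, 39, 43]


def poly_fitted(x, y, degree):
    """Least-squares fitted values for a polynomial of the given degree.

    x is centred and scaled before fitting to keep the normal equations well
    conditioned; this changes the coefficients but not the fitted values.
    """
    n = len(x)
    centre = sum(x) / n
    scale = max(abs(v - centre) for v in x) or 1.0
    z = [(v - centre) / scale for v in x]
    p = degree + 1
    X = [[zi ** k for k in range(p)] for zi in z]  # columns 1, z, z**2, ...
    # Augmented normal equations [X'X | X'y].
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)]
         + [sum(X[r][i] * y[r] for r in range(n))]
         for i in range(p)]
    # Gaussian elimination with partial pivoting.
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, p):
            f = A[r][col] / A[col][col]
            for c in range(col, p + 1):
                A[r][c] -= f * A[col][c]
    # Back substitution for the coefficients.
    coef = [0.0] * p
    for i in reversed(range(p)):
        tail = sum(A[i][j] * coef[j] for j in range(i + 1, p))
        coef[i] = (A[i][p] - tail) / A[i][i]
    return [sum(coef[k] * zi ** k for k in range(p)) for zi in z]


def group_mean_fitted(x, y):
    """Fitted values when x is treated as categorical: each level's mean."""
    sums, counts = {}, {}
    for xi, yi in zip(x, y):
        sums[xi] = sums.get(xi, 0.0) + yi
        counts[xi] = counts.get(xi, 0) + 1
    return [sums[xi] / counts[xi] for xi in x]


def rss(y, fitted):
    """Residual sum of squares."""
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


cat_fit = group_mean_fitted(temps, brightness)
quad_fit = poly_fitted(temps, brightness, 2)
lin_fit = poly_fitted(temps, brightness, 1)
for name, fit in [("categorical", cat_fit), ("quadratic", quad_fit), ("linear", lin_fit)]:
    print(name, round(rss(brightness, fit), 6))
```

Up to floating-point rounding, the categorical and quadratic models both give a residual sum of squares of 72, while the straight line gives 74. With three levels, the linear model has one degree of freedom fewer and cannot pass through all three temperature means.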