Focus on the factor levels used in the experiment.
It is important to think of models in terms of their flexibility for modelling the response mean at the factor levels used in the experiment. Although a linear or quadratic model allows predictions at intermediate values of the factor, it is their fitted values at the levels used that affect the residuals.
Since it is the residuals that are the focus of least squares, what matters when assessing the fit of models is only the ability of the model to reduce the residual sum of squares.
Explanatory variables with two levels
If the explanatory factor is numerical but only two x-values are used in the experiment, the linear model allows complete flexibility in the values for the response mean at these two x-values. (The model has two degrees of freedom: an intercept and a slope.)
Treating the factor as categorical with two levels has identical flexibility and is equivalent with respect to the fit of the model.
A numerical explanatory variable with 2 levels can be equivalently modelled as categorical.
Conversely, a categorical variable with 2 levels could equivalently be coded as 0/1 and treated as numerical.
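The equivalence for two levels can be checked numerically. The sketch below is only an illustration: the response values are hypothetical, and `fit_line` is a helper written for this example. It fits a least-squares line to a factor coded 0/1 and compares the fitted values with the two group means that a categorical model would use.

```python
from statistics import mean

def fit_line(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

# Hypothetical responses for a two-level factor coded 0/1 (illustration only).
x = [0, 0, 0, 1, 1, 1]
y = [4.0, 5.0, 6.0, 9.0, 10.0, 11.0]

b0, b1 = fit_line(x, y)

# Fitted means under the categorical model are just the group means.
mean0 = mean(yi for xi, yi in zip(x, y) if xi == 0)
mean1 = mean(yi for xi, yi in zip(x, y) if xi == 1)

print(b0, mean0)       # the intercept matches the level-0 mean
print(b0 + b1, mean1)  # intercept + slope matches the level-1 mean
```

With 0/1 coding the intercept plays the role of the first level's mean and the slope is the difference between the two level means, so the two descriptions carry the same information.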
Explanatory variables with three levels
In a similar way, if an experiment involves a numerical factor that only takes three distinct values, a quadratic model has complete flexibility to give any mean responses at these three x-values. (The model has three degrees of freedom.) Modelling the factor as categorical is identical with respect to the flexibility and fit of the models.
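One way to see why a quadratic has complete flexibility at three distinct x-values is that a quadratic can always be drawn exactly through any three points. The sketch below uses the Lagrange form of that quadratic; the x-values and target means are hypothetical, and `quadratic_through` is a helper written for this illustration.

```python
def quadratic_through(xs, means):
    """Return the quadratic passing through three (x, mean) points (Lagrange form)."""
    (x0, x1, x2), (m0, m1, m2) = xs, means

    def q(x):
        return (m0 * (x - x1) * (x - x2) / ((x0 - x1) * (x0 - x2))
                + m1 * (x - x0) * (x - x2) / ((x1 - x0) * (x1 - x2))
                + m2 * (x - x0) * (x - x1) / ((x2 - x0) * (x2 - x1)))

    return q

# Hypothetical factor levels and arbitrary target means (illustration only).
xs = (1.0, 2.0, 3.0)
target_means = (7.0, 2.0, 9.0)

q = quadratic_through(xs, target_means)
fitted = [q(x) for x in xs]
print(fitted)  # reproduces the three target means at the three levels
```

Because any three means can be reproduced, the quadratic places no restriction on the mean response at the three levels, which is exactly what the categorical model offers.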
Silicon wafer manufacture
One of the initial steps in fabricating integrated circuits is to grow an epitaxial layer on polished silicon wafers. The wafers are mounted on a six-faceted cylinder that is spun inside a metal bell jar. The jar is injected with chemical vapours through nozzles at the top of the jar and heated. The process continues until the epitaxial layer grows to a desired thickness. As part of an experiment to assess the effect of different factors on the thickness of the resulting layer, the nozzle position was varied between two levels (2 and 6) and other factors were kept constant (a low deposition time at 1210°C and continuous rotation of the cylinder). The resulting data are shown below:
Nozzle position | Wafer thickness | | | | | |
---|---|---|---|---|---|---|
2 | 13.768 | 13.778 | 13.870 | 13.896 | 13.932 | 13.914 |
6 | 14.182 | 14.172 | 14.126 | 14.274 | 14.154 | 14.082 |
The diagram below shows the data.
Changing from a model that treats the nozzle position as a categorical factor with two levels (low and high) to a linear model leaves the fitted means at positions 2 and 6 unchanged: both models have the same flexibility.
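The same comparison can be made directly from the wafer thickness table. The following is a minimal sketch using only the Python standard library; the variable names and the `rss` helper are choices made for this example. It fits the categorical model (group means) and the linear model (least squares) and compares their residual sums of squares.

```python
from statistics import mean

# Wafer thickness at each nozzle position, from the table above.
data = {
    2: [13.768, 13.778, 13.870, 13.896, 13.932, 13.914],
    6: [14.182, 14.172, 14.126, 14.274, 14.154, 14.082],
}

x = [pos for pos, ys in data.items() for _ in ys]
y = [yi for ys in data.values() for yi in ys]

# Categorical model: the fitted value at each level is the group mean.
cat_fit = {pos: mean(ys) for pos, ys in data.items()}

# Linear model: least-squares intercept and slope.
xbar, ybar = mean(x), mean(y)
slope = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
         / sum((xi - xbar) ** 2 for xi in x))
intercept = ybar - slope * xbar
lin_fit = {pos: intercept + slope * pos for pos in data}

def rss(fit):
    """Residual sum of squares for fitted values given per level."""
    return sum((yi - fit[xi]) ** 2 for xi, yi in zip(x, y))

rss_cat, rss_lin = rss(cat_fit), rss(lin_fit)
print(cat_fit, lin_fit)
print(rss_cat, rss_lin)  # the two models leave the same residual sum of squares
```

The linear model also gives predictions at intermediate nozzle positions, but those predictions play no part in the residuals, so they do not affect the comparison of fit.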
Brightness of dyed fabric
An engineer in a textile mill studied the effect of temperature on the brightness of a synthetic fabric in a process involving dye. Several small randomly selected fabric specimens were dyed at 350, 375 and 400°F for 40 cycles. The brightness of each specimen was measured on a 50-point scale:
Temperature (°F) | Brightness | | |
---|---|---|---|
350 | 38 | 32 | 30 |
375 | 37 | 35 | 40 |
400 | 36 | 39 | 43 |
The data and a model that treats the factor as categorical are shown below.
Observe that the categorical model and a quadratic model both have complete flexibility, allowing any values for the mean brightness at temperatures 350, 375 and 400°F.
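As a check on the brightness data, the sketch below fits a quadratic by least squares and compares its fitted values with the three group means. Two details are assumptions made for this example rather than part of the original analysis: temperature is recoded as u = (T - 375) / 25 (so the levels become -1, 0, 1, which keeps the arithmetic well conditioned and does not change the fitted values), and `solve` is a small helper that solves the normal equations by Gaussian elimination.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

# Brightness at each temperature (°F), from the table above.
data = {350: [38, 32, 30], 375: [37, 35, 40], 400: [36, 39, 43]}

def code(t):
    """Recode temperature to -1, 0, 1 (an assumption made for numerical convenience)."""
    return (t - 375) / 25

u = [code(t) for t, ys in data.items() for _ in ys]
y = [yi for ys in data.values() for yi in ys]

# Design matrix with columns 1, u, u^2, then the normal equations X'X b = X'y.
X = [[1.0, ui, ui ** 2] for ui in u]
XtX = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(3)]
b = solve(XtX, Xty)

quad_fit = {t: b[0] + b[1] * code(t) + b[2] * code(t) ** 2 for t in data}
cat_fit = {t: sum(ys) / len(ys) for t, ys in data.items()}

print(quad_fit)
print(cat_fit)  # the quadratic's fitted values match the group means
```

A linear model in temperature would not generally reproduce all three group means, which is why the quadratic, not the straight line, is the numerical counterpart of a three-level categorical factor.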