We assume that the data arise as random samples from two normal populations with the same standard deviation.
A linear model with an explanatory variable whose value is 0 or 1 depending on the group is equivalent to the model with separate parameters for each of the two group means.
The least squares estimates for the GLM with an indicator variable give fitted values that are the two group means.
The t-test for whether the coefficient of the indicator variable is zero is identical to the standard t-test for comparing two sample means.
The explained sum of squares can be interpreted as the sum of squares between groups; the residual sum of squares is the sum of squares within groups.
The anova F-test for equal group means gives the same p-value and conclusion as the standard t-test.
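The equivalences above can be checked numerically. The following is a minimal sketch with hypothetical data (numpy and scipy assumed available): the t statistic for the indicator coefficient matches the pooled two-sample t-test, and the anova F statistic equals its square.

```python
import numpy as np
from scipy import stats

# Hypothetical data: two samples assumed to come from normal
# populations with a common standard deviation.
y1 = np.array([4.1, 5.0, 4.6, 5.3, 4.8])
y2 = np.array([5.9, 6.4, 5.7, 6.8, 6.2])
y = np.concatenate([y1, y2])
d = np.concatenate([np.zeros(5), np.ones(5)])   # 0/1 group indicator

# Least squares fit of the GLM  y = b0 + b1*d + error.
X = np.column_stack([np.ones(10), d])
beta, rss, _, _ = np.linalg.lstsq(X, y, rcond=None)
s2 = rss[0] / (10 - 2)                           # residual mean square
se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
t_glm = beta[1] / se                             # t statistic for b1 = 0

# Standard pooled two-sample t-test gives the identical statistic.
t_std, p_std = stats.ttest_ind(y2, y1, equal_var=True)

# The anova F statistic for equal group means is t squared.
ssb = 5 * (y1.mean() - y.mean())**2 + 5 * (y2.mean() - y.mean())**2
F = ssb / s2
print(t_glm, t_std, F)
```

Note that `beta[0]` is the first sample mean and `beta[1]` is the difference of the two sample means, so the fitted values are exactly the group means.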
The normal model for several groups has the same standard deviation in each group but allows the means to be different.
Different means in g groups can be modelled with (g-1) indicator variables whose GLM coefficients are the differences between each group's mean and that of a baseline group.
The least squares estimates for the GLM with indicator variables for the groups result in fitted values that are the group sample means.
T-tests for the coefficients of separate indicator variables test whether the mean of that group equals the mean of the baseline group. This may make sense if the baseline group is a control treatment, but in general there are too many pairwise comparisons to rely on the results of separate t-tests.
Testing for equal group means requires simultaneous testing of the coefficients of the (g-1) indicator variables using analysis of variance. The explained and residual sums of squares describe between-group and within-group variation.
The coefficient of determination is the proportion of response variation that is explained by the groups.
An anova table with between-group and within-group sums of squares provides a test for equal group means.
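A short sketch with hypothetical data for g = 3 groups illustrates the baseline parameterisation, the between/within decomposition, the coefficient of determination, and the anova F-test (numpy and scipy assumed):

```python
import numpy as np
from scipy import stats

# Hypothetical samples from 3 groups under the common-sd normal model.
samples = [np.array([10.2, 9.8, 10.5, 10.1]),
           np.array([11.0, 11.4, 10.9, 11.3]),
           np.array([9.1, 9.5, 8.8, 9.4])]
y = np.concatenate(samples)
g = np.repeat([0, 1, 2], 4)

# Baseline parameterisation: an intercept plus (g-1) = 2 indicator variables.
X = np.column_stack([np.ones(12), g == 1, g == 2]).astype(float)
beta, ssw, _, _ = np.linalg.lstsq(X, y, rcond=None)  # residual SS = within-group SS
fitted = X @ beta                                    # equal to the group sample means

sst = np.sum((y - y.mean())**2)
ssb = sst - ssw[0]                                   # between-group (explained) SS
F = (ssb / 2) / (ssw[0] / 9)                         # anova F on (2, 9) df
p = stats.f.sf(F, 2, 9)
r2 = ssb / sst                                       # coefficient of determination
print(F, p, r2)
```

The coefficient `beta[1]` is the difference between the second group's mean and the baseline group's mean, and the F statistic agrees with a standard one-way anova routine such as `scipy.stats.f_oneway`.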
Many parameterisations are possible for the model with arbitrary group means. An example is given that allows testing of whether a subset of group means are equal.
A simple linear model for Y against X can be augmented with a 0/1 indicator variable distinguishing the groups. The model is a GLM and can be represented by two parallel lines on the scatterplot of Y vs X.
A t-test for whether the coefficient of the indicator variable is zero tests whether the two parallel lines coincide, i.e. whether the groups differ after adjusting for X.
If there are g groups, (g-1) indicator variables can be added to the simple linear model. This corresponds to g parallel lines for the groups on the scatterplot of Y vs X.
Analysis of variance provides a single test for differences between the groups. If X is not orthogonal to the groups, there are two different anova tables corresponding to the two orders of adding X and the indicator variables.
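The order-dependence of the sequential sums of squares can be demonstrated directly. The sketch below uses hypothetical unbalanced data in which X is not orthogonal to the groups, so the sum of squares attributed to the groups depends on whether X is fitted before or after the indicator variable (numpy assumed):

```python
import numpy as np

# Hypothetical unbalanced data: the x-distributions differ between the two
# groups, so X is not orthogonal to the group indicator.
x = np.array([1., 2., 3., 4., 2., 3., 4., 5., 6.])
grp = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1])
y = np.array([2.1, 2.9, 4.2, 4.8, 4.0, 5.1, 5.9, 7.2, 8.1])

def rss(X):
    """Residual sum of squares from a least squares fit of y on X."""
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    return np.sum((y - X @ beta)**2)

ones = np.ones_like(y)
d = (grp == 1).astype(float)

rss_null = rss(np.column_stack([ones]))
rss_x    = rss(np.column_stack([ones, x]))
rss_d    = rss(np.column_stack([ones, d]))
rss_xd   = rss(np.column_stack([ones, x, d]))   # two parallel lines

# Order 1: X first, then the indicator.
ss_x_first = rss_null - rss_x
ss_d_after = rss_x - rss_xd
# Order 2: the indicator first, then X.
ss_d_first = rss_null - rss_d
ss_x_after = rss_d - rss_xd

print(ss_d_after, ss_d_first)   # differ: two distinct anova tables
```

Both orders decompose the same total explained sum of squares, but the split between X and the groups differs, which is why the two anova tables must be interpreted separately.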
Observations that are split into groups can be equivalently considered as a single data set with a categorical variable defining group membership.
Data with two categorical explanatory variables often arise from designed experiments. Two sets of indicator variables can be used to model the effects of the two variables in a GLM.
The effects of the categorical explanatory variables can be tested with analysis of variance. The categorical explanatory variables are usually orthogonal in designed experiments and a single anova table can test both variables.
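In contrast to the unbalanced case, a balanced factorial design makes the two sets of indicator variables orthogonal, so the sequential sums of squares do not depend on the fitting order. A minimal sketch with a hypothetical balanced 2 x 3 design (numpy assumed):

```python
import numpy as np

# Hypothetical balanced 2 x 3 design: factor A (2 levels), factor B (3 levels),
# 2 replicates per cell, so the indicator sets for A and B are orthogonal.
a = np.repeat([0, 1], 6)
b = np.tile(np.repeat([0, 1, 2], 2), 2)
y = np.array([5.1, 4.9, 6.2, 6.0, 7.1, 6.8,
              5.9, 6.1, 7.0, 7.3, 8.2, 8.0])

def rss(X):
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    return np.sum((y - X @ beta)**2)

ones = np.ones_like(y)
A = (a == 1).astype(float).reshape(-1, 1)                # 1 indicator for A
B = np.column_stack([(b == 1), (b == 2)]).astype(float)  # 2 indicators for B

ss_A_first = rss(np.column_stack([ones])) - rss(np.hstack([ones[:, None], A]))
ss_A_last  = (rss(np.hstack([ones[:, None], B]))
              - rss(np.hstack([ones[:, None], B, A])))
# Because the design is balanced, the order of fitting does not matter.
print(ss_A_first, ss_A_last)
```

With orthogonal factors a single anova table therefore suffices for testing both variables.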
The models in this section can be extended with terms for any mixture of numerical and categorical explanatory variables. If one or more categorical explanatory variables have three or more levels, F-tests based on Type 3 sums of squares should be used to test significance instead of t-tests for individual parameters.
Interaction between two numerical variables can be modelled in a GLM with a term involving the product of the variables.
If there is no interaction between a numerical and categorical explanatory variable, the regression lines for all categories are parallel. If these regression lines are not parallel, then there is interaction. Extra terms can be added to the no-interaction GLM to model the interaction.
The existence of interaction can be tested with a test for whether the parameters for the interaction terms are zero. This can be done with a t-test if there are only 2 categories, but an F-test is needed for more categories.
Indicator variables can be added to the no-interaction GLM to model an interaction between 2 categorical variables.
Testing for interaction is equivalent to testing whether the parameters for the interaction indicator variables are all zero.
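A sketch of this test with a hypothetical 2 x 3 factorial: the interaction indicators are products of the A and B indicators, and the F-test compares the no-interaction GLM with the full model (numpy and scipy assumed):

```python
import numpy as np
from scipy import stats

# Hypothetical 2 x 3 factorial with 3 replicates per cell; the last cell is
# shifted upwards so that a real interaction is present.
a = np.repeat([0, 1], 9)
b = np.tile(np.repeat([0, 1, 2], 3), 2)
y = np.array([4.8, 5.2, 5.0, 6.1, 5.9, 6.0, 6.8, 7.2, 7.0,
              5.9, 6.1, 6.0, 7.1, 6.9, 7.0, 9.5, 9.9, 9.7])

A = (a == 1).astype(float)[:, None]
B = np.column_stack([(b == 1), (b == 2)]).astype(float)
ones = np.ones((18, 1))

X0 = np.hstack([ones, A, B])     # main effects only (no interaction)
X1 = np.hstack([X0, A * B])      # plus the 1 x 2 interaction indicators

def rss(X):
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    return np.sum((y - X @ beta)**2)

df_int = X1.shape[1] - X0.shape[1]   # 2 interaction parameters
df_res = 18 - X1.shape[1]            # 12 residual df
F = ((rss(X0) - rss(X1)) / df_int) / (rss(X1) / df_res)
p = stats.f.sf(F, df_int, df_res)
print(F, p)
```

A small p-value here indicates that the parameters for the interaction indicator variables are not all zero.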
An example is shown with several main effects and interactions.
When there are several response values at each x, the most general model for curvature allows for an arbitrary response mean at each x. This model places no constraint on the shape of the curvature.
The nonlinearity sum of squares describes the distances of the group means from the best-fitting straight line.
An F ratio comparing the nonlinear and residual sums of squares provides a test for linearity.
The g-group model that allows for arbitrary response means at each x can be parameterised in a way that makes the test for linearity equivalent to testing whether the coefficients of (g-2) indicator variables are zero.
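The lack-of-fit construction can be sketched with hypothetical replicated data: the factor model's residual sum of squares is the pure error, and the difference from the straight-line residual sum of squares is the nonlinearity sum of squares (numpy and scipy assumed):

```python
import numpy as np
from scipy import stats

# Hypothetical data with 3 replicate responses at each of g = 5 x-values;
# the group means flatten off, so the relationship is not linear.
x_levels = np.array([1., 2., 3., 4., 5.])
x = np.repeat(x_levels, 3)
y = np.array([2.0, 2.2, 2.1,  3.9, 4.1, 4.0,  5.5, 5.4, 5.6,
              6.4, 6.6, 6.5,  7.0, 7.1, 6.9])

# Straight-line fit.
X_line = np.column_stack([np.ones(x.size), x])
beta, rss_line, _, _ = np.linalg.lstsq(X_line, y, rcond=None)

# Factor model: an arbitrary mean at each x (fitted values are group means).
group_means = y.reshape(5, 3).mean(axis=1)
rss_factor = np.sum((y.reshape(5, 3) - group_means[:, None])**2)  # pure error

ss_nonlin = rss_line[0] - rss_factor   # nonlinearity sum of squares
df_nonlin = 5 - 2                      # g - 2 extra parameters beyond the line
df_pure = y.size - 5
F = (ss_nonlin / df_nonlin) / (rss_factor / df_pure)
p = stats.f.sf(F, df_nonlin, df_pure)
print(F, p)
```

The F ratio is on (g-2, n-g) degrees of freedom, matching the parameterisation in which (g-2) indicator coefficients capture the departure from linearity.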
A test based on adding a quadratic term to the linear model is more likely to detect 'smooth' nonlinearity. The test based on the factor model is better at detecting more irregular types of nonlinearity, including those that can arise from badly randomised experiments.
In experiments with repeated response measurements at different combinations of two explanatory variables x and z, a more general anova test of model fit is possible. This can detect both curvature and interaction.