# FITMULTINOMIAL procedure

Fits generalized linear models with the multinomial distribution (R.W. Payne).

### Options

`PRINT` = string tokens What to print (`model`, `deviance`, `summary`, `estimates`, `correlations`, `fittedvalues`, `accumulated`, `monitoring`, `confidence`); default `mode`, `summ`, `esti`

`RESPONSEFACTOR` = factor Factor representing the response categories of the multinomial distribution

`CLASSIFICATION` = factors Factors classifying the subjects; default uses the factors in `TERMS`

`FACTORIAL` = scalar Limit for expansion of model terms from `TERMS`; default 3

`POOL` = string token Whether to pool ss in accumulated summary between all terms fitted in a linear model (`yes`, `no`); default `no`

`DENOMINATOR` = string token Whether to base ratios in accumulated summary on rms from model with smallest residual ss or smallest residual ms (`ss`, `ms`); default `ss`

`NOMESSAGE` = string tokens Which warning messages to suppress (`dispersion`, `leverage`, `residual`, `aliasing`, `marginality`, `vertical`, `df`, `inflation`); default `*`

`FPROBABILITY` = string token Printing of probabilities for variance and deviance ratios (`yes`, `no`); default `no`

`TPROBABILITY` = string token Printing of probabilities for t-statistics (`yes`, `no`); default `no`

`SELECTION` = string tokens Statistics to be displayed in the summary of analysis produced by `PRINT=summary` (`%variance`, `%ss`, `adjustedr2`, `r2`, `dispersion`, `%meandeviance`, `%deviance`, `aic`, `bic`, `sic`); default `disp`

`PROBABILITY` = scalar Probability level for confidence intervals for parameter estimates; default 0.95

`FULL` = string token Whether to assign all possible parameters to factors and interactions (`yes`, `no`); default `no`

### Parameter

`TERMS` = formula Terms to be fitted

### Description

`FITMULTINOMIAL` provides an automatic way of fitting generalized linear models with the multinomial distribution. These models can be fitted with the ordinary generalized linear models commands. This uses the fact that a multinomial distribution can be generated by taking a set of independent Poisson variables (one for each outcome of the multinomial) and conditioning on their sum being equal to the multinomial total (see McCullagh & Nelder 1989, or any book on probability distributions).
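
The conditioning argument can be checked numerically. The Python sketch below is an illustration of the probability result only, not of how Genstat computes anything, and its function names are hypothetical. It takes a set of independent Poisson counts and computes their probability conditional on their total. It then compares this with the multinomial probability whose category probabilities are the Poisson means divided by their sum.

```python
import math
from itertools import product


def poisson_pmf(y, mu):
    """Probability that a Poisson variable with mean mu equals y."""
    return math.exp(-mu) * mu ** y / math.factorial(y)


def multinomial_pmf(counts, probs):
    """Multinomial probability of `counts` with category probabilities `probs`."""
    n = sum(counts)
    coef = math.factorial(n)
    for y in counts:
        coef //= math.factorial(y)  # each partial quotient is an exact integer
    p = float(coef)
    for y, pi in zip(counts, probs):
        p *= pi ** y
    return p


def conditional_poisson_pmf(counts, means):
    """P(Y = counts | sum(Y) = n) for independent Poisson Y with the given means."""
    joint = 1.0
    for y, mu in zip(counts, means):
        joint *= poisson_pmf(y, mu)
    # A sum of independent Poisson variables is Poisson with the summed mean.
    total = poisson_pmf(sum(counts), sum(means))
    return joint / total


# Hypothetical Poisson means for a three-category response.
means = [2.0, 5.0, 3.0]
probs = [m / sum(means) for m in means]

# Compare the two probabilities for every outcome with total n = 7.
for counts in product(range(8), repeat=3):
    if sum(counts) == 7:
        a = conditional_poisson_pmf(counts, means)
        b = multinomial_pmf(counts, probs)
        assert math.isclose(a, b, rel_tol=1e-9)
print("conditional Poisson probabilities match the multinomial for n = 7")
```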

The data for the model are counts of the numbers of subjects observed in the various categories of the multinomial distribution. The counts may also be classified by various treatment factors. The interest is then in seeing how the distribution of the subjects varies according to the treatments, or according to any variates that differ over the levels of the treatments. These observations must be put into a single variate, which is specified beforehand using the `Y` parameter of the `MODEL` directive. The `DISTRIBUTION` option of `MODEL` should be set to `Poisson`. The `LINK` option should be set to `logarithm`, or left at its default of `canonical`, since the canonical link for the Poisson is `logarithm`. This gives a logit link in the multinomial.
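
The following Python sketch shows why a log link for the Poisson counts gives a logit link for the multinomial. The names are hypothetical and the code illustrates the arithmetic only. Within one subject group, assume the Poisson mean for category `j` is `exp(alpha + eta[j])`, where `alpha` is the effect of the group. Dividing by the group total cancels `alpha`. The category probabilities, and their log-odds relative to a reference category, therefore depend only on the `eta[j]`.

```python
import math


def category_probabilities(alpha, eta):
    """Multinomial probabilities implied by a log-linear Poisson model.

    alpha: effect of the subject group (from the classification terms).
    eta:   linear predictor for each response category in that group.
    """
    mu = [math.exp(alpha + e) for e in eta]  # Poisson means on the log scale
    total = sum(mu)                          # expected multinomial total
    return [m / total for m in mu]


def baseline_logits(probs):
    """Log-odds of each category relative to the first (reference) category."""
    return [math.log(p / probs[0]) for p in probs]


# Hypothetical linear predictors for four response categories.
eta = [0.0, -1.2, 0.7, -3.0]

# Two groups of very different sizes (alpha) give the same probabilities.
small = category_probabilities(alpha=0.5, eta=eta)
large = category_probabilities(alpha=8.0, eta=eta)
assert all(math.isclose(p, q) for p, q in zip(small, large))

# The log-odds recover eta relative to the reference category.
print([round(x, 3) for x in baseline_logits(small)])
```

The group effect therefore acts only as a nuisance parameter that reproduces the group totals. This is why the classification terms carry no information about the response.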

You also need to form a factor to identify the response category of the multinomial recorded in each unit of the `Y` variate. This factor is then supplied to `FITMULTINOMIAL` using the `RESPONSEFACTOR` option. `FITMULTINOMIAL` also has a `CLASSIFICATION` option that can be used to specify the factors that classify the subjects. The other options have the same purpose as those of the `FIT` directive. The model to be fitted is specified by the `TERMS` parameter (like the first parameter of `FIT`). If `CLASSIFICATION` is unset, `FITMULTINOMIAL` uses the set of factors that occur in `TERMS`. Usually these will include all the factors that classify the subjects. However, you may have a classification factor with numerical levels. You might then want to fit a variate calculated as some function of the levels, rather than an effect for every level of the factor. You can do this by listing the factors in the `CLASSIFICATION` option and using the variate in `TERMS`.
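
The Python sketch below illustrates the required layout. The helper is hypothetical and is not part of Genstat. It starts from a table of counts with one row per subject group and one column per response category. It stacks this into a single list of counts, with a group label and a response-category label for each count. The counts are the Cardiff rows of the example data below. The resulting `response` list plays the role of the factor supplied to `RESPONSEFACTOR`.

```python
def to_long_format(table, group_labels, category_labels):
    """Stack a groups-by-categories table of counts into one record per count.

    Returns three parallel lists: the counts (the y-variate), the subject
    group of each count, and the response category of each count.
    """
    if len(table) != len(group_labels):
        raise ValueError("need one group label per row")
    y, group, response = [], [], []
    for g, row in zip(group_labels, table):
        if len(row) != len(category_labels):
            raise ValueError("need one count per response category")
        for c, count in zip(category_labels, row):
            y.append(count)
            group.append(g)
            response.append(c)
    return y, group, response


# Cardiff counts from the example data: non-manual and manual workers.
table = [[4091, 5, 9, 5],
         [9424, 31, 33, 14]]
y, group, response = to_long_format(
    table,
    group_labels=["Cardiff/Non-manual", "Cardiff/Manual"],
    category_labels=["None", "An.", "Sp.", "Other"],
)
for record in zip(group, response, y):
    print(record)
```

The response category varies fastest here. This matches the order of the example data, which is generated by `GENERATE community,worker,CNS`.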

`FITMULTINOMIAL` first fits a model containing all factorial combinations of the `CLASSIFICATION` factors. This imposes the constraint that, within each combination of the classification factors, the fitted values of the Poisson variables sum to the corresponding total of the multinomial distribution. The effects of these terms reflect how the design has been set up, that is, how the subjects have been allocated to the treatments. However, they contain no information on the effects of the treatments on the response.
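
For a single classification factor, the result of this first step can be written down directly. Assume the model contains only the group term, `log(mu[i][j]) = alpha[i]`. Its Poisson maximum-likelihood fitted values then divide each group's total equally among the response categories. The Python sketch below computes these fitted values for the Cardiff counts from the example data. The helper is hypothetical and is not Genstat code. The output shows that the group totals are reproduced, while the fitted values say nothing about how the response is distributed.

```python
def fit_classification_only(table):
    """Poisson ML fitted values for the model log(mu[i][j]) = alpha[i].

    The likelihood equations force each row of fitted values to sum to the
    observed row total, and with no category terms the total is split evenly.
    """
    fitted = []
    for row in table:
        share = sum(row) / len(row)
        fitted.append([share] * len(row))
    return fitted


table = [[4091, 5, 9, 5],      # Cardiff, non-manual
         [9424, 31, 33, 14]]   # Cardiff, manual
fitted = fit_classification_only(table)
for obs, fit in zip(table, fitted):
    print(sum(obs), [round(f, 2) for f in fit])
```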

It then fits `RESPONSEFACTOR`. This represents the overall distribution of the response categories across the subjects, and is analogous to the grand mean in an ordinary analysis. (This must be fitted, and so `FITMULTINOMIAL` has no `CONSTANT` option.)
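
Adding the response factor gives the model `log(mu[i][j]) = alpha[i] + beta[j]`. For a single classification factor, this model has closed-form fitted values: the row total times the column total, divided by the grand total. The implied category probabilities are then the overall proportions of the response categories, and these are the same in every group. This is why the term behaves like a grand mean. The Python sketch below uses hypothetical helpers. It also computes the Poisson deviance, so the fit can be compared with the classification-only model. The closed form applies only to this two-term model. Models with treatment interactions need iterative fitting.

```python
import math


def fit_independence(table):
    """Poisson ML fitted values for log(mu[i][j]) = alpha[i] + beta[j]."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]


def poisson_deviance(observed, fitted):
    """2 * sum(y*log(y/mu) - (y - mu)), taking y*log(y/mu) = 0 when y = 0."""
    d = 0.0
    for yrow, mrow in zip(observed, fitted):
        for y, mu in zip(yrow, mrow):
            if y > 0:
                d += y * math.log(y / mu)
            d -= y - mu
    return 2.0 * d


table = [[4091, 5, 9, 5],      # Cardiff, non-manual
         [9424, 31, 33, 14]]   # Cardiff, manual
fitted = fit_independence(table)

# Every group now has the same category proportions: the overall ones.
for row in fitted:
    print([round(m / sum(row), 5) for m in row])
print("deviance:", round(poisson_deviance(table, fitted), 3))
```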

Finally, it fits the interactions of the terms in `TERMS` with `RESPONSEFACTOR`. These show how the treatment terms affect the distribution of subjects among the response categories, which is the main interest of the analysis. The `FACTORIAL` option sets a limit on the number of factors and/or variates in the model terms that are generated from the `TERMS` formula. (Note, though, that `RESPONSEFACTOR` is not counted when applying this limit.) By default these terms are fitted individually, so each has its own line in an accumulated analysis of deviance (option `PRINT=accumulated`). However, you can set option `POOL=yes` to fit them all at once.
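
The effect of the `FACTORIAL` limit can be sketched for a fully crossed formula such as `A*B*C`. The Python function below is hypothetical and is not Genstat's formula parser. It orders terms by the number of factors they contain, which may differ from the order Genstat uses. It generates the treatment terms with at most `limit` factors, then crosses each term with the response factor. The response factor is not counted towards the limit.

```python
from itertools import combinations


def multinomial_terms(factors, limit, response):
    """Interaction terms fitted with the response factor for a crossed formula.

    factors:  names of the factors/variates crossed in the TERMS formula.
    limit:    maximum number of them in any generated term (cf. FACTORIAL).
    response: name of the response factor, appended to every term but not
              counted towards the limit.
    """
    terms = []
    for order in range(1, min(limit, len(factors)) + 1):
        for combo in combinations(factors, order):
            terms.append(".".join(combo + (response,)))
    return terms


print(multinomial_terms(["A", "B", "C"], limit=2, response="R"))
```

With `limit=3` (the default setting of `FACTORIAL`), the three-factor term `A.B.C.R` is also generated.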

After `FITMULTINOMIAL` you can use the standard regression output commands, `RDISPLAY`, `RKEEP` and so on, in the usual way.

If you have a large model, you can save space by setting the `GROUPS` option of the earlier `MODEL` statement to the response factor. Note, though, that if you then want to use the `PREDICT` directive after `FITMULTINOMIAL`, you will be able to predict values only within one response category at a time.

Options: `PRINT`, `RESPONSEFACTOR`, `CLASSIFICATION`, `FACTORIAL`, `POOL`, `DENOMINATOR`, `NOMESSAGE`, `FPROBABILITY`, `TPROBABILITY`, `SELECTION`, `PROBABILITY`, `FULL`.

Parameter: `TERMS`.

### Method

`FITMULTINOMIAL` uses the standard generalized linear models commands, as explained in the Description.

### Action with `RESTRICT`

As in `FIT`, the y-variate (specified in an earlier `MODEL` directive) can be restricted to analyse a subset of the data.

### Reference

McCullagh, P. & Nelder, J.A. (1989). Generalized Linear Models (second edition). Chapman & Hall, London.

Directive: `MODEL`.

Commands for: Regression analysis.

### Example

```
CAPTION   'FITMULTINOMIAL example',\
!t('Frequencies of central nervous system malformations',\
'in live births in 8 South Wales communities',\
'(McCullagh & Nelder 1989, Table 5.3).'); STYLE=meta,plain
FACTOR    [NVALUES=64; LABELS=!t(Cardiff,Newport,Swansea,'Glamorgan E.',\
'Glamorgan W.','Glamorgan C.','Monmouth V.','Monmouth other')]\
community
&         [LABELS=!t('Non-manual',Manual)] worker
&         [LABELS=!t(None,'An.','Sp.',Other)] CNS
GENERATE  community,worker,CNS
VARIATE   births; VALUES=!(\
4091, 5, 9,5,  9424,31,33,14,\
1515, 1, 7,0,  4610, 3,15, 6,\
2394, 9, 5,0,  5526,19,30, 4,\
3163, 9,14,3, 13217,55,71,19,\
1979, 5,10,1,  8195,30,44,10,\
4838,11,12,2,  7803,25,28,12,\
2362, 6, 8,4,  9962,36,37,13,\
1604, 3, 6,0,  3172, 8,13, 3)
TABLE     [CLASS=community; VALUES=110,100,95,42,39,161,83,122] hardtable
CALCULATE waterhardness = TPROJECT(hardtable)