FITMULTINOMIAL procedure

Fits generalized linear models with multinomial distribution (R.W. Payne).

Options

`PRINT` = string tokens	What to print (`model`, `deviance`, `summary`, `estimates`, `correlations`, `fittedvalues`, `accumulated`, `monitoring`, `confidence`); default `mode`, `summ`, `esti`
`RESPONSEFACTOR` = factor	Factor representing the response categories of the multinomial distribution
`CLASSIFICATION` = factors	Factors classifying the subjects; default uses the factors in `TERMS`
`FACTORIAL` = scalar	Limit for expansion of model terms from `TERMS`; default 3
`POOL` = string token	Whether to pool ss in accumulated summary between all terms fitted in a linear model (`yes`, `no`); default `no`
`DENOMINATOR` = string token	Whether to base ratios in accumulated summary on rms from model with smallest residual ss or smallest residual ms (`ss`, `ms`); default `ss`
`NOMESSAGE` = string tokens	Which warning messages to suppress (`dispersion`, `leverage`, `residual`, `aliasing`, `marginality`, `vertical`, `df`, `inflation`); default `*`
`FPROBABILITY` = string token	Printing of probabilities for variance and deviance ratios (`yes`, `no`); default `no`
`TPROBABILITY` = string token	Printing of probabilities for t-statistics (`yes`, `no`); default `no`
`SELECTION` = string tokens	Statistics to be displayed in the summary of analysis produced by `PRINT=summary` (`%variance`, `%ss`, `adjustedr2`, `r2`, `dispersion`, `%meandeviance`, `%deviance`, `aic`, `bic`, `sic`); default `disp`
`PROBABILITY` = scalar	Probability level for confidence intervals for parameter estimates; default 0.95
`FULL` = string token	Whether to assign all possible parameters to factors and interactions (`yes`, `no`); default `no`

Parameter

`TERMS` = formula	Terms to be fitted

Description

FITMULTINOMIAL provides an automatic way of fitting generalized linear models with the multinomial distribution. These models can be fitted with the ordinary generalized linear models commands, by using the fact that a multinomial distribution can be generated by taking several independent Poisson variables (one for each outcome of the multinomial), and then constraining their sum to be equal to the multinomial total (see McCullagh & Nelder 1989, or any book on probability distributions).

The data for the model are counts of numbers of subjects observed in the various categories of the multinomial distribution. The counts may also be classified by various treatment factors, and the interest is in seeing how the distribution of the subjects varies according to the treatments or any variates that differ over the levels of the treatments. These observations must be put into a single variate, and specified beforehand using the Y parameter of the MODEL directive. The DISTRIBUTION option of MODEL should be set to Poisson, and the LINK option to logarithm (or left as the default canonical for the canonical link, which is logarithm for the Poisson); this gives a logit link in the multinomial.
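The link between the Poisson model and the multinomial can be written out as a short derivation. The symbols below ($Y_j$, $\mu_j$, $\pi_j$, $\eta_j$, $k$, $n$) are notation introduced here for illustration and do not appear in the procedure itself.

```latex
% Y_1,...,Y_k independent Poisson counts with means \mu_1,...,\mu_k,
% so their total N = \sum_j Y_j is Poisson with mean \mu_+ = \sum_j \mu_j.
\[
P(Y_1=y_1,\dots,Y_k=y_k \mid N=n)
  = \frac{\prod_{j=1}^{k} e^{-\mu_j}\mu_j^{y_j}/y_j!}{e^{-\mu_+}\mu_+^{n}/n!}
  = \frac{n!}{\prod_{j=1}^{k} y_j!}\,\prod_{j=1}^{k}\pi_j^{y_j},
\qquad \pi_j = \frac{\mu_j}{\mu_+}.
\]
% With the log link, \log\mu_j = \eta_j, the category probabilities are
\[
\pi_j = \frac{e^{\eta_j}}{\sum_{l=1}^{k} e^{\eta_l}},
\qquad
\log\frac{\pi_j}{\pi_1} = \eta_j - \eta_1 ,
\]
% which is the (multinomial) logit scale.
```

Conditioning on the total removes any term that is common to all response categories, so only differences between the linear predictors of the categories affect the multinomial probabilities.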

You also need to form a factor to identify the response category of the multinomial recorded in each unit of the Y variate. This is then input to FITMULTINOMIAL using the RESPONSEFACTOR option. FITMULTINOMIAL also has a CLASSIFICATION option that can be used to specify the factors that classify the subjects. The other options have the same purpose as those in the FIT directive. The model to be fitted is specified by the TERMS parameter (like the first parameter of FIT). If CLASSIFICATION is unset, FITMULTINOMIAL will use the set of factors that occur in TERMS. Usually these will contain all the factors that classify the subjects. However, if you have a classification factor with numerical levels, you might for example want to fit a variate calculated as some function of the levels rather than an effect for every level of the factor. You could then specify the factors in the list for the CLASSIFICATION option, and use the variate in TERMS.
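As a sketch of that last case, the statements below reuse the identifiers from the Example section (births, CNS, community, worker, waterhardness) and assume they have already been set up as shown there. The factor community is named in CLASSIFICATION but replaced in TERMS by the variate waterhardness; this particular choice of model is illustrative rather than a recommended analysis.

```
" Poisson counts with log link, as required before FITMULTINOMIAL "
MODEL     [DISTRIBUTION=poisson; LINK=logarithm] births
" community classifies the subjects but is not itself in TERMS "
FITMULTINOMIAL [PRINT=model,summary,estimates; RESPONSEFACTOR=CNS;\
          CLASSIFICATION=community,worker] waterhardness + worker
```

Without the CLASSIFICATION setting, only worker would be taken from TERMS as a classifying factor, so the multinomial totals for the separate communities would not be constrained.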

FITMULTINOMIAL first fits a model defined as all factorial combinations of the CLASSIFICATION factors. This imposes the constraint that the Poisson variables sum to the totals of the multinomial distribution. The effects of these terms assess how the design has been set up – i.e. how the subjects have been allocated to the treatments – but they have no information on the effects of the treatments on the response.

It then fits RESPONSEFACTOR. This represents the overall distribution of the response categories across the subjects, and is analogous to the grand mean in an ordinary analysis. (This must be fitted, and so FITMULTINOMIAL has no CONSTANT option.)

Finally it fits the interactions of the terms in TERMS with RESPONSEFACTOR. These show how the distribution of subjects to response categories is affected by the treatment terms – which is the main interest of the analysis. The FACTORIAL option sets a limit on the number of factors and/or variates in the model terms that are generated from the TERMS formula. (Note, though, that RESPONSEFACTOR is ignored in interpreting this limit.) By default these terms are fitted individually, so they will each have their own line in an accumulated analysis of deviance (option PRINT=accumulated). However, you can set option POOL=yes to fit them all at once.
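The three stages described above can be sketched by hand with the ordinary regression directives. This is only an approximation of the sequence, written with the identifiers from the Example section; it is an assumption that the procedure behaves like these FIT and ADD statements, and its internal implementation may differ.

```
MODEL     [DISTRIBUTION=poisson; LINK=logarithm] births
" Stage 1: all factorial combinations of the classification factors "
FIT       [PRINT=*] community*worker
" Stage 2: the response factor "
ADD       [PRINT=*] CNS
" Stage 3: interactions of each term in TERMS with the response factor, "
" added one at a time so each has its own line in the accumulated summary "
ADD       [PRINT=*] CNS.waterhardness
ADD       [PRINT=*] CNS.community
ADD       [PRINT=accumulated] CNS.worker
```

Adding the three interaction terms in a single ADD statement would correspond to the pooled form of the accumulated summary.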

After FITMULTINOMIAL you can use the standard regression output commands, RDISPLAY, RKEEP and so on, in the usual way.
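For instance, after a FITMULTINOMIAL statement such as the one in the Example section, further output might be requested along these lines. The identifiers fitted, resdev and resdf are hypothetical names chosen for this sketch.

```
" Redisplay the estimates with probabilities for the t-statistics "
RDISPLAY  [PRINT=estimates; TPROBABILITY=yes]
" Save fitted values, residual deviance and residual d.f. "
RKEEP     FITTEDVALUES=fitted; DEVIANCE=resdev; DF=resdf
PRINT     resdev,resdf
```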

If you have a large model, you can set the GROUPS option in the earlier MODEL statement to the response factor to save space. Note, though, that if you want to use the PREDICT directive after FITMULTINOMIAL, you will then only be able to predict values within one response category at a time.
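A minimal sketch of this setting, again using the identifiers from the Example section, is shown below; the only change from the example is the GROUPS option of MODEL.

```
" Set GROUPS to the response factor to save space in a large model "
MODEL     [DISTRIBUTION=poisson; LINK=logarithm; GROUPS=CNS] births
FITMULTINOMIAL [RESPONSEFACTOR=CNS] waterhardness + community + worker
```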

Options: PRINT, RESPONSEFACTOR, CLASSIFICATION, FACTORIAL, POOL, DENOMINATOR, NOMESSAGE, FPROBABILITY, TPROBABILITY, SELECTION, PROBABILITY, FULL.

Parameter: TERMS.

Method

FITMULTINOMIAL uses the standard generalized linear models commands, as explained in the Description.

Action with `RESTRICT`

As in FIT, the y-variate (specified in an earlier MODEL directive) can be restricted to analyse a subset of the data.

Reference

McCullagh, P. & Nelder, J.A. (1989). Generalized Linear Models (second edition). Chapman & Hall, London.

Example

CAPTION   'FITMULTINOMIAL example',\
          !t('Frequencies of central nervous system malformations',\
          'in live births in 8 South Wales communities',\
          '(McCullagh & Nelder 1989, Table 5.3).'); STYLE=meta,plain
FACTOR    [NVALUES=64; LABELS=!t(Cardiff,Newport,Swansea,'Glamorgan E.',\
          'Glamorgan W.','Glamorgan C.','Monmouth V.','Monmouth other')]\
          community
&         [LABELS=!t('Non-manual',Manual)] worker
&         [LABELS=!t(None,'An.','Sp.',Other)] CNS
GENERATE  community,worker,CNS
VARIATE   births; VALUES=!(\
          4091, 5, 9,5,  9424,31,33,14,\
          1515, 1, 7,0,  4610, 3,15, 6,\
          2394, 9, 5,0,  5526,19,30, 4,\
          3163, 9,14,3, 13217,55,71,19,\
          1979, 5,10,1,  8195,30,44,10,\
          4838,11,12,2,  7803,25,28,12,\
          2362, 6, 8,4,  9962,36,37,13,\
          1604, 3, 6,0,  3172, 8,13, 3)
TABLE     [CLASS=community; VALUES=110,100,95,42,39,161,83,122] hardtable
CALCULATE waterhardness = TPROJECT(hardtable)
MODEL     [DISTRIBUTION=poisson; LINK=log] births
CAPTION   !t('Notice the aliasing between terms involving waterhardness',\
          'and those involving community. This is because waterhardness',\
          'is defined by the location of the communities, and thus is a',\
          'contrast between community effects.')
FITMULTINOMIAL [PRINT=#,accumulated; RESPONSEFACTOR=CNS; FPROBABILITY=yes]\
          waterhardness + community + worker

Updated on March 8, 2019
