Fits generalized linear models with multinomial distribution (R.W. Payne).
Options
| Option | Description |
|---|---|
| PRINT = string tokens | What to print (model, deviance, summary, estimates, correlations, fittedvalues, accumulated, monitoring, confidence); default mode, summ, esti |
| RESPONSEFACTOR = factor | Factor representing the response categories of the multinomial distribution |
| CLASSIFICATION = factors | Factors classifying the subjects; default uses the factors in TERMS |
| FACTORIAL = scalar | Limit for expansion of model terms from TERMS; default 3 |
| POOL = string token | Whether to pool ss in accumulated summary between all terms fitted in a linear model (yes, no); default no |
| DENOMINATOR = string token | Whether to base ratios in accumulated summary on rms from model with smallest residual ss or smallest residual ms (ss, ms); default ss |
| NOMESSAGE = string tokens | Which warning messages to suppress (dispersion, leverage, residual, aliasing, marginality, vertical, df, inflation); default * |
| FPROBABILITY = string token | Printing of probabilities for variance and deviance ratios (yes, no); default no |
| TPROBABILITY = string token | Printing of probabilities for t-statistics (yes, no); default no |
| SELECTION = string tokens | Statistics to be displayed in the summary of analysis produced by PRINT=summary (%variance, %ss, adjustedr2, r2, dispersion, %meandeviance, %deviance, aic, bic, sic); default disp |
| PROBABILITY = scalar | Probability level for confidence intervals for parameter estimates; default 0.95 |
| FULL = string token | Whether to assign all possible parameters to factors and interactions (yes, no); default no |
Parameter
| Parameter | Description |
|---|---|
| TERMS = formula | Terms to be fitted |
Description
FITMULTINOMIAL provides an automatic way of fitting generalized linear models with the multinomial distribution. These models can be fitted with the ordinary generalized linear models commands, by using the fact that a multinomial distribution can be generated by taking several independent Poisson variables (one for each outcome of the multinomial) and then constraining their sum to be equal to the multinomial total (see McCullagh & Nelder 1989, or any book on probability distributions).
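The Poisson construction mentioned above can be checked directly. The following is a sketch in notation that is not part of the original text: independent counts $Y_j$ with means $\mu_j$ for the $k$ response categories, their total $N$, and $\mu_+ = \sum_j \mu_j$.

```latex
\begin{align*}
Y_j &\sim \mathrm{Poisson}(\mu_j)\ \text{independently},\quad j = 1,\dots,k,
\qquad N = \sum_{j=1}^{k} Y_j \sim \mathrm{Poisson}(\mu_+), \\
P(Y_1 = y_1,\dots,Y_k = y_k \mid N = n)
 &= \frac{\prod_{j=1}^{k} e^{-\mu_j}\,\mu_j^{y_j}/y_j!}{e^{-\mu_+}\,\mu_+^{n}/n!}
  = \frac{n!}{\prod_{j=1}^{k} y_j!}\,\prod_{j=1}^{k}\left(\frac{\mu_j}{\mu_+}\right)^{y_j},
 \qquad \sum_{j=1}^{k} y_j = n .
\end{align*}
```

The right-hand side is the multinomial probability with total $n$ and category probabilities $\pi_j = \mu_j/\mu_+$, which is why conditioning the Poisson counts on their total recovers the multinomial model.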
The data for the model are counts of the numbers of subjects observed in the various categories of the multinomial distribution. The counts may also be classified by various treatment factors, and the interest is in seeing how the distribution of the subjects varies according to the treatments or any variates that differ over the levels of the treatments. These observations must be put into a single variate, and specified beforehand using the Y parameter of the MODEL directive. The DISTRIBUTION option of MODEL should be set to Poisson, and the LINK option to logarithm (or left as the default, canonical, which selects the canonical link: logarithm for the Poisson); this gives a logit link in the multinomial.
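To see why a log link on the Poisson counts corresponds to a logit link on the multinomial probabilities, consider the following sketch. The notation is an assumption made for illustration: $i$ indexes the combinations of the classifying factors, $j$ the response categories, $\alpha_i$ is a free parameter for each combination, $\beta_j$ is the response-category effect, and $\gamma_j$ holds the category-specific coefficients of the explanatory terms $x_i$.

```latex
\begin{align*}
\log \mu_{ij} &= \alpha_i + \beta_j + x_i^{\top}\gamma_j, \\
\pi_{ij} = \frac{\mu_{ij}}{\sum_{l}\mu_{il}}
 &= \frac{\exp(\beta_j + x_i^{\top}\gamma_j)}{\sum_{l}\exp(\beta_l + x_i^{\top}\gamma_l)}, \\
\log\frac{\pi_{ij}}{\pi_{i1}} &= (\beta_j - \beta_1) + x_i^{\top}(\gamma_j - \gamma_1).
\end{align*}
```

The $\alpha_i$ cancel from the probabilities, so only the terms involving the response category carry information about the response distribution.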
You also need to form a factor to identify the response category of the multinomial recorded in each unit of the Y variate. This is then input to FITMULTINOMIAL using the RESPONSEFACTOR option. FITMULTINOMIAL also has a CLASSIFICATION option that can be used to specify the factors that classify the subjects. The other options have the same purpose as those in the FIT directive. The model to be fitted is specified by the TERMS parameter (like the first parameter of FIT). If CLASSIFICATION is unset, FITMULTINOMIAL will use the set of factors that occur in TERMS. Usually these will contain all the factors that classify the subjects. However, if you have a classification factor with numerical levels, you might, for example, want to fit a variate calculated as some function of the levels rather than an effect for every level of the factor. You could then specify the factors in the list for the CLASSIFICATION option, and use the variate in TERMS.
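As an illustration of the last point, here is a minimal sketch that borrows the identifiers births, CNS, community, worker and waterhardness from the Example section. It assumes those structures have already been declared as in that example; the particular choice of terms is illustrative only.

```genstat
"Counts in a single y-variate, Poisson distribution with log link"
MODEL [DISTRIBUTION=poisson; LINK=logarithm] births
"waterhardness is a variate defined by community, so the classifying"
"factors are listed explicitly rather than being taken from TERMS"
FITMULTINOMIAL [RESPONSEFACTOR=CNS; CLASSIFICATION=community,worker]\
               waterhardness + worker
```

Here community does not appear in TERMS, so without the CLASSIFICATION setting it would not be among the factors used to classify the subjects.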
FITMULTINOMIAL first fits a model defined as all factorial combinations of the CLASSIFICATION factors. This imposes the constraint that the Poisson variables sum to the totals of the multinomial distribution. The effects of these terms assess how the design has been set up – i.e. how the subjects have been allocated to the treatments – but they have no information on the effects of the treatments on the response.
It then fits RESPONSEFACTOR. This represents the overall distribution of the response categories across the subjects, and is analogous to the grand mean in an ordinary analysis. (This must be fitted, and so FITMULTINOMIAL has no CONSTANT option.)
Finally, it fits the interactions of the terms in TERMS with RESPONSEFACTOR. These show how the distribution of subjects to response categories is affected by the treatment terms – which is the main interest of the analysis. The FACTORIAL option sets a limit on the number of factors and/or variates in the model terms that are generated from the TERMS formula. (Note, though, that the RESPONSEFACTOR is ignored in interpreting this limit.) By default these terms are fitted individually, so they will each have their own line in an accumulated analysis of deviance (option PRINT=accumulated). However, you can set option POOL=yes to fit them all at once.
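The three stages can be mirrored by hand with the ordinary regression directives. The sketch below is not the procedure's actual source. It assumes TERMS = community + worker and the identifiers from the Example section, and uses FIT and ADD to build up the same sequence of terms.

```genstat
MODEL [DISTRIBUTION=poisson; LINK=logarithm] births
"Stage 1: all factorial combinations of the classifying factors"
FIT   [PRINT=*] community*worker
"Stage 2: the response factor, the analogue of the grand mean"
ADD   [PRINT=*] CNS
"Stage 3: interactions of the TERMS with the response factor,"
"added one at a time so that each has its own line"
ADD   [PRINT=*] community.CNS
ADD   [PRINT=accumulated] worker.CNS
```

With POOL=yes the two stage 3 terms would instead be added together, giving a single pooled line in the accumulated analysis of deviance.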
After FITMULTINOMIAL you can use the standard regression output commands, RDISPLAY, RKEEP and so on, in the usual way.
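For example, further output and saved results might be requested as in the sketch below. It assumes a FITMULTINOMIAL fit has just been run; fv, est and se are hypothetical identifiers chosen for illustration.

```genstat
"Redisplay the parameter estimates with t-probabilities"
RDISPLAY [PRINT=estimates; TPROBABILITY=yes]
"Save fitted values, estimates and standard errors for later use"
RKEEP    FITTEDVALUES=fv; ESTIMATES=est; SE=se
PRINT    est,se
```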
If you have a large model, you can set the GROUPS option in the earlier MODEL statement to the response factor to save space. Note, though, that if you want to use the PREDICT directive after FITMULTINOMIAL, you will then only be able to predict values within one response category at a time.
Options: PRINT, RESPONSEFACTOR, CLASSIFICATION, FACTORIAL, POOL, DENOMINATOR, NOMESSAGE, FPROBABILITY, TPROBABILITY, SELECTION, PROBABILITY, FULL.
Parameter: TERMS.
Method
FITMULTINOMIAL uses the standard generalized linear models commands, as explained in the Description.
Action with RESTRICT
As in FIT, the y-variate (specified in an earlier MODEL directive) can be restricted to analyse a subset of the data.
Reference
McCullagh, P. & Nelder, J.A. (1989). Generalized Linear Models (second edition). Chapman & Hall, London.
See also
Directive: MODEL.
Commands for: Regression analysis.
Example
CAPTION  'FITMULTINOMIAL example',\
         !t('Frequencies of central nervous system malformations',\
         'in live births in 8 South Wales communities',\
         '(McCullagh & Nelder 1989, Table 5.3).'); STYLE=meta,plain
FACTOR   [NVALUES=64; LABELS=!t(Cardiff,Newport,Swansea,'Glamorgan E.',\
         'Glamorgan W.','Glamorgan C.','Monmouth V.','Monmouth other')]\
         community
&        [LABELS=!t('Non-manual',Manual)] worker
&        [LABELS=!t(None,'An.','Sp.',Other)] CNS
GENERATE community,worker,CNS
VARIATE  births; VALUES=!(\
         4091, 5, 9,5,  9424,31,33,14,\
         1515, 1, 7,0,  4610, 3,15, 6,\
         2394, 9, 5,0,  5526,19,30, 4,\
         3163, 9,14,3, 13217,55,71,19,\
         1979, 5,10,1,  8195,30,44,10,\
         4838,11,12,2,  7803,25,28,12,\
         2362, 6, 8,4,  9962,36,37,13,\
         1604, 3, 6,0,  3172, 8,13, 3)
TABLE    [CLASS=community; VALUES=110,100,95,42,39,161,83,122] hardtable
CALCULATE waterhardness = TPROJECT(hardtable)
MODEL    [DISTRIBUTION=poisson; LINK=log] births
CAPTION  !t('Notice the aliasing between terms involving waterhardness',\
         'and those involving community. This is because waterhardness',\
         'is defined by the location of the communities, and thus is a',\
         'contrast between community effects.')
FITMULTINOMIAL [PRINT=#,accumulated; RESPONSEFACTOR=CNS; FPROBABILITY=yes]\
         waterhardness + community + worker