Performs quadratic discrimination between groups, i.e. allowing for different variance-covariance matrices in each group (D.B. Baird).
|Printed output from the analysis (
||Validation method to use to calculate error rates (
||Number of bootstraps or cross-validation sets; default 50|
||Number of groups for cross-validation, default 10|
||Each pointer contains a training set of variates to be used to form a quadratic discrimination|
||Define groupings for the units in each training set|
||Prior probabilities of group membership; default
||Seed for the random numbers used in bootstrapping or cross-validation; default 0 continues from the previous generation or (if none) initializes the seed automatically|
||Saves the validation error rate|
||Saves the specificity table|
||Saves the groups allocated by the discriminant rule|
||Save posterior probabilities of membership of the groups (in the columns of a matrix or the variates in a pointer) for the units in the training set (in the rows)|
QDISCRIMINATE performs a quadratic discrimination analysis to identify members of a set of groups using their observations on a set of variates. The quadratic discrimination rule assumes that the values of the variates within each group follow a multivariate Normal distribution, and that the variance-covariance matrices of these distributions differ between groups. This differs from the more familiar linear discriminant analysis, performed by procedure
DISCRIMINATE, where the groups are assumed to have the same variance-covariance matrix.
The variates to be used to discriminate between the groups are specified in a pointer by the
DATA parameter, and the membership of the groups is specified in a factor by the
GROUPS parameter. The non-missing units of the
GROUPS factor provide a training set to estimate the discriminant rule. Units that you would like to allocate to groups using the discriminant rule should be included in the data set with missing values in the GROUPS factor.
You can specify prior probabilities for the groups using the
PRIORPROBABILITIES option; by default the groups are all assumed to be equally likely. You can use this to allow for unequal costs of mis-allocation by weighting the prior probabilities like this:
PRIORPROBABILITIES = Cost * Prior / SUM(Cost * Prior)
where Cost is a variate defining the cost of mis-allocation for each group, and Prior contains the unweighted prior probabilities.
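The weighting formula above can be checked with a small calculation outside Genstat. The following is a minimal Python sketch, not Genstat code; the function name `cost_weighted_priors` and the cost and prior values are hypothetical and chosen only for illustration.

```python
def cost_weighted_priors(costs, priors):
    """Return Cost * Prior / SUM(Cost * Prior), one value per group."""
    if len(costs) != len(priors):
        raise ValueError("costs and priors must have one entry per group")
    products = [c * p for c, p in zip(costs, priors)]
    total = sum(products)
    return [x / total for x in products]


# Hypothetical values: three equally likely groups, where misallocating a
# unit from the third group is assumed to cost twice as much as the others.
weighted = cost_weighted_priors([1.0, 1.0, 2.0], [1 / 3, 1 / 3, 1 / 3])
print(weighted)  # approximately [0.25, 0.25, 0.5]
```

The weighted values sum to 1, so they can be used wherever prior probabilities are expected.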
Printed output is controlled by the PRINT option, whose settings provide:
||the allocated group for each unit,|
||number of units in each group with a complete set of observations,|
||generalized pairwise distance between group means,|
||the posterior probability of being allocated to each group,|
||specificity of allocation (i.e. the proportion of each group that is assigned correctly),|
||summary of the model fitting,|
||table of counts of training units allocated to each group,|
||the error rate, and|
||variance-covariance matrices for the groups|
The default is
The VALIDATIONMETHOD option specifies the validation method, with settings for prediction, cross-validation, jackknife and bootstrap. Prediction calculates
the error rate as the proportion of the training set that were misallocated. Cross-validation works by randomly splitting the units into a number of groups specified by the
NCROSSVALIDATIONGROUPS option (default 10). It then omits each of the groups, in turn, and predicts how the omitted units are allocated to the discrimination groups. Jackknifing leaves the units out one at a time, and uses the rest of the data to predict the group of the omitted unit. The bootstrap method works by drawing a bootstrap sample of units (a random sample, drawn with replacement, of the same size as the original sample), and predicting the units that are not present in the random sample. The resulting bootstrap error rate is then calculated as a weighted average of the error rate of the omitted observations and the predictive error rate of the bootstrap sample. The weights used are 0.632 and 0.368 respectively, and so this is known as the 632 rule.
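The resampling schemes described above can be sketched as follows. This is an illustrative Python sketch, not the procedure's own code: the function names are hypothetical, the near-equal fold sizes are an assumption, and the error rates passed to `error_632` are made-up numbers.

```python
import random


def crossvalidation_groups(n_units, n_groups, rng):
    """Randomly split unit indices into n_groups sets (assumed near-equal in size)."""
    order = list(range(n_units))
    rng.shuffle(order)
    return [order[g::n_groups] for g in range(n_groups)]


def bootstrap_split(n_units, rng):
    """Draw n_units indices with replacement; return (sample, omitted units)."""
    sample = [rng.randrange(n_units) for _ in range(n_units)]
    drawn = set(sample)
    omitted = [i for i in range(n_units) if i not in drawn]
    return sample, omitted


def error_632(omitted_error, sample_error):
    """Weight the omitted-unit error by 0.632 and the bootstrap-sample error by 0.368."""
    return 0.632 * omitted_error + 0.368 * sample_error


rng = random.Random(764527)  # fixed seed so the split is reproducible
sample, omitted = bootstrap_split(150, rng)
print(len(sample), len(omitted))   # 150 drawn units, plus the units never drawn
print(error_632(0.05, 0.02))       # approximately 0.03896
```

In a full validation, the discriminant rule would be re-estimated from each bootstrap sample (or from the retained cross-validation groups) before the omitted units are predicted; that step is left out here.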
The NSIMULATIONS option sets the number of simulations for cross-validation or bootstrapping; the default is 50.
The SEED parameter provides the seed for the random numbers used for the randomizations in the simulations. The default value of 0 continues an existing sequence of random numbers or, if none have been used in the current Genstat job, initializes the seed automatically using the computer clock.
The ERRORRATE parameter can save the validation error rates. The
SPECIFICITY parameter can save the proportion of each group that is assigned correctly. The
ALLOCATION parameter can save the assigned groups, and the
PROBABILITIES parameter can save the posterior probabilities of the groups.
The FSSPM directive is used to calculate the variance-covariance matrices of the groups. The posterior probabilities of belonging to each group are then calculated for each unit, and the unit is assigned to the most likely group. For more details, see e.g. Hastie et al. (2001) or McLachlan (1992).
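The allocation step can be illustrated with the one-variate special case of the quadratic rule, where each group has its own mean and variance. This is a minimal Python sketch under that simplifying assumption, not the procedure's implementation: the function names and the group means, variances and priors are hypothetical, and in practice the means and variance-covariance matrices would be estimated from the training set.

```python
import math


def normal_density(x, mean, variance):
    """Density of a Normal distribution with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / variance) / math.sqrt(2 * math.pi * variance)


def posterior_probabilities(x, means, variances, priors):
    """Posterior probability of each group: prior * density, rescaled to sum to 1."""
    weighted = [p * normal_density(x, m, v) for m, v, p in zip(means, variances, priors)]
    total = sum(weighted)
    return [w / total for w in weighted]


def allocate(x, means, variances, priors):
    """Index of the group with the largest posterior probability."""
    post = posterior_probabilities(x, means, variances, priors)
    return max(range(len(post)), key=post.__getitem__)


# Hypothetical groups: group 0 is tightly spread around 0, group 1 widely around 2.
means, variances, priors = [0.0, 2.0], [0.25, 4.0], [0.5, 0.5]
for x in (0.0, 1.0, -3.0):
    print(x, allocate(x, means, variances, priors))
# 0.0 is allocated to group 0; 1.0 and -3.0 are allocated to group 1
```

The value -3.0 is nearer to the mean of group 0 but is allocated to group 1, because group 1 has the larger variance. A linear rule with a common variance could not produce this allocation, which is the practical difference between the two analyses.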
The input variates and factor may be restricted (but any restrictions must be identical). The restricted units are omitted from the analysis.
Hastie, T., Tibshirani, R. & Friedman, J. (2001). The Elements of Statistical Learning. Springer, New York.
McLachlan, G.J. (1992). Discriminant Analysis and Statistical Pattern Recognition. Wiley, Hoboken, New Jersey.
Commands for: Multivariate and cluster analysis.
CAPTION 'QDISCRIMINATE example','Fisher''s Iris data'; STYLE=meta,plain
SPLOAD [PRINT=*] '%GENDIR%/Data/Iris.gsh'
POINTER [VALUES=Sepal_Length,Sepal_Width,Petal_Length,Petal_Width]\
  Measures
QDISCRIMINATE [PRINT=allocation,probabilities,specificity,summary,validation;\
  VALIDATIONMETHOD=bootstrap; NSIMULATIONS=100]\
  Measures; GROUPS=Species; SEED=764527