QDISCRIMINATE procedure

Performs quadratic discrimination between groups, i.e. allowing for different variance-covariance matrices in each group (D.B. Baird).

Options

`PRINT` = string tokens	Printed output from the analysis (`allocation`, `counts`, `distance`, `probabilities`, `specificity`, `summary`, `table`, `validation`, `vcovariance`); default `spec`, `summ`, `vali`
`VALIDATIONMETHOD` = string token	Validation method to use to calculate error rates (`bootstrap`, `crossvalidation`, `jackknife`, `prediction`); default `cros`
`NSIMULATIONS` = scalar	Number of bootstraps or cross-validation sets; default 50
`NCROSSVALIDATIONGROUPS` = scalar	Number of groups for cross-validation; default 10

Parameters

`DATA` = pointers	Each pointer contains a training set of variates to be used to form a quadratic discrimination
`GROUPS` = factors	Define groupings for the units in each training set
`PRIORPROBABILITIES` = variates	Prior probabilities of group membership; default `*` i.e. equal
`SEED` = scalars	Seed for the random numbers used in bootstrapping or cross-validation; default 0 continues from the previous generation or (if none) initializes the seed automatically
`ERRORRATE` = scalars	Saves the validation error rate
`SPECIFICITY` = matrices	Saves the specificity table
`ALLOCATION` = factors	Saves the groups allocated by the discriminant rule
`PROBABILITIES` = matrices or pointers	Save posterior probabilities of membership of the groups (in the columns of a matrix or the variates in a pointer) for the units in the training set (in the rows)

Description

QDISCRIMINATE performs a quadratic discrimination analysis to identify members of a set of groups using their observations on a set of variates. The quadratic discrimination rule assumes that the values of the variates within each group follow a multivariate Normal distribution, and that the variance-covariance matrices of these distributions differ between groups. This differs from the more familiar linear discriminant analysis, performed by procedure DISCRIMINATE, where the groups are assumed to share the same variance-covariance matrix.

The variates to be used to discriminate between the groups are specified in a pointer by the DATA parameter, and the membership of the groups is specified in a factor by the GROUPS parameter. The non-missing units of the GROUPS factor provide a training set to estimate the discriminant rule. Units that you would like to allocate to groups using the discriminant rule should be included in the data set with missing values in the GROUPS factor.
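As a minimal sketch of this layout, the commands below build a small training set and append two units whose group is unknown. All data values and the identifiers X1, X2, Grp, Vars and Alloc are hypothetical, chosen only for illustration, and the toy data are assumed to be large enough for the within-group variance-covariance matrices to be estimated.

```genstat
" Hypothetical toy data: ten training units in two groups "
" followed by two unlabelled units to be allocated "
VARIATE       [VALUES=1.1,1.4,0.9,1.2,1.6,3.0,3.9,2.5,3.4,2.8,1.3,3.1] X1
VARIATE       [VALUES=2.0,2.6,2.2,1.7,2.4,4.1,3.2,4.6,3.9,3.3,2.1,3.8] X2
" Missing values (*) in the factor mark the units to be allocated "
FACTOR        [LEVELS=2; VALUES=1,1,1,1,1,2,2,2,2,2,*,*] Grp
POINTER       [VALUES=X1,X2] Vars
QDISCRIMINATE [PRINT=allocation,probabilities; VALIDATIONMETHOD=prediction]\
              Vars; GROUPS=Grp; ALLOCATION=Alloc
PRINT         X1,X2,Grp,Alloc
```

The last two units take no part in estimating the rule; their allocated groups should then appear in the printed allocation and in the saved factor Alloc.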

You can specify prior probabilities for the groups using the PRIORPROBABILITIES parameter; by default the groups are all assumed to be equally likely. You can also use this parameter to allow for unequal costs of mis-allocation, by weighting the prior probabilities like this:

PRIORPROBABILITIES = Cost * Prior / SUM(Cost * Prior)

where Cost is a variate defining the cost of mis-allocation for each group.
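The weighting can be sketched in Genstat as follows. The prior and cost values are hypothetical, the identifiers Prior, Cost and Weighted are illustrative names, and Measures and Species are assumed to have been set up as in the Example section. The sketch also assumes that the values of the variate are matched to the groups in the order of the levels of the GROUPS factor.

```genstat
" Hypothetical priors and mis-allocation costs for three groups "
VARIATE       [VALUES=0.5,0.3,0.2] Prior
VARIATE       [VALUES=1,2,4] Cost
" Cost-weighted priors, rescaled to sum to one "
CALCULATE     Weighted = Cost * Prior / SUM(Cost * Prior)
PRINT         Weighted
QDISCRIMINATE Measures; GROUPS=Species; PRIORPROBABILITIES=Weighted
```

With these values Cost * Prior is 0.5, 0.6 and 0.8, which sum to 1.9, so the weighted priors are approximately 0.263, 0.316 and 0.421. The third group, which is the most costly to mis-allocate, receives the largest weight even though its original prior is the smallest.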

Printed output is controlled by the option PRINT, with settings:

`allocation`	the allocated group for each unit,
`counts`	number of units in each group with a complete set of observations,
`distance`	generalized pairwise distance between group means,
`probabilities`	the posterior probability of being allocated to each group,
`specificity`	specificity of allocation (i.e. the proportion of each group that is assigned correctly),
`summary`	summary of the model fitting,
`table`	table of counts of training units allocated to each group,
`validation`	the error rate, and
`vcovariance`	variance-covariance matrices for the groups

The default is PRINT=spec,summ,vali.

The VALIDATIONMETHOD option specifies the validation method, with settings for prediction, cross-validation, jackknife and bootstrap. Prediction calculates the error rate as the proportion of the training set that is misallocated. Cross-validation works by randomly splitting the units into a number of groups specified by the NCROSSVALIDATIONGROUPS option (default 10). It then omits each of the groups, in turn, and predicts how the omitted units are allocated to the discrimination groups. Jackknifing leaves the units out one at a time, and uses the rest of the data to predict the group of the omitted unit. The bootstrap method works by drawing a bootstrap sample of units (a random sample of units, drawn with replacement, of the same size as the original sample), and predicting the units that are not present in the random sample. The resulting bootstrap error rate is then calculated as a weighted average of the error rate of the omitted observations and the predictive error rate of the bootstrap sample. The weights used are 0.632 and 0.368 respectively, and so this is known as the 632 rule.
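To compare the validation methods on one data set, the procedure can be called once per method and the error rates saved with the ERRORRATE parameter. The sketch below assumes that Measures and Species have been set up as in the Example section. The identifiers ErrPred, ErrJack and ErrCV, and the settings of 20 simulations and 5 cross-validation groups, are illustrative choices only.

```genstat
" Error rate on the training set itself "
QDISCRIMINATE [PRINT=validation; VALIDATIONMETHOD=prediction]\
              Measures; GROUPS=Species; ERRORRATE=ErrPred
" Leave-one-out error rate "
QDISCRIMINATE [PRINT=validation; VALIDATIONMETHOD=jackknife]\
              Measures; GROUPS=Species; ERRORRATE=ErrJack
" Cross-validation with 5 groups, repeated 20 times "
QDISCRIMINATE [PRINT=validation; VALIDATIONMETHOD=crossvalidation;\
              NSIMULATIONS=20; NCROSSVALIDATIONGROUPS=5]\
              Measures; GROUPS=Species; SEED=764527; ERRORRATE=ErrCV
PRINT         ErrPred,ErrJack,ErrCV
```

The prediction error rate re-uses the training units to assess the rule, so it would usually be expected to be the most optimistic of the three.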

The NSIMULATIONS option sets the number of simulations for cross-validation or bootstrapping; default 50.

The SEED parameter provides the seed for the random numbers used for the randomizations in the simulations. The default value of 0 continues an existing sequence of random numbers or, if none have been used in the current Genstat job, initializes the seed automatically using the computer clock.

The ERRORRATE parameter can save the validation error rates. The SPECIFICITY parameter can save the proportion of each group that is assigned correctly. The ALLOCATION parameter can save the assigned groups, and the PROBABILITIES parameter can save the posterior probabilities of the groups.
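A minimal sketch of saving the results for later use is shown below. It assumes that Measures and Species have been set up as in the Example section; the identifiers Err, Spec, Alloc and Post are illustrative names. The final TABULATE statement is one possible way of cross-classifying the true and allocated groups, not something that the procedure itself requires.

```genstat
" Suppress the printed output and save the results instead "
QDISCRIMINATE [PRINT=*] Measures; GROUPS=Species; SEED=764527;\
              ERRORRATE=Err; SPECIFICITY=Spec; ALLOCATION=Alloc;\
              PROBABILITIES=Post
PRINT         Err
PRINT         Spec
" Counts of units by true group and allocated group "
TABULATE      [PRINT=counts; CLASSIFICATION=Species,Alloc]
```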

Options: PRINT, VALIDATIONMETHOD, NSIMULATIONS, NCROSSVALIDATIONGROUPS.

Parameters: DATA, GROUPS, PRIORPROBABILITIES, SEED, ERRORRATE, SPECIFICITY, ALLOCATION, PROBABILITIES.

Method

The FSSPM directive is used to calculate the variance-covariance matrices of the groups. The posterior probabilities of belonging to each group are then calculated for each unit, and the unit is assigned to the most likely group. For more details, see e.g. Hastie et al. (2001) or McLachlan (1992).
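For orientation, the quadratic rule is usually written in the form below, as given in texts such as Hastie et al. (2001). The symbols are notation introduced here for illustration: \mu_k and \Sigma_k are the mean vector and variance-covariance matrix of group k, \pi_k is its prior probability, and x is the vector of observations for a unit. The exact computations used inside the procedure are not described here, so this should be read as the textbook formulation rather than a description of the implementation.

```latex
% Quadratic discriminant score for group k
\delta_k(x) = -\tfrac{1}{2}\log\lvert\Sigma_k\rvert
              - \tfrac{1}{2}(x-\mu_k)^{\mathsf T}\,\Sigma_k^{-1}\,(x-\mu_k)
              + \log\pi_k

% Posterior probability of group k, and the allocation rule
P(G = k \mid x) = \frac{\exp\{\delta_k(x)\}}{\sum_{j}\exp\{\delta_j(x)\}},
\qquad
\hat{G}(x) = \arg\max_k \, \delta_k(x)
```

Because \Sigma_k differs between groups, the score is quadratic in x; if all the groups shared one matrix, the quadratic terms would cancel and the rule would reduce to the linear one.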

Action with `RESTRICT`

The input variates and factor may be restricted (but any restrictions must be identical). Units excluded by the restriction are omitted from the analysis.

References

Hastie, T., Tibshirani, R. & Friedman, J. (2001). The Elements of Statistical Learning. Springer, New York.

McLachlan, G.J. (1992). Discriminant Analysis and Statistical Pattern Recognition. Wiley, Hoboken, New Jersey.

Example

CAPTION       'QDISCRIMINATE example','Fisher''s Iris data'; STYLE=meta,plain
SPLOAD        [PRINT=*] '%GENDIR%/Data/Iris.gsh'
POINTER       [VALUES=Sepal_Length,Sepal_Width,Petal_Length,Petal_Width]\
              Measures
QDISCRIMINATE [PRINT=allocation,probabilities,specificity,summary,validation;\
              VALIDATIONMETHOD=bootstrap; NSIMULATIONS=100]\
              Measures; GROUPS=Species; SEED=764527

Updated on June 19, 2019
