Obtains a starting classification for non-hierarchical clustering (S.A. Harding).
No options
Parameters
DATA = pointers | Each pointer contains a set of variates giving the properties of the units to be grouped |
---|---|
NGROUPS = scalars | Indicates the number of groups required |
GROUPS = factors | Stores the classifications formed |
Description
In non-hierarchical classification an initial classification is required, and it is advantageous to have these classes as homogeneous as possible. This reduces the risk of converging to a local optimum, and also encourages faster convergence of the iterative transfer algorithm used by the CLUSTER directive.
The attributes of the units to be formed into groups are specified in a set of variates; these should be placed into a pointer for use as the setting for the DATA parameter. The number of groups required is specified by the NGROUPS parameter; this must not be greater than the number of units. The group allocations that are formed are stored in the factor indicated by the GROUPS parameter. This factor need not be declared in advance but will be formed by the procedure.
Options: none.
Parameters: DATA, NGROUPS, GROUPS.
Method
When the number of groups is greater than the number of data variates plus one, CLASSIFY forms the groups according to the positions of the units in the first dimension of a principal coordinates analysis (PCO) of the DATA variates.
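The following Python sketch illustrates the idea of grouping units by their position on the first principal coordinate. It is not the CLASSIFY source code. It assumes Euclidean distances between units, finds the first axis by power iteration, and then cuts the ordered units into k contiguous groups of nearly equal size. That cutting rule is an assumption made only for illustration, since the text above does not say how the positions are turned into groups. The function names first_principal_coordinate and groups_along_axis are hypothetical.

```python
import math


def first_principal_coordinate(units, iters=500):
    """Scores of the units on the first principal coordinate.

    Assumes Euclidean distances, for which the doubly centred matrix B of a
    principal coordinates analysis equals the inner products of the
    mean-centred units. The leading eigenvector is found by power iteration.
    """
    n = len(units)
    p = len(units[0])
    means = [sum(u[j] for u in units) / n for j in range(p)]
    centred = [[u[j] - means[j] for j in range(p)] for u in units]
    B = [[sum(a * b for a, b in zip(r, s)) for s in centred] for r in centred]

    # Arbitrary start vector; it must not be orthogonal to the leading
    # eigenvector, which is not checked in this sketch.
    v = [float(i + 1) for i in range(n)]
    for _ in range(iters):
        w = [sum(B[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return [0.0] * n  # all units coincide
        v = [x / norm for x in w]

    eigenvalue = sum(
        v[i] * sum(B[i][j] * v[j] for j in range(n)) for i in range(n)
    )
    return [math.sqrt(max(eigenvalue, 0.0)) * x for x in v]


def groups_along_axis(scores, k):
    """ASSUMPTION: cut the units, ordered by score, into k contiguous groups
    of nearly equal size. Group numbers are 1-based."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    groups = [0] * n
    for rank, unit in enumerate(order):
        groups[unit] = 1 + rank * k // n
    return groups


# One variate and three groups, so k > p + 1 and the first-axis rule applies.
units = [(0,), (1,), (2,), (10,), (11,), (12,)]
scores = first_principal_coordinate(units)
print(groups_along_axis(scores, 3))  # [1, 1, 2, 2, 3, 3]
```

The sign of a principal coordinate axis is arbitrary, so the numbering of the groups could equally run in the opposite direction; only the partition of the units is meaningful.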
Otherwise it tries to find a suitable classification into the k groups by finding the k units that are furthest apart in p-dimensional space (where p is the number of variates). These are then used as nuclei for the classes, with each of the remaining units being allocated to the class with the nearest nucleus.
The units defining the nuclei are found by first finding the two units that are furthest apart. The third unit is the one with the greatest distance from the line joining the first two. The fourth is the unit with the greatest distance from the plane containing the first three, and so on, until the kth unit is the one furthest from the (k-2)-dimensional space spanned by the (k-1) units already found.
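The nucleus search and the nearest-nucleus allocation described above can be sketched in Python as follows. This is an illustration written from the description, not the CLASSIFY source code. It assumes Euclidean distance, at least two groups, and ties broken in favour of the lower-numbered unit or group. The names choose_nuclei and allocate are hypothetical.

```python
import math
from itertools import combinations


def _sub(a, b):
    return [x - y for x, y in zip(a, b)]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _residual(v, basis):
    """Part of v that is orthogonal to every (orthonormal) basis vector."""
    r = list(v)
    for b in basis:
        c = _dot(r, b)
        r = [x - c * y for x, y in zip(r, b)]
    return r


def choose_nuclei(units, k):
    """Indices of k units: the furthest-apart pair, then repeatedly the unit
    furthest from the flat (line, plane, ...) through the units chosen so far.
    Intended for k <= p + 1; beyond that the flat fills the whole space."""
    n = len(units)
    if not 2 <= k <= n:
        raise ValueError("k must lie between 2 and the number of units")

    # Step 1: the two units that are furthest apart.
    first, second = max(
        combinations(range(n), 2),
        key=lambda pair: math.dist(units[pair[0]], units[pair[1]]),
    )
    nuclei = [first, second]
    origin = units[first]
    basis = []  # orthonormal directions spanning the flat through the nuclei

    def distance_to_flat(u):
        r = _residual(_sub(units[u], origin), basis)
        return math.sqrt(_dot(r, r))

    def extend_basis(u):
        r = _residual(_sub(units[u], origin), basis)
        norm = math.sqrt(_dot(r, r))
        if norm > 1e-12:  # skip units that add no new direction
            basis.append([x / norm for x in r])

    extend_basis(second)
    # Step 2: add the unit furthest from the current flat until k are found.
    while len(nuclei) < k:
        rest = [u for u in range(n) if u not in nuclei]
        nxt = max(rest, key=distance_to_flat)
        nuclei.append(nxt)
        extend_basis(nxt)
    return nuclei


def allocate(units, nuclei):
    """Group number (1-based) of the nearest nucleus for every unit."""
    return [
        1 + min(range(len(nuclei)),
                key=lambda g: math.dist(u, units[nuclei[g]]))
        for u in units
    ]


units = [(0, 0), (10, 0), (5, 8), (1, 0), (9, 1), (5, 7)]
nuclei = choose_nuclei(units, 3)
print(nuclei)                   # [0, 1, 2]
print(allocate(units, nuclei))  # [1, 2, 3, 1, 2, 3]
```

In this toy data set the first two units are 10 apart, which is the largest distance, so they define the line; the third unit lies 8 from that line and becomes the third nucleus. Each remaining unit then joins the group of its nearest nucleus.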
Action with RESTRICT
The variates must not be restricted.
See also
Directive: CLUSTER.
Commands for: Multivariate and cluster analysis.
Example
CAPTION 'CLASSIFY example',\
        !t('Ten units are described by four variables;',\
        'form an initial classification into four groups.');\
        STYLE=meta,plain
VARIATE [NVALUES=10] Variable[1...4]
FACTOR [LEVELS=4; NVALUES=10] InitCl
READ Variable[1...4]
1.0 4.0 2.0 1.0
4.0 6.0 1.0 3.0
7.0 8.0 5.0 4.0
3.0 4.0 2.0 1.0
6.0 4.0 5.0 2.0
9.0 4.0 1.0 2.0
4.0 2.0 1.0 3.0
4.0 5.0 2.0 7.0
1.0 2.0 4.0 1.0
5.0 1.0 2.0 0.0 :
CLASSIFY Variable; NGROUPS=4; GROUPS=InitCl
PRINT Variable[1...4],InitCl