CLASSIFY procedure

Obtains a starting classification for non-hierarchical clustering (S.A. Harding).

No options

Parameters

`DATA` = pointers	Each pointer contains a set of variates giving the properties of the units to be grouped
`NGROUPS` = scalars	Indicates the number of groups required
`GROUPS` = factors	Stores the classifications formed

Description

Non-hierarchical classification requires an initial classification, and it is advantageous for the initial classes to be as homogeneous as possible. A homogeneous starting classification reduces the risk of converging to a local optimum, and also encourages faster convergence of the iterative transfer algorithm used by the CLUSTER directive.

The attributes of the units to be formed into groups are specified in a set of variates; these should be placed into a pointer for use as the setting for the DATA parameter. The number of groups required is specified by the NGROUPS parameter; this must not be greater than the number of units. The group allocations that are formed are stored in the factor indicated by the GROUPS parameter. This factor need not be declared in advance; if it has not been declared, the procedure forms it.

Options: none.

Parameters: DATA, NGROUPS, GROUPS.

Method

When the number of groups is greater than the number of data variates plus one, CLASSIFY forms the groups according to the positions of the units in the first dimension of a principal coordinates analysis (PCO) of the DATA variates.
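The following Python sketch illustrates this case; it is not the procedure's own code. It assumes Euclidean distances between units, in which case the first principal coordinate coincides with the first principal component scores of the centred data. The rule used to cut the axis into groups (equal-sized bands by rank) is an assumption made for illustration, since the text above does not say how positions are converted into groups. The names `first_axis_scores` and `groups_from_scores` are hypothetical.

```python
import math

def first_axis_scores(units, iterations=200):
    """Scores of the units on the leading axis (Euclidean PCO assumed)."""
    n, p = len(units), len(units[0])
    means = [sum(row[j] for row in units) / n for j in range(p)]
    centred = [[row[j] - means[j] for j in range(p)] for row in units]
    # p x p matrix of sums of squares and products of the centred variates
    ssp = [[sum(r[a] * r[b] for r in centred) for b in range(p)]
           for a in range(p)]

    def normalise(v):
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v] if norm > 0 else v

    # Power iteration, started from the row of the variate with most spread
    start = max(range(p), key=lambda a: ssp[a][a])
    vec = normalise(ssp[start])
    for _ in range(iterations):
        vec = normalise([sum(ssp[a][b] * vec[b] for b in range(p))
                         for a in range(p)])
    return [sum(r[j] * vec[j] for j in range(p)) for r in centred]

def groups_from_scores(scores, k):
    """ASSUMPTION: cut the ordered scores into k bands of near-equal size."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    groups = [0] * len(scores)
    for rank, i in enumerate(order):
        groups[i] = rank * k // len(scores) + 1
    return groups

# Two variates and four groups, so k > p + 1 and this branch applies
units = [[0.0, 0.0], [1.0, 0.1], [4.0, 0.0], [5.0, 0.1],
         [8.0, 0.0], [9.0, 0.1], [12.0, 0.0], [13.0, 0.1]]
groups = groups_from_scores(first_axis_scores(units), k=4)
```

The sign of the axis is arbitrary, so group numbers may come out in reverse order on other data; only the partition is meaningful.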

Otherwise, with k groups required and p variates, it looks for the k units that are furthest apart in p-dimensional space. These units are then used as nuclei for the classes, and each remaining unit is allocated to the class with the nearest nucleus.
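The allocation step can be sketched in Python as follows. This is an illustration only, assuming Euclidean distance as the measure of nearness; the function name `allocate_to_nuclei` and the data are hypothetical, and the nuclei are taken as already chosen.

```python
import math

def allocate_to_nuclei(units, nuclei):
    """Return, for each unit, the 1-based group of its nearest nucleus.

    units  : list of rows, one row of p values per unit
    nuclei : list of k row indices that act as class nuclei
    """
    groups = []
    for row in units:
        # Euclidean distance from this unit to every nucleus (assumption)
        dists = [math.dist(row, units[n]) for n in nuclei]
        groups.append(dists.index(min(dists)) + 1)
    return groups

units = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [9.0, 1.0], [5.0, 8.0]]
groups = allocate_to_nuclei(units, nuclei=[0, 2, 4])
print(groups)  # [1, 1, 2, 2, 3]
```

Each nucleus is at distance zero from itself, so it always falls in its own group.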

The nuclei are found sequentially. The first two are the two units that are furthest apart. The third is the unit at the greatest distance from the line joining the first two. The fourth is the unit at the greatest distance from the plane containing the first three, and so on, until the kth nucleus is the unit furthest from the (k-2)-dimensional space spanned by the (k-1) units already found.
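A minimal Python sketch of this search is given below, again as an illustration and not as the procedure's own code. It assumes Euclidean geometry and measures the distance from the space through the chosen units by Gram-Schmidt orthogonalisation; the function name `find_nuclei` is hypothetical, and ties are broken by taking the first unit encountered, which is an arbitrary choice.

```python
import math
from itertools import combinations

def find_nuclei(units, k):
    """Pick k unit indices, each furthest from the space through the others."""
    if not 2 <= k <= len(units):
        raise ValueError("k must lie between 2 and the number of units")
    # The two units that are furthest apart
    first, second = max(
        combinations(range(len(units)), 2),
        key=lambda pair: math.dist(units[pair[0]], units[pair[1]]))
    nuclei = [first, second]
    origin = units[first]
    basis = []  # orthonormal directions spanning the space through the nuclei

    def residual(row):
        """Part of (row - origin) orthogonal to every basis direction."""
        v = [x - o for x, o in zip(row, origin)]
        for b in basis:
            dot = sum(vi * bi for vi, bi in zip(v, b))
            v = [vi - dot * bi for vi, bi in zip(v, b)]
        return v

    def extend_basis(index):
        r = residual(units[index])
        norm = math.hypot(*r)
        if norm > 1e-12:  # skip units already inside the space
            basis.append([x / norm for x in r])

    extend_basis(second)  # the line joining the first two units
    while len(nuclei) < k:
        candidates = [i for i in range(len(units)) if i not in nuclei]
        best = max(candidates, key=lambda i: math.hypot(*residual(units[i])))
        nuclei.append(best)
        extend_basis(best)
    return nuclei

units = [[0.0, 0.0], [10.0, 0.0], [5.0, 8.0], [5.0, 1.0], [1.0, 0.0]]
nuclei = find_nuclei(units, 3)
print(nuclei)  # [0, 1, 2]
```

Once p + 1 nuclei in general position have been chosen, they span the whole p-dimensional space and every remaining distance is zero, which is consistent with a different method being used when the number of groups exceeds the number of variates plus one.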

Action with `RESTRICT`

The variates must not be restricted.

Example

CAPTION  'CLASSIFY example',\
         !t('Ten units are described by four variables;',\
         'form an initial classification into four groups.');\
         STYLE=meta,plain
VARIATE  [NVALUES=10] Variable[1...4]
FACTOR   [LEVELS=4; NVALUES=10] InitCl
READ     Variable[1...4]
  1.0  4.0  2.0  1.0    4.0  6.0  1.0  3.0
  7.0  8.0  5.0  4.0    3.0  4.0  2.0  1.0
  6.0  4.0  5.0  2.0    9.0  4.0  1.0  2.0
  4.0  2.0  1.0  3.0    4.0  5.0  2.0  7.0
  1.0  2.0  4.0  1.0    5.0  1.0  2.0  0.0 :
CLASSIFY Variable; NGROUPS=4; GROUPS=InitCl
PRINT    Variable[1...4],InitCl

Updated on March 8, 2019
