Declares a self-organizing map (R.W. Payne).
||Identifiers of the SOMs|
||Names of variables corresponding to the weights of each SOM|
||Number of rows or row coordinates for the map|
||Number of columns or column coordinates for the map|
||Method for calculating the distances of data points from the nodes (
||Method for calculating the contribution of a data point to each node when revising the weights (
A self-organizing map is a two-dimensional grid of nodes, used to classify vectors of observations on p variables. Each node is characterized by a vector of p weights (one for each variable).
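To make the idea concrete, here is a minimal conceptual sketch in Python (not Genstat code) of how an observation on p variables can be assigned to its closest node. The weights are held as a variables × nodes matrix, like element 6 of the SOM. The function name `nearest_node` is hypothetical, and Euclidean distance is assumed purely for illustration; the distance actually used is set by the distance-method option in the table above.

```python
import math


def nearest_node(x, weights):
    """Return the index of the node whose weight vector is closest to x.

    weights is a list of p rows (one per variable), each holding one weight
    per node, mirroring a variables x nodes weight matrix.
    Euclidean distance is assumed here purely for illustration.
    """
    p = len(weights)
    n_nodes = len(weights[0])
    best, best_d = None, math.inf
    for node in range(n_nodes):
        # Distance between the observation and this node's p weights.
        d = math.sqrt(sum((x[v] - weights[v][node]) ** 2 for v in range(p)))
        if d < best_d:
            best, best_d = node, d
    return best


# Two variables (p = 2) and three nodes.
weights = [
    [0.0, 5.0, 10.0],  # weights for variable 1 at each node
    [0.0, 5.0, 10.0],  # weights for variable 2 at each node
]
print(nearest_node([4.0, 6.0], weights))  # -> 1
```

Each column of the matrix is one node's weight vector, so classifying an observation amounts to comparing it against every column and keeping the closest one.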
SOM defines the Genstat data structures used to represent self-organizing maps. These are compound data structures similar, for example, to the LRV structure used to store latent roots and vectors (see the LRV directive). Compound data structures are like Genstat pointers in that they point to a set of other structures. However, the set has a fixed size, its elements must be of the correct types, and they must form a consistent set (in terms of their sizes and so on). You can refer to the elements of an SOM in exactly the same way as the elements of a pointer, but the suffixes and their labels are fixed. Unlike pointers, the labels are not case sensitive, so Genstat will recognize a label in upper-case letters, lower-case letters, or any mixture of the two.
The elements of an SOM are as follows:
||text containing the names of the variables;|
||factor giving the row position of each node;|
||factor giving the column position of each node;|
||text containing either
||text containing either
||matrix of weights (variables × nodes);|
||pointer to store variates of summaries of variables at the nodes of the map;|
||text indicating the method used to summarize the variable in each variate of summaries;|
||text indicating the variable that was summarized in each variate of summaries.|
The SOM procedure defines the SOM and forms its first five elements. The weights (element 6) can be estimated and stored in the SOM by the SOMESTIMATE procedure, and the summary information (elements 7-9) can then be formed and added by the SOMDESCRIBE procedure. Once this has been done, the SOMPREDICT procedure can be used to generate predicted values of the summary variables for new or hypothetical observations.
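For readers who want to see what estimating the weights involves, the Python sketch below follows the basic online self-organizing map algorithm described by Hastie, Tibshirani & Friedman (2001, Section 14.4): each observation is matched to its closest node, and every node within grid distance r of that node is moved towards the observation by a fraction alpha. This is not the algorithm used by SOMESTIMATE, whose details are not given here. The function name `train_som`, the random initialization from the data, and the linear decay schedules for alpha and r are all assumptions made for illustration.

```python
import math
import random


def train_som(data, row_coord, col_coord, n_iter=1000, alpha0=1.0, r0=2.0, seed=0):
    """Online SOM training, after the basic algorithm in Hastie et al. (2001), Sec. 14.4.

    Sketch only: not the method used by SOMESTIMATE. Prototypes are stored
    node by node (a list of p-vectors) rather than as a variables x nodes matrix.
    """
    rng = random.Random(seed)
    p = len(data[0])
    n_nodes = len(row_coord)
    # Assumed initialization: each prototype starts at a randomly chosen observation.
    protos = [list(rng.choice(data)) for _ in range(n_nodes)]
    for t in range(n_iter):
        frac = t / n_iter
        alpha = alpha0 * (1.0 - frac)       # learning rate decays towards 0
        r = max(1.0, r0 * (1.0 - frac))     # neighbourhood radius shrinks to 1
        x = rng.choice(data)
        # Winning node: closest prototype in the p-dimensional data space.
        j = min(range(n_nodes),
                key=lambda k: sum((x[v] - protos[k][v]) ** 2 for v in range(p)))
        # Move every node within grid distance r of the winner towards x.
        for k in range(n_nodes):
            grid_d = math.hypot(row_coord[k] - row_coord[j],
                                col_coord[k] - col_coord[j])
            if grid_d <= r:
                for v in range(p):
                    protos[k][v] += alpha * (x[v] - protos[k][v])
    return protos


# Usage: four observations on two variables, mapped onto a 2 x 2 grid.
data = [[0.0, 0.0], [0.5, 0.2], [10.0, 10.0], [9.5, 9.8]]
row_coord = [1, 1, 2, 2]
col_coord = [1, 2, 1, 2]
protos = train_som(data, row_coord, col_coord, n_iter=500)
```

Because alpha never exceeds 1, each update is a weighted average of a prototype and an observation, so the prototypes always stay within the range of the data.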
The identifier for the SOM is specified by the IDENTIFIER parameter. The names of the variables corresponding to the weights are provided in a text specified by the VARIABLENAMES parameter. The row and column positions of the nodes are specified by the ROWS and COLUMNS options. These can be set to scalars, specifying the numbers of rows and columns in a rectangular grid; the row and column coordinates are then positive integers starting at one. Alternatively, you can define your own row and column coordinates (which then need not form a rectangular grid) by setting ROWS and COLUMNS to variates. By default,
ROWS is 5 and
COLUMNS is 6. The distance and weighting methods are specified by the
WMETHOD options, respectively.
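As a rough illustration of how scalar settings of ROWS and COLUMNS translate into node coordinates, the Python sketch below (not Genstat code) builds the row and column coordinates of a rectangular grid using the defaults of 5 rows and 6 columns. The function name `grid_coordinates` is hypothetical, and listing the nodes row by row is an assumption; the ordering Genstat uses is not stated here.

```python
def grid_coordinates(rows=5, columns=6):
    """Row and column coordinates for a rectangular grid of nodes.

    Coordinates are positive integers starting at one. Nodes are listed
    row by row, which is an assumed ordering for illustration only.
    """
    row_coord = [r for r in range(1, rows + 1) for _ in range(columns)]
    col_coord = [c for _ in range(rows) for c in range(1, columns + 1)]
    return row_coord, col_coord


rows, cols = grid_coordinates()
print(len(rows))  # -> 30
print(rows[:7])   # -> [1, 1, 1, 1, 1, 1, 2]
print(cols[:7])   # -> [1, 2, 3, 4, 5, 6, 1]
```

Supplying variates instead amounts to writing these two coordinate lists yourself, which is why the nodes then need not lie on a rectangular grid.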
For further information, see Hastie, Tibshirani & Friedman (2001) Section 14.4.
Hastie, T., Tibshirani, R. & Friedman, J. (2001). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer-Verlag, New York.
Commands for: Data mining.
CAPTION 'SOM example'; STYLE=meta
SOM Som; VARIABLENAMES=!t(Sepal_L,Sepal_W,Petal_L,Petal_W)
PRINT Som[1...5]