SOM procedure

Declares a self-organizing map (R.W. Payne).

No options

Parameters

`IDENTIFIER` = identifiers	Identifiers of the SOMs
`VARIABLENAMES` = texts	Names of variables corresponding to the weights of each SOM
`ROWS` = scalars or variates	Number of rows or row coordinates for the map
`COLUMNS` = scalars or variates	Number of columns or column coordinates for the map
`DMETHOD` = string tokens	Method for calculating the distances of data points from the nodes (`euclidean`, `cityblock`); default `eucl`
`WMETHOD` = string tokens	Method for calculating the contribution of a data point to each node when revising the weights (`gaussian`, `neighbour`); default `gaus`

Description

A self-organizing map is a two-dimensional grid of nodes, used to classify vectors of observations on p variables. Each node is characterized by a vector of p weights (one for each variable). SOM defines the Genstat data structures used to represent self-organizing maps. These are compound data structures similar, for example, to the LRV structure used to store latent roots and vectors (see the LRV directive). Compound data structures are like Genstat pointers in that they point to a set of other structures. However, the set has a fixed size, and its elements must be of the correct types and must form a consistent set (in terms of their sizes and so on). You can refer to the elements of an SOM in exactly the same way as the elements of a pointer, but the suffixes and their labels are fixed. Unlike pointers, the labels are not case sensitive, so Genstat will recognize the label in upper-case or lower-case letters, or in any mixture of the two.

The elements of an SOM are as follows:

`[1]` or `['variablenames']`	text containing the names of the variables;
`[2]` or `['rows']`	factor giving the row position of each node;
`[3]` or `['columns']`	factor giving the column position of each node;
`[4]` or `['dmethod']`	text containing either `'EUCLIDEAN'` or `'CITYBLOCK'` indicating the method used to measure distance on the map;
`[5]` or `['wmethod']`	text containing either `'GAUSSIAN'` or `'NEIGHBOUR'` indicating the method used to adjust the weights at each iteration during their estimation;
`[6]` or `['weights']`	matrix of weights (variables × nodes);
`[7]` or `['summaries']`	pointer to store variates of summaries of variables at the nodes of the map;
`[8]` or `['smethods']`	text indicating the method used to summarize the variable in each variate of summaries;
`[9]` or `['svariablenames']`	text indicating the variable that was summarized in each variate of summaries.
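
As a brief illustration of the suffix and label forms listed above, the following sketch declares an SOM with default settings and prints some of its first five elements. It reuses the identifier `Som` and the variable names from the Example section of this page; the mixed-case labels in the last statement are there only to illustrate that labels are not case sensitive.

```genstat
SOM    Som; VARIABLENAMES=!t(Sepal_L,Sepal_W,Petal_L,Petal_W)
" the row and column factors, referenced first by suffix and then by label "
PRINT  Som[2],Som[3]
PRINT  Som['rows'],Som['columns']
" labels may be given in upper case, lower case or any mixture "
PRINT  Som['DMETHOD'],Som['WMethod']
```

Elements 6 to 9 are not formed by SOM itself, so this sketch prints only elements that exist immediately after the declaration.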

The SOM procedure defines the SOM, and forms its first five elements. The weights (element 6) can be estimated and stored in the SOM by the SOMESTIMATE procedure, and the summary information (elements 7-9) can then be formed and added by the SOMDESCRIBE procedure. Once this has been done, the SOMPREDICT procedure can be used to generate predicted values of the summary variables for new or hypothetical observations.

The identifier for the SOM is specified by the IDENTIFIER parameter. The names of variables corresponding to the weights are provided in a text specified by the VARIABLENAMES parameter. The row and column positions of the nodes are specified by the ROWS and COLUMNS parameters. These can be set to scalars, specifying the numbers of rows and columns in a rectangular grid. The row and column coordinates are then positive integers starting at one. Alternatively, you can define your own row and column coordinates (which then need not be in a rectangular grid), by setting ROWS and COLUMNS to variates. By default, ROWS is 5 and COLUMNS is 6. The distance and weighting methods are specified by the DMETHOD and WMETHOD parameters, respectively.
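
The two ways of setting ROWS and COLUMNS might be used as in the sketch below. The identifiers `Som43`, `Somfree`, `Rpos` and `Cpos` are hypothetical names chosen for illustration, and the coordinate values are arbitrary. The sketch assumes that, when variates are supplied, each unit gives the row or column coordinate of one node, so that the two variates have the same length.

```genstat
" 4 x 3 rectangular grid, city-block distances, neighbour weighting "
SOM      Som43; VARIABLENAMES=!t(Sepal_L,Sepal_W,Petal_L,Petal_W);\
         ROWS=4; COLUMNS=3; DMETHOD=cityblock; WMETHOD=neighbour

" user-defined node positions (six nodes, not a full rectangular grid) "
VARIATE  [VALUES=1,1,2,2,2,3] Rpos
VARIATE  [VALUES=1,2,1,2,3,2] Cpos
SOM      Somfree; VARIABLENAMES=!t(Sepal_L,Sepal_W,Petal_L,Petal_W);\
         ROWS=Rpos; COLUMNS=Cpos
PRINT    Somfree['rows'],Somfree['columns']
```

In the second declaration DMETHOD and WMETHOD are left unset, so the defaults (`euclidean` and `gaussian`) apply.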

Options: none.

Parameters: IDENTIFIER, VARIABLENAMES, ROWS, COLUMNS, DMETHOD, WMETHOD.

Method

For further information, see Hastie, Tibshirani & Friedman (2001) Section 14.4.

Reference

Hastie, T., Tibshirani, R., & Friedman, J. (2001). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer-Verlag, New York.

Example

CAPTION 'SOM example'; STYLE=meta
SOM     Som; VARIABLENAMES=!t(Sepal_L,Sepal_W,Petal_L,Petal_W)
PRINT   Som[1...5]

Updated on March 5, 2019
