Declares a self-organizing map (R.W. Payne).
No options
Parameters
IDENTIFIER = identifiers | Identifiers of the SOMs |
---|---|
VARIABLENAMES = texts | Names of variables corresponding to the weights of each SOM |
ROWS = scalars or variates | Number of rows or row coordinates for the map |
COLUMNS = scalars or variates | Number of columns or column coordinates for the map |
DMETHOD = string tokens | Method for calculating the distances of data points from the nodes (euclidean, cityblock); default eucl |
WMETHOD = string tokens | Method for calculating the contribution of a data point to each node when revising the weights (gaussian, neighbour); default gaus |
Description
A self-organizing map is a two-dimensional grid of nodes, used to classify vectors of observations on p variables. Each node is characterized by a vector of p weights (one for each variable). SOM defines the Genstat data structures used to represent self-organizing maps. These are compound data structures similar, for example, to the LRV structure used to store latent roots and vectors (see the LRV directive). Compound data structures are like Genstat pointers in that they point to a set of other structures. However, the set has a fixed size, and its elements must be of the correct types and must form a consistent set (in terms of their sizes and so on). You can refer to the elements of an SOM in exactly the same way as the elements of a pointer, but the suffixes and their labels are fixed. Unlike pointers, the labels are not case sensitive, so Genstat will recognize the label in either upper-case or lower-case letters or in any mixture of the two.
The elements of an SOM are as follows:
[1] or ['variablenames'] | text containing the names of the variables; |
---|---|
[2] or ['rows'] | factor giving the row position of each node; |
[3] or ['columns'] | factor giving the column position of each node; |
[4] or ['dmethod'] | text containing either 'EUCLIDEAN' or 'CITYBLOCK' indicating the method used to measure distance on the map; |
[5] or ['wmethod'] | text containing either 'GAUSSIAN' or 'NEIGHBOUR' indicating the method used to adjust the weights at each iteration during their estimation; |
[6] or ['weights'] | matrix of weights (variables × nodes); |
[7] or ['summaries'] | pointer to store variates of summaries of variables at the nodes of the map; |
[8] or ['smethods'] | text indicating the method used to summarize the variable in each variate of summaries; |
[9] or ['svariablenames'] | text indicating the variable that was summarized in each variate of summaries. |
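As a minimal sketch of how these elements can be referenced, the statements below declare an SOM and then print its second element three ways: by suffix, by label, and by the same label in a different case. The identifier Som and the variable names are borrowed from the example at the end of this page purely for illustration, and the exact layout of the printed output is not shown because it depends on the Genstat release.

```genstat
"Declare an SOM with the default 5 x 6 grid (illustrative variable names)"
SOM Som; VARIABLENAMES=!t(Sepal_L,Sepal_W,Petal_L,Petal_W)

"Element 2 referenced by its fixed suffix"
PRINT Som[2]

"The same element referenced by its label"
PRINT Som['rows']

"Labels of an SOM are not case sensitive, so this should refer to the same factor"
PRINT Som['ROWS']
```

Only elements 1 to 5 are formed by the declaration itself, so elements 6 to 9 should not be expected to hold values until the weights and summaries have been formed.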
The SOM procedure defines the SOM, and forms its first five elements. The weights (element 6) can be estimated and stored in the SOM by the SOMESTIMATE procedure, and the summary information (elements 7-9) can then be formed and added by the SOMDESCRIBE procedure. Once this has been done, the SOMPREDICT procedure can be used to generate predicted values of the summary variables for new or hypothetical observations.
The identifier for the SOM is specified by the IDENTIFIER parameter. The names of variables corresponding to the weights are provided in a text specified by the VARIABLENAMES parameter. The row and column positions of the nodes are specified by the ROWS and COLUMNS parameters. These can be set to scalars, specifying the numbers of rows and columns in a rectangular grid. The row and column coordinates are then positive integers starting at one. Alternatively, you can define your own row and column coordinates (which then need not be in a rectangular grid), by setting ROWS and COLUMNS to variates. By default, ROWS is 5 and COLUMNS is 6. The distance and weighting methods are specified by the DMETHOD and WMETHOD parameters, respectively.
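The two ways of setting ROWS and COLUMNS might be sketched as follows. The identifiers SomGrid, SomTri, Rowpos and Colpos, the grid sizes and the coordinate values are all hypothetical, chosen only to illustrate the settings described above. The sketch also assumes that, when variates are supplied, each unit gives the coordinate of one node.

```genstat
"Rectangular 3 x 4 grid: scalar settings, with non-default distance and weighting methods"
SOM SomGrid; VARIABLENAMES=!t(Sepal_L,Sepal_W,Petal_L,Petal_W);\
  ROWS=3; COLUMNS=4; DMETHOD=cityblock; WMETHOD=neighbour

"Non-rectangular layout: assumed to be one (row, column) coordinate pair per node"
VARIATE [VALUES=1,1,1,2,2,3] Rowpos
VARIATE [VALUES=1,2,3,1,2,1] Colpos
SOM SomTri; VARIABLENAMES=!t(Sepal_L,Sepal_W,Petal_L,Petal_W);\
  ROWS=Rowpos; COLUMNS=Colpos

"Inspect the node positions and the methods stored in each SOM"
PRINT SomGrid['rows'],SomGrid['columns']
PRINT SomGrid['dmethod'],SomGrid['wmethod']
PRINT SomTri['rows'],SomTri['columns']
```

In the second declaration DMETHOD and WMETHOD are left unset, so the defaults (euclidean and gaussian) apply.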
Options: none.
Parameters: IDENTIFIER, VARIABLENAMES, ROWS, COLUMNS, DMETHOD, WMETHOD.
Method
For further information, see Hastie, Tibshirani & Friedman (2001) Section 14.4.
Reference
Hastie, T., Tibshirani, R., & Friedman, J. (2001). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer-Verlag, New York.
See also
Procedures: SOMADJUST, SOMDESCRIBE, SOMESTIMATE, SOMIDENTIFY, SOMPREDICT.
Commands for: Data mining.
Example
CAPTION 'SOM example'; STYLE=meta
SOM Som; VARIABLENAMES=!t(Sepal_L,Sepal_W,Petal_L,Petal_W)
PRINT Som[1...5]