Performs adjustments to the weights of a self-organizing map (R.W. Payne).
Options
Option | Description |
---|---|
SOM = pointer | Self-organizing map |
DATA = matrix or pointer | Data values for training the map |
DMETHOD = string token | Method for calculating the distances of data points from the nodes (euclidean, cityblock); default eucl |
WMETHOD = string token | Method for calculating the contribution of a data point to each node when revising the weights (gaussian, neighbour); default gaus |
Parameters
Parameter | Description |
---|---|
ALPHA = scalars | Alpha value for each iteration |
SIGMA = scalars | Sigma value for each iteration when WMETHOD=gaussian |
THRESHOLD = scalars | Threshold for each iteration when WMETHOD=neighbour |
ERRORS = matrices | Saves the reconstruction errors at the nodes of the map after each iteration |
TOTALERROR = scalars | Saves the total reconstruction error after each iteration |
FITNODES = factors | Saves the nodes allocated to the data points after each iteration |
Description
A self-organizing map is a two-dimensional grid of nodes, used to classify vectors of observations on p variables. Each node is characterized by a vector of p weights (one for each variable). Genstat has a special SOM data structure to represent a map. This is declared using the SOM procedure, which also defines the row and column positions of the nodes on the grid. In addition, SOM stores the names of the weight variables, and information about how distances are to be measured on the grid and how the weights should be adjusted during their estimation.
The training dataset used to estimate the weights is specified by the DATA option, either as a matrix with n rows and p columns (where n is the number of observations in the training set) or as a pointer containing p variates each with n units. SOMADJUST gives a warning if the column names of a DATA matrix or the names of the variates in a DATA pointer differ from the names stored for the weight variables in the SOM structure.
The weights are estimated by a sequence of iterations. In each iteration, the training observations are taken in turn. Each observation i is assessed to find its closest node. The method used to measure distance on the map will have been specified by the DMETHOD option of SOM, and stored with the SOM structure when it was declared. However, SOMADJUST also has a DMETHOD option in case you want to override the stored setting. The default setting for the DMETHOD option of SOM is euclidean. If X_i is a variate containing the values of the variables for observation i and W_j is the variate of weights at node j, the distance is then given by
d_ij = SQRT(SUM((X_i - W_j)**2))
The alternative setting, cityblock, calculates the distance as
d_ij = SUM(ABS(X_i - W_j))
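The two distance settings and the search for the closest node can be sketched in Python as follows. This is an illustration of the formulas only, not Genstat code and not the procedure's own implementation; the function names distance and closest_node, the three-node weights list and the tie-breaking rule (first node wins) are assumptions made for the example.

```python
import math

def distance(x, w, method="euclidean"):
    """Distance d_ij between observation x (X_i) and node weights w (W_j)."""
    if method == "euclidean":
        return math.sqrt(sum((xi - wi) ** 2 for xi, wi in zip(x, w)))
    if method == "cityblock":
        return sum(abs(xi - wi) for xi, wi in zip(x, w))
    raise ValueError(f"unknown distance method: {method}")

def closest_node(x, weights, method="euclidean"):
    """Index of the node whose weights are nearest to x.

    Ties go to the first node found; this is an assumption of the sketch.
    """
    dists = [distance(x, w, method) for w in weights]
    return min(range(len(weights)), key=dists.__getitem__)

# Hypothetical map with three nodes and p = 2 weights per node.
weights = [[0.0, 0.0], [3.0, 4.0], [10.0, 10.0]]
x = [3.0, 3.0]

print(distance(x, weights[1], "euclidean"))  # 1.0
print(distance(x, weights[0], "cityblock"))  # 6.0
print(closest_node(x, weights))              # 1
```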
Once the closest node, k, has been found, the weights at that node and other nodes are adjusted. The method to use will have been specified when the SOM structure was declared, by the WMETHOD option of SOM. However, SOMADJUST again has its own WMETHOD option, which you can use to override the stored setting. The default setting for the WMETHOD option of SOM is gaussian. This adjusts the weights W_j at every node j to become
W_j + alpha * EXP( -0.5 * (d_jk / sigma)**2) * (X_i - W_j)
where d_jk is the distance between nodes j and k. With the alternative setting, neighbour, the weights at node j are adjusted to become
W_j + alpha * (X_i - W_j)
but only if d_jk is less than a threshold r.
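As a rough illustration, the two adjustment rules can be written in Python for a single observation. This is a sketch of the formulas, not Genstat code; the function name adjust_weights, the positions list of (row, column) grid coordinates and the use of Euclidean distance for d_jk on the grid are assumptions of the example (in Genstat, the way grid distances are measured is stored in the SOM structure).

```python
import math

def adjust_weights(x, weights, positions, k, alpha,
                   method="gaussian", sigma=1.0, threshold=1.0):
    """Return the adjusted weights after presenting observation x.

    k is the index of the node closest to x. positions[j] holds the
    (row, column) of node j; d_jk is taken as the Euclidean distance
    between grid positions, which is an assumption of this sketch.
    """
    adjusted = []
    for w, pos in zip(weights, positions):
        d_jk = math.dist(pos, positions[k])
        if method == "gaussian":
            # Every node moves towards x, less so the further it is from k.
            gain = alpha * math.exp(-0.5 * (d_jk / sigma) ** 2)
        elif method == "neighbour":
            # Only nodes closer to k than the threshold r are moved.
            gain = alpha if d_jk < threshold else 0.0
        else:
            raise ValueError(f"unknown weighting method: {method}")
        adjusted.append([wi + gain * (xi - wi) for xi, wi in zip(x, w)])
    return adjusted

# Hypothetical 1 x 3 grid with p = 2 weights per node.
positions = [(1, 1), (1, 2), (1, 3)]
weights = [[0.0, 0.0], [2.0, 2.0], [4.0, 4.0]]
x = [1.0, 1.0]

print(adjust_weights(x, weights, positions, 0, 0.5,
                     method="neighbour", threshold=1.5))
# [[0.5, 0.5], [1.5, 1.5], [4.0, 4.0]]
print(adjust_weights(x, weights, positions, 0, 0.5,
                     method="gaussian", sigma=1.0))
```

With the neighbour rule, the third node lies at grid distance 2 from node k and is left unchanged; with the gaussian rule it still moves, but only slightly.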
The values of alpha, sigma and r for the iterations are listed by the ALPHA, SIGMA and THRESHOLD parameters of SOMADJUST. Each of these supplies a list of scalars (one for each iteration). The ERRORS parameter can save a list of matrices containing the reconstruction errors at the nodes of the map after each iteration. The TOTALERROR parameter can save a list of scalars with the total reconstruction error after each iteration. Finally, the FITNODES parameter can save a list of factors indicating how the observations are allocated to the nodes by each iteration.
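To make the "one value per iteration" convention concrete, the sketch below builds two equally spaced schedules in Python, matching the progressions 1, 0.99 ... 0.01 and 5, 4.95 ... 0.05 used for ALPHA and SIGMA in the example at the end of this page. It is an illustration only, not Genstat code; the helper name linear_schedule is hypothetical.

```python
def linear_schedule(start, stop, n):
    """Return n equally spaced values running from start to stop inclusive."""
    if n < 2:
        raise ValueError("a schedule needs at least two values")
    step = (stop - start) / (n - 1)
    return [start + i * step for i in range(n)]

# One alpha and one sigma per iteration, so both lists have the same length.
alpha = linear_schedule(1.0, 0.01, 100)
sigma = linear_schedule(5.0, 0.05, 100)

print(len(alpha))  # 100

# Each pass over the training data would use the next pair of values.
for iteration, (a, s) in enumerate(zip(alpha, sigma), start=1):
    if iteration in (1, 50, 100):
        print(iteration, round(a, 2), round(s, 2))
```

Decreasing both values steadily means that early iterations move many nodes by large amounts, while later iterations make small, local refinements.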
SOMADJUST thus allows you to define your own sequence of adjustment iterations leading to the estimation of the weights. An alternative is to use procedure SOMESTIMATE, which initializes the weights and runs through an automatic sequence of iterations (each performed using SOMADJUST).
Options: SOM, DATA, DMETHOD, WMETHOD.
Parameters: ALPHA, SIGMA, THRESHOLD, ERRORS, TOTALERROR, FITNODES.
Action with RESTRICT
SOMADJUST takes account of any restrictions defined on the DATA variates.
See also
Procedures: SOM, SOMDESCRIBE, SOMESTIMATE, SOMIDENTIFY, SOMPREDICT.
Commands for: Data mining.
Example
CAPTION 'SOMADJUST example',!t('Fisher''s Iris Data'); STYLE=meta,plain
SOM Som; VARIABLENAMES=!t(Sepal_L,Sepal_W,Petal_L,Petal_W)
MATRIX [ROWS=150; COLUMNS=!t(Sepal_L,Sepal_W,Petal_L,Petal_W)] Measures
READ Measures
5.1 3.5 1.4 0.2 4.9 3.0 1.4 0.2 4.7 3.2 1.3 0.2 4.6 3.1 1.5 0.2 5.0 3.6 1.4 0.2
5.4 3.9 1.7 0.4 4.6 3.4 1.4 0.3 5.0 3.4 1.5 0.2 4.4 2.9 1.4 0.2 4.9 3.1 1.5 0.1
5.4 3.7 1.5 0.2 4.8 3.4 1.6 0.2 4.8 3.0 1.4 0.1 4.3 3.0 1.1 0.1 5.8 4.0 1.2 0.2
5.7 4.4 1.5 0.4 5.4 3.9 1.3 0.4 5.1 3.5 1.4 0.3 5.7 3.8 1.7 0.3 5.1 3.8 1.5 0.3
5.4 3.4 1.7 0.2 5.1 3.7 1.5 0.4 4.6 3.6 1.0 0.2 5.1 3.3 1.7 0.5 4.8 3.4 1.9 0.2
5.0 3.0 1.6 0.2 5.0 3.4 1.6 0.4 5.2 3.5 1.5 0.2 5.2 3.4 1.4 0.2 4.7 3.2 1.6 0.2
4.8 3.1 1.6 0.2 5.4 3.4 1.5 0.4 5.2 4.1 1.5 0.1 5.5 4.2 1.4 0.2 4.9 3.1 1.5 0.2
5.0 3.2 1.2 0.2 5.5 3.5 1.3 0.2 4.9 3.6 1.4 0.1 4.4 3.0 1.3 0.2 5.1 3.4 1.5 0.2
5.0 3.5 1.3 0.3 4.5 2.3 1.3 0.3 4.4 3.2 1.3 0.2 5.0 3.5 1.6 0.6 5.1 3.8 1.9 0.4
4.8 3.0 1.4 0.3 5.1 3.8 1.6 0.2 4.6 3.2 1.4 0.2 5.3 3.7 1.5 0.2 5.0 3.3 1.4 0.2
7.0 3.2 4.7 1.4 6.4 3.2 4.5 1.5 6.9 3.1 4.9 1.5 5.5 2.3 4.0 1.3 6.5 2.8 4.6 1.5
5.7 2.8 4.5 1.3 6.3 3.3 4.7 1.6 4.9 2.4 3.3 1.0 6.6 2.9 4.6 1.3 5.2 2.7 3.9 1.4
5.0 2.0 3.5 1.0 5.9 3.0 4.2 1.5 6.0 2.2 4.0 1.0 6.1 2.9 4.7 1.4 5.6 2.9 3.6 1.3
6.7 3.1 4.4 1.4 5.6 3.0 4.5 1.5 5.8 2.7 4.1 1.0 6.2 2.2 4.5 1.5 5.6 2.5 3.9 1.1
5.9 3.2 4.8 1.8 6.1 2.8 4.0 1.3 6.3 2.5 4.9 1.5 6.1 2.8 4.7 1.2 6.4 2.9 4.3 1.3
6.6 3.0 4.4 1.4 6.8 2.8 4.8 1.4 6.7 3.0 5.0 1.7 6.0 2.9 4.5 1.5 5.7 2.6 3.5 1.0
5.5 2.4 3.8 1.1 5.5 2.4 3.7 1.0 5.8 2.7 3.9 1.2 6.0 2.7 5.1 1.6 5.4 3.0 4.5 1.5
6.0 3.4 4.5 1.6 6.7 3.1 4.7 1.5 6.3 2.3 4.4 1.3 5.6 3.0 4.1 1.3 5.5 2.5 4.0 1.3
5.5 2.6 4.4 1.2 6.1 3.0 4.6 1.4 5.8 2.6 4.0 1.2 5.0 2.3 3.3 1.0 5.6 2.7 4.2 1.3
5.7 3.0 4.2 1.2 5.7 2.9 4.2 1.3 6.2 2.9 4.3 1.3 5.1 2.5 3.0 1.1 5.7 2.8 4.1 1.3
6.3 3.3 6.0 2.5 5.8 2.7 5.1 1.9 7.1 3.0 5.9 2.1 6.3 2.9 5.6 1.8 6.5 3.0 5.8 2.2
7.6 3.0 6.6 2.1 4.9 2.5 4.5 1.7 7.3 2.9 6.3 1.8 6.7 2.5 5.8 1.8 7.2 3.6 6.1 2.5
6.5 3.2 5.1 2.0 6.4 2.7 5.3 1.9 6.8 3.0 5.5 2.1 5.7 2.5 5.0 2.0 5.8 2.8 5.1 2.4
6.4 3.2 5.3 2.3 6.5 3.0 5.5 1.8 7.7 3.8 6.7 2.2 7.7 2.6 6.9 2.3 6.0 2.2 5.0 1.5
6.9 3.2 5.7 2.3 5.6 2.8 4.9 2.0 7.7 2.8 6.7 2.0 6.3 2.7 4.9 1.8 6.7 3.3 5.7 2.1
7.2 3.2 6.0 1.8 6.2 2.8 4.8 1.8 6.1 3.0 4.9 1.8 6.4 2.8 5.6 2.1 7.2 3.0 5.8 1.6
7.4 2.8 6.1 1.9 7.9 3.8 6.4 2.0 6.4 2.8 5.6 2.2 6.3 2.8 5.1 1.5 6.1 2.6 5.6 1.4
7.7 3.0 6.1 2.3 6.3 3.4 5.6 2.4 6.4 3.1 5.5 1.8 6.0 3.0 4.8 1.8 6.9 3.1 5.4 2.1
6.7 3.1 5.6 2.4 6.9 3.1 5.1 2.3 5.8 2.7 5.1 1.9 6.8 3.2 5.9 2.3 6.7 3.3 5.7 2.5
6.7 3.0 5.2 2.3 6.3 2.5 5.0 1.9 6.5 3.0 5.2 2.0 6.2 3.4 5.4 2.3 5.9 3.0 5.1 1.8 :
FACTOR [NVALUES=150; LABELS=!t(Setosa,Versicolor,Virginica);\
  VALUES=50(1,2,3)] Species
CALCULATE [SEED=187123] Random = GRUNIFORM(NVALUES(Som['weights']);\
  MINIMUM(Measures); MAXIMUM(Measures))
EQUATE Random; Som['weights']
SOMADJUST [SOM=Som; DATA=Measures] 1,0.99...0.01; SIGMA=5,4.95...0.05;\
  TOTALERROR=Errors[1...100]
PRINT Som['weights']
VARIATE [VALUES=1...100] Iteration,Totalerror
EQUATE Errors; Totalerror
PEN 11; METHOD=line; SYMBOL=0
DGRAPH Totalerror; Iteration; PEN=11