SOMPREDICT procedure

Makes predictions using a self-organizing map (R.W. Payne).

Options

PRINT = string token Controls whether or not the predictions are printed (predictions); default pred
SOM = pointer Specifies the map
YNAMES = text Names of variables to predict; default * gives predictions for all the variables
METHODS = string tokens Types of predictions to give (mean, mode, median, minimum, maximum, sd, variance); default mean, mode, medi, mini, maxi, sd, vari
YSAVE = text Saves a text with a unit for each set of predictions giving the name of the corresponding y-variable
MSAVE = text Saves a text with a unit for each set of predictions giving the name of the corresponding method

Parameters

DATA = matrices or pointers Data values to identify the positions of the new samples on the map
UNITLABELS = variates or texts Labels for the predictions (to identify the samples); default takes the row labels if DATA is a matrix or any unit labels if DATA is a pointer to a set of variates
PREDICTIONS = variates or pointers Save the predictions

Description

A self-organizing map is a two-dimensional grid of nodes, used to classify vectors of observations on p variables. Each node is characterized by a vector of p weights (one for each variable); these can be estimated, from a training dataset, by procedure SOMESTIMATE. You can then use procedure SOMDESCRIBE to associate, with each node, predictions of various types for a set of variables (and these variables need not be amongst those used to form the map). SOMPREDICT can then use this information to supply predictions for some new or hypothetical samples.

The SOM option supplies the information about the self-organizing map, which will have been saved in a pointer using the NEWSOM parameter of SOMDESCRIBE. The DATA parameter supplies the variables required to identify the positions of the new samples on the map, either as a matrix with n rows and p columns (where n is the number of samples) or as a pointer containing p variates each with n units. The SOMIDENTIFY procedure, called by SOMPREDICT to identify the positions, will issue a warning if the variables have different names to those in the data set used by SOMESTIMATE to form the map. The YNAMES option supplies a text containing the names of the variables for which predictions are required. (These correspond to the identifiers of the variates and/or factors specified by the Y parameter of SOMDESCRIBE to form the predicted values currently associated with the map.) If YNAMES is not set, predictions will be given for all those variables. If more than one type of prediction was requested for a Y variable, using the METHOD parameter of SOMDESCRIBE, you can use the METHODS option of SOMPREDICT to specify a list of strings to indicate which ones you want. By default all are given.
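
As a minimal sketch of the matrix form of DATA, the statements below reuse the map SomD and the variable names from the Example section. The matrix NewIris, its row labels PlantA and PlantB, and its two rows of values are illustrative assumptions, not part of the original example. Because the column labels match the names used by SOMESTIMATE, SOMIDENTIFY should have no reason to warn about differing variable names.

```genstat
"Two new samples as a 2 x 4 matrix; column labels match the training variables"
MATRIX     [ROWS=!t(PlantA,PlantB);\
           COLUMNS=!t(Sepal_L,Sepal_W,Petal_L,Petal_W)] NewIris
READ       NewIris
 5.1  3.5  1.4  0.2
 6.3  3.3  6.0  2.5 :
"Predict only Species, with every type of prediction stored for it in SomD"
SOMPREDICT [SOM=SomD; YNAMES=!t(Species)] DATA=NewIris
```

With no UNITLABELS supplied, the row labels of NewIris would be expected to label the printed predictions.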

The PREDICTIONS parameter can save the predictions formed for each matrix or pointer supplied by the DATA parameter. If the YNAMES and METHODS options have requested several sets of predictions (for different variables and/or using different methods), PREDICTIONS will save a pointer containing a variate for each set. Alternatively, if only one set has been requested (i.e. only one variable using only one method), PREDICTIONS will save a variate. To identify the variates within each pointer, the YSAVE option can save a text with a unit for each set of predictions, giving the name of the corresponding y-variable. Similarly, the MSAVE option can save a text whose units contain the names (in lower-case letters) of the corresponding methods. Each PREDICTIONS variate will have a unit for every sample. You can use the UNITLABELS parameter to supply a variate or text to label the units; otherwise SOMPREDICT uses any row or unit labels defined on the matrix or variates supplied by the DATA parameter.
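
The following sketch shows one way the saving options might be combined, again reusing SomD and the six-unit variates from the Example section. The identifiers Sample, Yname, Mname and Pred, and the label strings s1 to s6, are hypothetical names introduced here for illustration.

```genstat
"Hypothetical labels for the six new samples of the Example section"
TEXT       [VALUES=s1,s2,s3,s4,s5,s6] Sample
"Suppress printing; save the predictions and the names identifying each set"
SOMPREDICT [PRINT=*; SOM=SomD; YSAVE=Yname; MSAVE=Mname]\
           DATA=!p(Sepal_L,Sepal_W,Petal_L,Petal_W);\
           UNITLABELS=Sample; PREDICTIONS=Pred
"List the y-variable and method behind each set of predictions"
PRINT      Yname,Mname
```

If Yname and Mname each turn out to have a single unit, Pred should be a variate; otherwise it should be a pointer holding one variate per unit of Yname and Mname.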

The PRINT option controls whether or not the predictions are printed (by default they will be printed).

Options: PRINT, SOM, YNAMES, METHODS, YSAVE, MSAVE.

Parameters: DATA, UNITLABELS, PREDICTIONS.

Method

The SOMIDENTIFY procedure is used to allocate the samples to the nodes of the map. The variates of predictions are then formed, from the information stored with the map, using ordinary Genstat declarations and calculations.

Action with RESTRICT

SOMPREDICT takes account of any restrictions defined on the variates in a DATA pointer.
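
A brief sketch of how a restriction might be applied, using the six-unit variates and the map SomD from the Example section. The condition on Petal_L is an arbitrary illustration, and the expectation that only the unrestricted samples receive predictions is an assumption about what "takes account of" means here.

```genstat
"Restrict the new-sample variates to samples with petal length above 2"
RESTRICT   Sepal_L,Sepal_W,Petal_L,Petal_W; CONDITION=Petal_L.GT.2
SOMPREDICT [SOM=SomD] !p(Sepal_L,Sepal_W,Petal_L,Petal_W)
"Remove the restriction afterwards"
RESTRICT   Sepal_L,Sepal_W,Petal_L,Petal_W
```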

See also

Procedures: SOM, SOMADJUST, SOMDESCRIBE, SOMESTIMATE, SOMIDENTIFY.

Commands for: Data mining.

Example

CAPTION 'SOMPREDICT example',!t('Fisher''s Iris Data'); STYLE=meta,plain
SOM     Som; VARIABLENAMES=!t(Sepal_L,Sepal_W,Petal_L,Petal_W)
MATRIX  [ROWS=150; COLUMNS=!t(Sepal_L,Sepal_W,Petal_L,Petal_W)] Measures
READ    Measures
 5.1  3.5  1.4  0.2
 4.9  3.0  1.4  0.2
 4.7  3.2  1.3  0.2
 4.6  3.1  1.5  0.2
 5.0  3.6  1.4  0.2
 5.4  3.9  1.7  0.4
 4.6  3.4  1.4  0.3
 5.0  3.4  1.5  0.2
 4.4  2.9  1.4  0.2
 4.9  3.1  1.5  0.1
 5.4  3.7  1.5  0.2
 4.8  3.4  1.6  0.2
 4.8  3.0  1.4  0.1
 4.3  3.0  1.1  0.1
 5.8  4.0  1.2  0.2
 5.7  4.4  1.5  0.4
 5.4  3.9  1.3  0.4
 5.1  3.5  1.4  0.3
 5.7  3.8  1.7  0.3
 5.1  3.8  1.5  0.3
 5.4  3.4  1.7  0.2
 5.1  3.7  1.5  0.4
 4.6  3.6  1.0  0.2
 5.1  3.3  1.7  0.5
 4.8  3.4  1.9  0.2
 5.0  3.0  1.6  0.2
 5.0  3.4  1.6  0.4
 5.2  3.5  1.5  0.2
 5.2  3.4  1.4  0.2
 4.7  3.2  1.6  0.2
 4.8  3.1  1.6  0.2
 5.4  3.4  1.5  0.4
 5.2  4.1  1.5  0.1
 5.5  4.2  1.4  0.2
 4.9  3.1  1.5  0.2
 5.0  3.2  1.2  0.2
 5.5  3.5  1.3  0.2
 4.9  3.6  1.4  0.1
 4.4  3.0  1.3  0.2
 5.1  3.4  1.5  0.2
 5.0  3.5  1.3  0.3
 4.5  2.3  1.3  0.3
 4.4  3.2  1.3  0.2
 5.0  3.5  1.6  0.6
 5.1  3.8  1.9  0.4
 4.8  3.0  1.4  0.3
 5.1  3.8  1.6  0.2
 4.6  3.2  1.4  0.2
 5.3  3.7  1.5  0.2
 5.0  3.3  1.4  0.2
 7.0  3.2  4.7  1.4
 6.4  3.2  4.5  1.5
 6.9  3.1  4.9  1.5
 5.5  2.3  4.0  1.3
 6.5  2.8  4.6  1.5
 5.7  2.8  4.5  1.3
 6.3  3.3  4.7  1.6
 4.9  2.4  3.3  1.0
 6.6  2.9  4.6  1.3
 5.2  2.7  3.9  1.4
 5.0  2.0  3.5  1.0
 5.9  3.0  4.2  1.5
 6.0  2.2  4.0  1.0
 6.1  2.9  4.7  1.4
 5.6  2.9  3.6  1.3
 6.7  3.1  4.4  1.4
 5.6  3.0  4.5  1.5
 5.8  2.7  4.1  1.0
 6.2  2.2  4.5  1.5
 5.6  2.5  3.9  1.1
 5.9  3.2  4.8  1.8
 6.1  2.8  4.0  1.3
 6.3  2.5  4.9  1.5
 6.1  2.8  4.7  1.2
 6.4  2.9  4.3  1.3
 6.6  3.0  4.4  1.4
 6.8  2.8  4.8  1.4
 6.7  3.0  5.0  1.7
 6.0  2.9  4.5  1.5
 5.7  2.6  3.5  1.0
 5.5  2.4  3.8  1.1
 5.5  2.4  3.7  1.0
 5.8  2.7  3.9  1.2
 6.0  2.7  5.1  1.6
 5.4  3.0  4.5  1.5
 6.0  3.4  4.5  1.6
 6.7  3.1  4.7  1.5
 6.3  2.3  4.4  1.3
 5.6  3.0  4.1  1.3
 5.5  2.5  4.0  1.3
 5.5  2.6  4.4  1.2
 6.1  3.0  4.6  1.4
 5.8  2.6  4.0  1.2
 5.0  2.3  3.3  1.0
 5.6  2.7  4.2  1.3
 5.7  3.0  4.2  1.2
 5.7  2.9  4.2  1.3
 6.2  2.9  4.3  1.3
 5.1  2.5  3.0  1.1
 5.7  2.8  4.1  1.3
 6.3  3.3  6.0  2.5
 5.8  2.7  5.1  1.9
 7.1  3.0  5.9  2.1
 6.3  2.9  5.6  1.8
 6.5  3.0  5.8  2.2
 7.6  3.0  6.6  2.1
 4.9  2.5  4.5  1.7
 7.3  2.9  6.3  1.8
 6.7  2.5  5.8  1.8
 7.2  3.6  6.1  2.5
 6.5  3.2  5.1  2.0
 6.4  2.7  5.3  1.9
 6.8  3.0  5.5  2.1
 5.7  2.5  5.0  2.0
 5.8  2.8  5.1  2.4
 6.4  3.2  5.3  2.3
 6.5  3.0  5.5  1.8
 7.7  3.8  6.7  2.2
 7.7  2.6  6.9  2.3
 6.0  2.2  5.0  1.5
 6.9  3.2  5.7  2.3
 5.6  2.8  4.9  2.0
 7.7  2.8  6.7  2.0
 6.3  2.7  4.9  1.8
 6.7  3.3  5.7  2.1
 7.2  3.2  6.0  1.8
 6.2  2.8  4.8  1.8
 6.1  3.0  4.9  1.8
 6.4  2.8  5.6  2.1
 7.2  3.0  5.8  1.6
 7.4  2.8  6.1  1.9
 7.9  3.8  6.4  2.0
 6.4  2.8  5.6  2.2
 6.3  2.8  5.1  1.5
 6.1  2.6  5.6  1.4
 7.7  3.0  6.1  2.3
 6.3  3.4  5.6  2.4
 6.4  3.1  5.5  1.8
 6.0  3.0  4.8  1.8
 6.9  3.1  5.4  2.1
 6.7  3.1  5.6  2.4
 6.9  3.1  5.1  2.3
 5.8  2.7  5.1  1.9
 6.8  3.2  5.9  2.3
 6.7  3.3  5.7  2.5
 6.7  3.0  5.2  2.3
 6.3  2.5  5.0  1.9
 6.5  3.0  5.2  2.0
 6.2  3.4  5.4  2.3
 5.9  3.0  5.1  1.8  :
FACTOR       [NVALUES=150; LABELS=!t(Setosa,Versicolor,Virginica);\
             VALUES=50(1,2,3)] Species
SOMESTIMATE  [PRINT=weights,report; PLOT=*; NCYCLE=!(100,200);\
             SIGMA=!(5,1)] Som; DATA=Measures; SEED=419749
SOMDESCRIBE  [DATA=Measures; SOM=Som; NEWSOM=SomD] Species
VARIATE      [NVALUES=6] Sepal_L,Sepal_W,Petal_L,Petal_W
READ         Sepal_L,Sepal_W,Petal_L,Petal_W
 5.1  3.5  1.4  0.2
 4.9  3.0  1.4  0.2
 7.0  3.2  4.7  1.4
 6.4  3.2  4.5  1.5
 6.3  3.3  6.0  2.5
 5.8  2.7  5.1  1.9 :
SOMPREDICT   [SOM=SomD] !p(Sepal_L,Sepal_W,Petal_L,Petal_W)
Updated on March 5, 2019