Makes predictions using a self-organizing map (R.W. Payne).
Options
PRINT = string token | Controls whether or not the predictions are printed (predictions); default pred |
---|---|
SOM = pointer | Specifies the map |
YNAMES = text | Names of variables to predict; default * gives predictions for all the variables |
METHODS = string tokens | Types of predictions to give (mean, mode, median, minimum, maximum, sd, variance); default mean, mode, medi, mini, maxi, sd, vari |
YSAVE = text | Saves a text with a unit for each set of predictions giving the name of the corresponding y-variable |
MSAVE = text | Saves a text with a unit for each set of predictions giving the name of the corresponding method |
Parameters
DATA = matrices or pointers | Data values to identify the positions of the new samples on the map |
---|---|
UNITLABELS = variates or texts | Labels for the predictions (to identify the samples); default takes the row labels if DATA is a matrix or any unit labels if DATA is a pointer to a set of variates |
PREDICTIONS = variates or pointers | Saves the predictions |
Description
A self-organizing map is a two-dimensional grid of nodes, used to classify vectors of observations on p variables. Each node is characterized by a vector of p weights (one for each variable); these can be estimated, from a training dataset, by procedure SOMESTIMATE. You can then use procedure SOMDESCRIBE to associate, with each node, predictions of various types for a set of variables (and these variables need not be amongst those used to form the map). This information can now be used by SOMPREDICT to supply predictions for some new or hypothetical samples.
The SOM option supplies the information about the self-organizing map, which will have been saved in a pointer using the NEWSOM parameter of SOMDESCRIBE. The DATA parameter supplies the variables required to identify the positions of the new samples on the map, either as a matrix with n rows and p columns (where n is the number of samples) or as a pointer containing p variates each with n units. The SOMIDENTIFY procedure, called by SOMPREDICT to identify the positions, will issue a warning if the variables have different names to those in the data set used by SOMESTIMATE to form the map. The YNAMES option supplies a text containing the names of the variables for which predictions are required. (These correspond to the identifiers of the variates and/or factors specified by the Y parameter of SOMDESCRIBE to form the predicted values currently associated with the map.) If YNAMES is not set, predictions will be given for all those variables. If more than one type of prediction was requested for a Y variable, using the METHOD parameter of SOMDESCRIBE, you can use the METHODS option of SOMPREDICT to specify a list of strings to indicate which ones you want. By default all are given.
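As a minimal sketch of restricting the output with YNAMES and METHODS, the statement below asks only for the modal prediction of one variable. It reuses the identifiers from the example at the end of this page (SomD, Species and the four measurement variates), and assumes that mode is among the prediction types that SOMDESCRIBE stored with the map for Species.

```genstat
"Sketch only: assumes SomD was saved by SOMDESCRIBE with Species as a
 y-variable, and that a 'mode' prediction is stored for it."
SOMPREDICT [SOM=SomD; YNAMES=!t(Species); METHODS=mode]\
  DATA=!p(Sepal_L,Sepal_W,Petal_L,Petal_W)
```

Because only one variable and one method are requested, this produces a single set of predictions.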
The PREDICTIONS parameter can save the predictions formed for each matrix or pointer supplied by the DATA parameter. If the YNAMES and METHODS options have requested several sets of predictions (for different variables and/or using different methods), PREDICTIONS will save a pointer containing a variate for each set. Alternatively, if only one set has been requested (i.e. only one variable using only one method), PREDICTIONS will save a variate. To identify the variates within each pointer, the YSAVE option can save a text with a unit for each set of predictions, giving the name of the corresponding y-variable. Similarly, the MSAVE option can save a text whose units contain the names (in lower-case letters) of the corresponding methods. Each PREDICTIONS variate will have a unit for every sample. You can use the UNITLABELS parameter to supply a variate or text to label the units; otherwise SOMPREDICT uses any row or unit labels defined on the matrix or variates supplied by the DATA parameter.
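The following sketch shows one way the saving options might be combined. The identifiers Pred, Yname and Mname are hypothetical names chosen for illustration; SomD and the measurement variates come from the example at the end of this page. Setting PRINT=* to suppress the printed output is an assumption based on the usual Genstat convention for string-token options.

```genstat
"Sketch only: Pred, Yname and Mname are illustrative identifiers.
 PRINT=* is assumed to suppress the printed predictions."
SOMPREDICT [PRINT=*; SOM=SomD; YSAVE=Yname; MSAVE=Mname]\
  DATA=!p(Sepal_L,Sepal_W,Petal_L,Petal_W); PREDICTIONS=Pred
"List which y-variable and method each saved set corresponds to."
PRINT Yname,Mname
```

Whether Pred is a pointer or a single variate depends on how many sets of predictions are formed, so the texts saved by YSAVE and MSAVE are the safest guide to its contents.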
The PRINT option controls whether or not the predictions are printed (by default they will be printed).
Options: PRINT, SOM, YNAMES, METHODS, YSAVE, MSAVE.
Parameters: DATA, UNITLABELS, PREDICTIONS.
Method
The SOMIDENTIFY procedure is used to allocate the samples to the nodes of the map. The variates of predictions are then formed, from the information stored with the map, using ordinary Genstat declarations and calculations.
Action with RESTRICT
SOMPREDICT takes account of any restrictions defined on the variates in a DATA pointer.
See also
Procedures: SOM, SOMADJUST, SOMDESCRIBE, SOMESTIMATE, SOMIDENTIFY.
Commands for: Data mining.
Example
CAPTION 'SOMPREDICT example',!t('Fisher''s Iris Data'); STYLE=meta,plain
SOM Som; VARIABLENAMES=!t(Sepal_L,Sepal_W,Petal_L,Petal_W)
MATRIX [ROWS=150; COLUMNS=!t(Sepal_L,Sepal_W,Petal_L,Petal_W)] Measures
READ Measures
5.1 3.5 1.4 0.2  4.9 3.0 1.4 0.2  4.7 3.2 1.3 0.2  4.6 3.1 1.5 0.2  5.0 3.6 1.4 0.2
5.4 3.9 1.7 0.4  4.6 3.4 1.4 0.3  5.0 3.4 1.5 0.2  4.4 2.9 1.4 0.2  4.9 3.1 1.5 0.1
5.4 3.7 1.5 0.2  4.8 3.4 1.6 0.2  4.8 3.0 1.4 0.1  4.3 3.0 1.1 0.1  5.8 4.0 1.2 0.2
5.7 4.4 1.5 0.4  5.4 3.9 1.3 0.4  5.1 3.5 1.4 0.3  5.7 3.8 1.7 0.3  5.1 3.8 1.5 0.3
5.4 3.4 1.7 0.2  5.1 3.7 1.5 0.4  4.6 3.6 1.0 0.2  5.1 3.3 1.7 0.5  4.8 3.4 1.9 0.2
5.0 3.0 1.6 0.2  5.0 3.4 1.6 0.4  5.2 3.5 1.5 0.2  5.2 3.4 1.4 0.2  4.7 3.2 1.6 0.2
4.8 3.1 1.6 0.2  5.4 3.4 1.5 0.4  5.2 4.1 1.5 0.1  5.5 4.2 1.4 0.2  4.9 3.1 1.5 0.2
5.0 3.2 1.2 0.2  5.5 3.5 1.3 0.2  4.9 3.6 1.4 0.1  4.4 3.0 1.3 0.2  5.1 3.4 1.5 0.2
5.0 3.5 1.3 0.3  4.5 2.3 1.3 0.3  4.4 3.2 1.3 0.2  5.0 3.5 1.6 0.6  5.1 3.8 1.9 0.4
4.8 3.0 1.4 0.3  5.1 3.8 1.6 0.2  4.6 3.2 1.4 0.2  5.3 3.7 1.5 0.2  5.0 3.3 1.4 0.2
7.0 3.2 4.7 1.4  6.4 3.2 4.5 1.5  6.9 3.1 4.9 1.5  5.5 2.3 4.0 1.3  6.5 2.8 4.6 1.5
5.7 2.8 4.5 1.3  6.3 3.3 4.7 1.6  4.9 2.4 3.3 1.0  6.6 2.9 4.6 1.3  5.2 2.7 3.9 1.4
5.0 2.0 3.5 1.0  5.9 3.0 4.2 1.5  6.0 2.2 4.0 1.0  6.1 2.9 4.7 1.4  5.6 2.9 3.6 1.3
6.7 3.1 4.4 1.4  5.6 3.0 4.5 1.5  5.8 2.7 4.1 1.0  6.2 2.2 4.5 1.5  5.6 2.5 3.9 1.1
5.9 3.2 4.8 1.8  6.1 2.8 4.0 1.3  6.3 2.5 4.9 1.5  6.1 2.8 4.7 1.2  6.4 2.9 4.3 1.3
6.6 3.0 4.4 1.4  6.8 2.8 4.8 1.4  6.7 3.0 5.0 1.7  6.0 2.9 4.5 1.5  5.7 2.6 3.5 1.0
5.5 2.4 3.8 1.1  5.5 2.4 3.7 1.0  5.8 2.7 3.9 1.2  6.0 2.7 5.1 1.6  5.4 3.0 4.5 1.5
6.0 3.4 4.5 1.6  6.7 3.1 4.7 1.5  6.3 2.3 4.4 1.3  5.6 3.0 4.1 1.3  5.5 2.5 4.0 1.3
5.5 2.6 4.4 1.2  6.1 3.0 4.6 1.4  5.8 2.6 4.0 1.2  5.0 2.3 3.3 1.0  5.6 2.7 4.2 1.3
5.7 3.0 4.2 1.2  5.7 2.9 4.2 1.3  6.2 2.9 4.3 1.3  5.1 2.5 3.0 1.1  5.7 2.8 4.1 1.3
6.3 3.3 6.0 2.5  5.8 2.7 5.1 1.9  7.1 3.0 5.9 2.1  6.3 2.9 5.6 1.8  6.5 3.0 5.8 2.2
7.6 3.0 6.6 2.1  4.9 2.5 4.5 1.7  7.3 2.9 6.3 1.8  6.7 2.5 5.8 1.8  7.2 3.6 6.1 2.5
6.5 3.2 5.1 2.0  6.4 2.7 5.3 1.9  6.8 3.0 5.5 2.1  5.7 2.5 5.0 2.0  5.8 2.8 5.1 2.4
6.4 3.2 5.3 2.3  6.5 3.0 5.5 1.8  7.7 3.8 6.7 2.2  7.7 2.6 6.9 2.3  6.0 2.2 5.0 1.5
6.9 3.2 5.7 2.3  5.6 2.8 4.9 2.0  7.7 2.8 6.7 2.0  6.3 2.7 4.9 1.8  6.7 3.3 5.7 2.1
7.2 3.2 6.0 1.8  6.2 2.8 4.8 1.8  6.1 3.0 4.9 1.8  6.4 2.8 5.6 2.1  7.2 3.0 5.8 1.6
7.4 2.8 6.1 1.9  7.9 3.8 6.4 2.0  6.4 2.8 5.6 2.2  6.3 2.8 5.1 1.5  6.1 2.6 5.6 1.4
7.7 3.0 6.1 2.3  6.3 3.4 5.6 2.4  6.4 3.1 5.5 1.8  6.0 3.0 4.8 1.8  6.9 3.1 5.4 2.1
6.7 3.1 5.6 2.4  6.9 3.1 5.1 2.3  5.8 2.7 5.1 1.9  6.8 3.2 5.9 2.3  6.7 3.3 5.7 2.5
6.7 3.0 5.2 2.3  6.3 2.5 5.0 1.9  6.5 3.0 5.2 2.0  6.2 3.4 5.4 2.3  5.9 3.0 5.1 1.8 :
FACTOR [NVALUES=150; LABELS=!t(Setosa,Versicolor,Virginica);\
  VALUES=50(1,2,3)] Species
SOMESTIMATE [PRINT=weights,report; PLOT=*; NCYCLE=!(100,200);\
  SIGMA=!(5,1)] Som; DATA=Measures; SEED=419749
SOMDESCRIBE [DATA=Measures; SOM=Som; NEWSOM=SomD] Species
VARIATE [NVALUES=6] Sepal_L,Sepal_W,Petal_L,Petal_W
READ Sepal_L,Sepal_W,Petal_L,Petal_W
5.1 3.5 1.4 0.2  4.9 3.0 1.4 0.2  7.0 3.2 4.7 1.4
6.4 3.2 4.5 1.5  6.3 3.3 6.0 2.5  5.8 2.7 5.1 1.9 :
SOMPREDICT [SOM=SomD] !p(Sepal_L,Sepal_W,Petal_L,Petal_W)