Fits a support vector machine (D. B. Baird).

### Options

| Option | Description |
| --- | --- |
| `PRINT` = string tokens | Printed output from the analysis (`summary`, `predictions`, `allocations`, `debug`); default `summ`, `alloc` |
| `SVMTYPE` = string token | Type of support vector machine to fit (`svc`, `svr`, `nusvc`, `nusvr`, `lsvc`, `lsvr`, `lcs`, `svm1`); default `svc` |
| `KERNEL` = string token | Type of kernel to use (`linear`, `polynomial`, `radialbasis`, `sigmoid`); default `radi` |
| `PENALTY` = scalar or variate | Penalty or cost for points on the wrong side of the boundary; default 1 |
| `GAMMA` = scalar or variate | Gamma parameter for types with non-linear kernels; default 1 |
| `NU` = scalar or variate | Nu parameter for types `nusvc`, `nusvr` and `svm1`; default 0.5 |
| `EPSILON` = scalar or variate | Epsilon parameter for types `svr` and `lsvr`; default 0.1 |
| `BIAS` = scalar | Bias for allocations to groups for types `lsvc` and `lsvr`; default -1, i.e. no bias |
| `DEGREE` = scalar | Degree for the polynomial kernel; default 3 |
| `CONSTANTVALUE` = scalar | Constant for the polynomial or sigmoid kernel; default 0 |
| `LOWER` = scalar or variate | Lower limit for scaling data variates; default -1 |
| `UPPER` = scalar or variate | Upper limit for scaling data variates; default 1 |
| `SCALING` = string token | Type of scaling to use (`none`, `uniform`, `given`); default `unif` |
| `NOSHRINK` = string token | Whether to suppress the shrinkage of attributes to exclude unused ones (`no`, `yes`); default `no` |
| `OPTMETHOD` = string token | Whether to optimize probabilities or allocations (`allocations`, `probabilities`); default `allo` |
| `REGULARIZATIONMETHOD` = string token | Regularization method for `SVMTYPE=lsvc` or `lsvr` (`l1`, `l2`); default `l2` |
| `LOSSMETHOD` = string token | Loss method for `SVMTYPE=lsvc` or `lsvr` (`logistic`, `l1`, `l2`); default `logi` |
| `DUALMETHOD` = string token | Whether to use the dual algorithm for `SVMTYPE=lsvc` or `lsvr` (`yes`, `no`); default `no` |
| `NCROSSVALIDATIONGROUPS` = scalar | Number of groups for cross-validation; default 10 |
| `SEED` = scalar | Seed for random number generation; default 0 |
| `TOLERANCE` = scalar | Tolerance for the termination criterion; default 0.001 |
| `WORKSPACE` = scalar | Size of workspace needed for the data; default is to calculate this from the number of observations and variates |

### Parameters

| Parameter | Description |
| --- | --- |
| `Y` = factors or variates | Define the groupings of the units in each training set, or the y-variate to be predicted via regression, with missing values in the units to be allocated or predicted |
| `X` = pointers | Each pointer contains a set of explanatory variates or factors |
| `WEIGHTS` = variates | Weights to multiply the penalties for each group when `SVMTYPE=svc`, `nusvc`, `lsvc` or `lcs` |
| `PREDICTIONS` = factors or variates | Saves allocations to groups, or predictions from regression |
| `ERRORRATE` = scalars, variates or matrices | Saves the error rate for the combinations of parameters specified for the support vector machine |
| `OPTPENALTY` = scalars | Saves the optimal value of the penalty parameter |
| `OPTGAMMA` = scalars | Saves the optimal value of the gamma parameter |
| `OPTNU` = scalars | Saves the optimal value of the nu parameter |
| `OPTEPSILON` = scalars | Saves the optimal value of the epsilon parameter |
| `OPTERRORRATE` = scalars | Saves the minimum error rate |
| `SCALE` = texts or pointers | Saves the scaling used for the `X` variates, in a file if a text is given, or otherwise in a pointer to a pair of variates |
| `SAVEFILE` = texts | File in which to save the model, for use by `SVMPREDICT` |

### Description

`SVMFIT` fits a support vector machine (Cortes & Vapnik 1995), which defines multivariate boundaries to separate groups or to predict values. It provides a Genstat interface to the libraries LIBSVM (Chang & Lin 2001) and LIBLINEAR (Fan *et al.* 2008), which are made available subject to the conditions listed in the Method section.

Unlike linear discriminant analysis, a support vector machine assumes no statistical model for the distribution of individuals within a group, so the method is less affected by outliers. It chooses boundaries to maximize the separation between groups. The name comes from the fact that the boundaries are defined by a small set of data points, known as the support vectors. If individuals lie on the wrong side of a boundary, their distance from the boundary, multiplied by a penalty, is added to the separation criterion.
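As an illustration (a Python sketch, not Genstat code; the function name is hypothetical), the criterion described above corresponds to the soft-margin objective of Cortes & Vapnik (1995): a margin term to maximize separation, plus a penalty-weighted sum of the shortfalls of points on the wrong side of (or too close to) the boundary.

```python
# Illustrative sketch only (not part of SVMFIT): the soft-margin objective
# that the PENALTY option weights. Each point's decision value
# f(x) = w.x + b should satisfy y*f(x) >= 1; any shortfall (a point on the
# wrong side, or inside the margin) is penalised in proportion to PENALTY.

def soft_margin_objective(w, b, points, penalty=1.0):
    """points is a list of (x, y) pairs, x a feature list, y in {-1, +1}."""
    margin_term = 0.5 * sum(wi * wi for wi in w)        # maximise separation
    slack = 0.0
    for x, y in points:
        f = sum(wi * xi for wi, xi in zip(w, x)) + b    # decision value
        slack += max(0.0, 1.0 - y * f)                  # hinge loss: 0 if well separated
    return margin_term + penalty * slack
```

A larger `penalty` makes the wrong-side points dominate the objective, which is why increasing `PENALTY` gives more weight to misclassified points.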

The type of support vector machine to fit is specified by the `SVMTYPE` option, with settings:

| Setting | Description |
| --- | --- |
| `svc` | a multi-class support vector classifier with a range of kernels for discriminating between groups |
| `svr` | support vector regression with a range of kernels for predicting the values of a y-variate, as in a regression |
| `nusvc` | Nu classification: a multi-class support vector classifier with a range of kernels, with a parameter `NU` that controls the fraction of support vectors used |
| `nusvr` | Nu regression: support vector regression with a range of kernels, with a parameter `NU` that controls the fraction of support vectors used |
| `lsvc` | fast linear classification: a fast regularized linear support vector classifier for discriminating between groups |
| `lsvr` | fast linear regression: a fast regularized linear support vector regression for predicting the values of a y-variate, as in a regression |
| `lcs` | a fast linear support vector machine for discriminating between groups using the approach of Crammer & Singer (2000), which trains a multi-class predictor directly rather than dividing the multi-class classification into a set of binary classifications |
| `svm1` | consistent group SVM: a support vector machine which attempts to identify a consistent group of observations |

The shape of the boundary is controlled by the `KERNEL` option, which specifies the metric used to measure the distance between multi-dimensional points *u* and *v*. The settings are:

| Setting | Description |
| --- | --- |
| `linear` | the linear function *u*′*v* |
| `polynomial` | the polynomial function (γ*u*′*v* + *c*)^{d} |
| `radialbasis` | the radial basis function exp(−γ\|*u* − *v*\|^{2}) |
| `sigmoid` | the sigmoid function tanh(γ*u*′*v* + *c*) |

With a linear kernel, the boundaries are multi-dimensional planes; for the other types they are curved surfaces. The kernel is ignored for `SVMTYPE=lsvc`, `lsvr` and `lcs`, as these always use a linear kernel.
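The four kernel settings can be sketched directly from their formulas (a Python illustration, not Genstat code; the parameter names mirror the `GAMMA`, `CONSTANTVALUE` and `DEGREE` options):

```python
import math

# Illustrative sketch of the four KERNEL settings. gamma, c and d
# correspond to the GAMMA, CONSTANTVALUE and DEGREE options described above.

def linear(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))           # u'v

def polynomial(u, v, gamma=1.0, c=0.0, d=3):
    return (gamma * linear(u, v) + c) ** d                # (gamma*u'v + c)^d

def radialbasis(u, v, gamma=1.0):
    sq = sum((ui - vi) ** 2 for ui, vi in zip(u, v))      # |u - v|^2
    return math.exp(-gamma * sq)

def sigmoid(u, v, gamma=1.0, c=0.0):
    return math.tanh(gamma * linear(u, v) + c)
```

Note that the radial basis kernel depends only on the distance between the two points, which is why it gives curved, localized boundaries.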

The data set is supplied in a pointer of explanatory variates or factors, specified by the `X` parameter, and a response variate or factor specified by the `Y` parameter. The `Y` parameter need not be set if `SVMTYPE=svm1`, as this searches for a consistent group of individuals in the data set, ignoring the `Y` parameter. Explanatory factors are converted to variates, using the levels of the factor concerned. Any unit with a missing value in an explanatory variate takes a zero value for that attribute; with the default, uniform, scaling this puts it in the centre of the range of the variate concerned. Units can also be excluded from the analysis by restricting the factors or variates; any such restrictions must be consistent.

The response factor specifies the pre-defined groupings of the units from which the allocation is derived (the "training set"); the units to be allocated by the analysis have missing values for `Y`. A response variate supplies training values for a regression-type support vector machine (requested by the `SVMTYPE` settings `svr`, `nusvr` and `lsvr`). Units to be predicted by the regression have missing values in the y-variate.

The support vector machine solutions depend on the scale of the attributes. It is usually recommended that all attributes are put on the same scale, so that they all have the same influence. This is controlled by the `SCALING` option, with settings:

| Setting | Description |
| --- | --- |
| `none` | the attributes are used as supplied, with no scaling |
| `uniform` | all the attributes are centred, and scaled to have the same minimum and maximum (default) |
| `given` | the variates are scaled using the `LOWER` and `UPPER` options |

The `LOWER` and `UPPER` options can be set to scalars, to apply a uniform scaling in which all the variates are given the same minimum (`LOWER`) and maximum (`UPPER`) value; alternatively, they can be variates specifying the minimum and maximum value for each variate, respectively.
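Each variate is mapped by a linear transform whose slope and intercept are what the `SCALE` parameter saves (see below). A Python sketch of this, assuming the default limits of -1 and 1:

```python
# Illustrative sketch (not Genstat code) of uniform scaling: each variate is
# linearly mapped so its minimum becomes LOWER and its maximum becomes UPPER.
# The (slope, intercept) pair is the linear transform that SCALE can save.

def scaling_transform(values, lower=-1.0, upper=1.0):
    lo, hi = min(values), max(values)
    slope = (upper - lower) / (hi - lo)      # stretch to the target range
    intercept = lower - slope * lo           # shift so min maps to LOWER
    return slope, intercept

def apply_scaling(values, slope, intercept):
    return [slope * v + intercept for v in values]
```

For example, the variate (2, 4, 6) scales to (-1, 0, 1) with slope 0.5 and intercept -2. Saving the slope and intercept lets exactly the same transform be reapplied to future data, as `SVMPREDICT` requires.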

The `PENALTY` option defines the penalty that is applied to the sum of distances for the points on the wrong side of the boundary when calculating the optimal boundaries; default 1. Larger values apply more weight to points that are on the wrong side of the discrimination boundaries, and a range of values can be investigated to optimize performance. However, linear support vector machines are generally insensitive to the choice of penalty. The `WEIGHTS` parameter can be used to change the penalty for mis-assigning a case to a particular group; it should be a variate with the same length as the number of levels of `Y`. The penalty for each group is then the corresponding value of `PENALTY*WEIGHTS`.

The `GAMMA` option (γ in the equations for the kernels) controls the smoothness of the boundary for non-linear kernels, with larger values giving a rougher surface.

With `SVMTYPE=nusvc` and `nusvr`, the parameter `NU` controls the number of support vectors used; default 0.5. With larger values of `NU`, smaller numbers of support vectors are used, giving a sparser solution that may be more robust and thus perform better in future prediction.

With the regression cases `SVMTYPE=svr` and `lsvr`, the parameter `EPSILON` controls the sensitivity of the loss function being optimized; default 0.1. A range of parameter values for `PENALTY`, `GAMMA`, `NU` or `EPSILON` is usually tried, to optimize the discrimination between groups or the predictions of the y-variate. These parameters also accept a variate, in which case all the values in the variate are tried and the one that minimizes the error rate is selected. Up to two of these parameters can be variates at once; a grid of error rates is then calculated using every combination of the two sets of values, and the optimal combination is selected. If three or more of these parameters are set to variates, a warning is given, and only the first values of the third and fourth variates are used.
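The grid search over two variates of parameter values can be sketched as follows (a Python illustration, not Genstat code; the `error_rate` function stands in for the cross-validated error rate that SVMFIT computes):

```python
# Illustrative sketch (not Genstat code) of the grid search performed when
# two of PENALTY, GAMMA, NU and EPSILON are set to variates: every
# combination is tried, and the pair with the smallest error rate is kept.
# error_rate is a stand-in for the real cross-validated error computation.

def grid_search(penalties, gammas, error_rate):
    best = None
    grid = {}
    for c in penalties:
        for g in gammas:
            err = error_rate(c, g)
            grid[(c, g)] = err                     # full grid, as in ERRORRATE
            if best is None or err < best[2]:
                best = (c, g, err)                 # as in OPTPENALTY/OPTGAMMA
    return best, grid
```

The full grid corresponds to the matrix saved by `ERRORRATE`, and the minimizing pair to the values saved by the `OPT` parameters.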

When `KERNEL=polynomial`, the `DEGREE` option defines the degree of the polynomial (*d* in the equation for the polynomial kernel). The `CONSTANTVALUE` option gives the constant (*c* in the equations for the kernels), for `KERNEL=polynomial` and `sigmoid`.

The `TOLERANCE` option supplies a small positive value that controls the precision used for the termination criterion. Decreasing this may provide a better solution, but will increase the time taken until convergence.

The `NOSHRINK` option controls whether unnecessary attributes are dropped from the fitting process. By default they are dropped, which speeds up the search for a solution when many iterations are needed (e.g. when `TOLERANCE` has been made smaller). If few iterations are required to find a solution, it may be faster to set `NOSHRINK=yes`.

The `OPTMETHOD` option controls the criterion that is optimized when `SVMTYPE` is set to `svc`, `svr`, `nusvc` or `nusvr`, with settings:

| Setting | Description |
| --- | --- |
| `allocations` | for the accuracy of allocating individuals to groups |
| `probabilities` | for the sum of the probabilities of allocating individuals to the correct group |

The `SVMTYPE` settings `lsvc`, `lsvr` and `lcs` fit regularized linear support vector machines using the algorithms in the LIBLINEAR library of Fan *et al.* (2008). These are much faster than the default algorithm, allowing much bigger data sets to be analysed. The `REGULARIZATIONMETHOD`, `LOSSMETHOD` and `DUALMETHOD` options specify which LIBLINEAR algorithm is used for the `SVMTYPE` settings `lsvc` and `lsvr`.

The `REGULARIZATIONMETHOD` option allows you to create sparser sets of support vectors, with the `l1` setting giving a smaller set of support vectors than `l2`. The `LOSSMETHOD` option controls the loss function being minimized: the `l2` setting minimizes the sum of the squared distances of points on the wrong side of the boundary, the `l1` setting minimizes the sum of the distances, and the `logistic` setting uses a logistic-regression loss function. Setting `DUALMETHOD=yes` may be faster when there is a large number of attributes. Not all combinations of the `REGULARIZATIONMETHOD`, `LOSSMETHOD` and `DUALMETHOD` options are available.
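The three `LOSSMETHOD` settings differ only in how the wrong-side points are aggregated, which a short Python sketch makes concrete (illustrative only, not Genstat code; `distances` are the slack distances of wrong-side points, and `margins` are signed decision values y·f(x)):

```python
import math

# Illustrative sketch (not Genstat code) of the three LOSSMETHOD settings.

def l1_loss(distances):
    # sum of the distances of points on the wrong side of the boundary
    return sum(distances)

def l2_loss(distances):
    # sum of the squared distances: large violations are penalised more
    return sum(d * d for d in distances)

def logistic_loss(margins):
    # logistic-regression loss on the signed margins y*f(x):
    # smooth, and non-zero even for correctly classified points
    return sum(math.log(1.0 + math.exp(-m)) for m in margins)
```

Squaring in `l2` penalises a few large violations more heavily than many small ones, while `l1` treats all violation distance equally; this is the practical difference between the two settings.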

When `SVMTYPE=lsvc`, you can use the `BIAS` option to try to achieve a better discrimination between groups. When `BIAS` is set to a non-negative value, an extra constant attribute is appended to each individual. This extra attribute is given a weight that controls the origin of the separating hyper-plane (the origin is where all attributes have the value 0). A `BIAS` of 0 forces the separating hyper-plane to go through the origin, and a non-zero value moves the plane away from the origin. `BIAS` thus acts as a tuning parameter that shifts the hyper-plane's origin; a range of values can be investigated to try to improve the discrimination.

Printed output is controlled by the `PRINT` option, with settings:

| Setting | Description |
| --- | --- |
| `summary` | tables giving the number of units in each group with a complete set of observations |
| `allocations` | tables of counts of allocations |
| `debug` | details of the parameters set when calling the libraries |

The error rate is worked out by cross-validation: the units are randomly split into the number of groups specified by the `NCROSSVALIDATIONGROUPS` option, then each group is omitted in turn and the allocation of the omitted units to the discrimination groups is predicted from the rest.
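The mechanics of this cross-validation can be sketched in Python (illustrative only, not Genstat code; unlike Genstat's `SEED=0` convention, this sketch takes an explicit seed, and `fit_and_score` stands in for fitting the SVM and scoring the omitted group):

```python
import random

# Illustrative sketch (not Genstat code) of cross-validation: units are
# randomly allocated to NCROSSVALIDATIONGROUPS near-equal groups, then each
# group is omitted in turn and predicted from a fit to the remaining units.

def crossvalidation_groups(nunits, ngroups=10, seed=0):
    rng = random.Random(seed)
    labels = [i % ngroups for i in range(nunits)]   # near-equal group sizes
    rng.shuffle(labels)                             # random allocation
    return labels

def crossvalidate(units, labels, fit_and_score):
    errors = []
    for g in sorted(set(labels)):
        train = [u for u, l in zip(units, labels) if l != g]
        held_out = [u for u, l in zip(units, labels) if l == g]
        errors.append(fit_and_score(train, held_out))
    return sum(errors) / len(errors)                # mean error rate
```

Fixing the seed makes the group allocation, and hence the estimated error rate, reproducible between runs.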

The `SEED` option provides the seed for the random numbers used for allocating individuals to the cross-validation groups. The default value of 0 continues an existing sequence of random numbers or, if none have been used in the current Genstat job, initializes the seed automatically from the computer clock.

The `WORKSPACE` option can be set if the problem requires more memory than the default settings provide.

Results from the analysis can be saved using the parameters `PREDICTIONS`, `ERRORRATE`, `OPTPENALTY`, `OPTGAMMA`, `OPTNU`, `OPTEPSILON` and `OPTERRORRATE`. The structures specified for these parameters need not be declared in advance. If one of the options `PENALTY`, `GAMMA`, `NU` or `EPSILON` has been set to a variate, `ERRORRATE` will be a variate indexed by that variate; if two of these options have been set to variates, `ERRORRATE` will be a matrix with rows and columns indexed by those variates. The `OPT` parameters contain the values of the parameters that give the minimum error rate (returned in `OPTERRORRATE`).

The support vector machine model can be saved in an external file, using the `SAVEFILE` parameter, so that it can be used later with `SVMPREDICT`. As the scaling of the attributes must be the same in future data sets, the scaling can be saved with the `SCALE` parameter. This can supply either a filename (ending in .gsh), to keep the scaling permanently, or a pointer, so that it can be applied to the attributes used by `SVMPREDICT` later in the same program. The file or pointer contains two variates, which give the slope and intercept (in that order) for the linear transform applied to each attribute.

Options: `PRINT`, `SVMTYPE`, `KERNEL`, `PENALTY`, `GAMMA`, `NU`, `EPSILON`, `BIAS`, `DEGREE`, `CONSTANTVALUE`, `LOWER`, `UPPER`, `SCALING`, `NOSHRINK`, `OPTMETHOD`, `REGULARIZATIONMETHOD`, `LOSSMETHOD`, `DUALMETHOD`, `NCROSSVALIDATIONGROUPS`, `SEED`, `TOLERANCE`, `WORKSPACE`.

Parameters: `Y`, `X`, `WEIGHTS`, `PREDICTIONS`, `ERRORRATE`, `OPTPENALTY`, `OPTGAMMA`, `OPTNU`, `OPTEPSILON`, `OPTERRORRATE`, `SCALE`, `SAVEFILE`.

### Method

`SVMFIT` provides a Genstat interface to the C++ libraries LIBSVM (Chang & Lin 2001) and LIBLINEAR (Fan *et al.* 2008), which have been compiled into the GenSVM dynamic link library. A user guide by Hsu *et al.* (2003) gives details on their use.

LIBSVM is provided subject to the following copyright notice.

Copyright © 2000-2014 Chih-Chung Chang and Chih-Jen Lin. All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

3. Neither name of copyright holders nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

This software is provided by the copyright holders and contributors “as is” and any express or implied warranties, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose are disclaimed. In no event shall the regents or contributors be liable for any direct, indirect, incidental, special, exemplary, or consequential damages (including, but not limited to, procurement of substitute goods or services; loss of use, data, or profits; or business interruption) however caused and on any theory of liability, whether in contract, strict liability, or tort (including negligence or otherwise) arising in any way out of the use of this software, even if advised of the possibility of such damage.

LIBLINEAR is provided subject to the following copyright notice.

Copyright © 2007-2013 The LIBLINEAR Project. All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

3. Neither name of copyright holders nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

This software is provided by the copyright holders and contributors “as is” and any express or implied warranties, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose are disclaimed. In no event shall the regents or contributors be liable for any direct, indirect, incidental, special, exemplary, or consequential damages (including, but not limited to, procurement of substitute goods or services; loss of use, data, or profits; or business interruption) however caused and on any theory of liability, whether in contract, strict liability, or tort (including negligence or otherwise) arising in any way out of the use of this software, even if advised of the possibility of such damage.

### Action with `RESTRICT`

The input variates and factor may be restricted. The restrictions must be identical.

### References

Cortes, C. & Vapnik, V. (1995). Support-vector networks. *Machine Learning*, 20, 273-297.

URL: https://link.springer.com/article/10.1007%2FBF00994018

Chang, C.C. & Lin, C.J. (2001). LIBSVM: A library for support vector machines.

URL: http://www.csie.ntu.edu.tw/~cjlin/libsvm

Crammer, K. & Singer, Y. (2000). On the learnability and design of output codes for multiclass problems. In *Computational Learning Theory*, 35-46.

Fan, R.E., Chang, K.W., Hsieh, C.J., Wang, X.R. & Lin, C.J. (2008). LIBLINEAR: A library for large linear classification. *Journal of Machine Learning Research*, 9, 1871-1874.

URL: http://www.csie.ntu.edu.tw/~cjlin/papers/liblinear.pdf

Hsu, C.W., Chang, C.C. & Lin, C.J. (2003). A practical guide to support vector classification. (Technical report). Department of Computer Science and Information Engineering, National Taiwan University.

URL: http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf

### See also

Directive: `CVA`.

Procedures: `SVMPREDICT`, `DISCRIMINATE`, `QDISCRIMINATE`, `SDISCRIMINATE`.

### Example

```
CAPTION 'SVMFIT for classification: Fisher Iris data'; STYLE=meta
SPLOAD [PRINT=*] '%DATA%/Iris.gsh'
POINTER [VALUES=Sepal_Length,Sepal_Width,Petal_Length,Petal_Width] Var
" Default - radialbasis kernel with scaling."
SVMFIT [PRINT=summary,allocations; SEED=726454] Y=Species; X=Var
" Unscaled with linear kernel."
SVMFIT [PRINT=summary,allocations; KERNEL=linear; SCALING=none;\
  SEED=143038] Y=Species; X=Var
CAPTION 'SVMFIT for regression: Los Angeles Ozone data'; STYLE=meta
SPLOAD [PRINT=*] '%DATA%/Ozone.gsh'; ISAVE=Data
SUBSET [Ozone /= !s(*)] Data[]
POINTER [VALUES=Data[1,2,(5...10)]] OZVars
" Find optimal values for penalty and gamma."
SVMFIT [PRINT=summary; SVMTYPE=svr; PENALTY=!(1,10,100,500,1000);\
  GAMMA=!(0.05,0.1,0.2,0.4); SEED=562011] Y=Ozone; X=OZVars;\
  PREDICTIONS=POzone
DGRAPH [TITLE='Los Angeles Ozone levels 1976 ~{epsilon}-regression';\
  KEY=0; WIND=3] Y=POzone; X=Ozone
```