SVMFIT procedure

Fits a support vector machine (D. B. Baird).

Options

`PRINT` = string tokens	Printed output from the analysis (`summary`, `predictions`, `allocations`, `debug`); default `summ`, `alloc`
`SVMTYPE` = string token	Type of support vector machine to fit (`svc`, `svr`, `nusvc`, `nusvr`, `lsvc`, `lsvr`, `lcs`, `svm1`); default `svc`
`KERNEL` = string token	Type of kernel to use (`linear`, `polynomial`, `radialbasis`, `sigmoid`); default `radi`
`PENALTY` = scalar or variate	Penalty or cost for points on the wrong side of the boundary; default 1
`GAMMA` = scalar or variate	Gamma parameter for types with non-linear kernels; default 1
`NU` = scalar or variate	Nu parameter for types `nusvc`, `nusvr`, and `svm1`; default 0.5
`EPSILON` = scalar or variate	Epsilon parameter for types `svr` and `lsvr`; default 0.1
`BIAS` = scalar	Bias for allocations to groups for types `lsvc` and `lsvr`; default -1 i.e. no bias
`DEGREE` = scalar	Degree for polynomial kernel; default 3
`CONSTANTVALUE` = scalar	Constant for polynomial or sigmoid kernel; default 0
`LOWER` = scalar or variate	Lower limit for scaling data variates; default -1
`UPPER` = scalar or variate	Upper limit for scaling data variates; default 1
`SCALING` = string token	Type of scaling to use (`none`, `uniform`, `given`); default `unif`
`NOSHRINK` = string token	Whether to suppress the shrinkage of attributes to exclude unused ones (`no`, `yes`); default `no`
`OPTMETHOD` = string token	Whether to optimize probabilities or allocations (`allocations`, `probabilities`); default `allo`
`REGULARIZATIONMETHOD` = string token	Regularization method for `SVMTYPE` `=` `lsvc` or `lsvr` (`l1`, `l2`); default `l2`
`LOSSMETHOD` = string token	Loss method for `SVMTYPE` `=` `lsvc` or `lsvr` (`logistic`, `l1`, `l2`); default `logi`
`DUALMETHOD` = string token	Whether to use the dual algorithm for `SVMTYPE` `=` `lsvc` or `lsvr` (`yes`, `no`); default `no`
`NCROSSVALIDATIONGROUPS` = scalar	Number of groups for cross-validation; default 10
`SEED` = scalar	Seed for random number generation; default 0
`TOLERANCE` = scalar	Tolerance for termination criterion; default 0.001
`WORKSPACE` = scalar	Size of workspace needed for data; default is to calculate this from the number of observations and variates

Parameters

`Y` = factors or variates	Define the groupings for the units in each training set, or the y-variate to be predicted via regression, with missing values in the units to be allocated or predicted
`X` = pointers	Each pointer contains a set of explanatory variates or factors
`WEIGHTS` = variates	Weights to multiply penalties for each group when `SVMTYPE` `=` `svc`, `nusvc`, `lsvc` or `lcs`
`PREDICTIONS` = factors or variates	Saves allocations to groups or predictions from regression
`ERRORRATE` = scalars, variates or matrices	Saves the error rate for the combinations of parameters specified for the support vector machine
`OPTPENALTY` = scalars	Saves the optimal value of penalty parameter
`OPTGAMMA` = scalars	Saves the optimal value of gamma parameter
`OPTNU` = scalars	Saves the optimal value of nu parameter
`OPTEPSILON` = scalars	Saves the optimal value of epsilon parameter
`OPTERRORRATE` = scalars	Saves the minimum error rate
`SCALE` = texts or pointers	Saves the scaling used for the `X` variates, in a file if a text is given, or otherwise in a pointer to a pair of variates
`SAVEFILE` = texts	File in which to save the model, for use by `SVMPREDICT`

Description

SVMFIT fits a support vector machine (Cortes & Vapnik 1995), which defines multivariate boundaries to separate groups, or predict values. It provides a Genstat interface to the libraries LIBSVM (Chang & Lin 2001) and LIBLINEAR (Fan et al. 2008), which are made available subject to the conditions listed in the Method section.

Unlike linear discriminant analysis, a support vector machine assumes no statistical model for the distribution of individuals within a group, so the method is less affected by outliers. It chooses boundaries to maximize the separation between groups. The name arises because the boundaries are defined by a small set of data points, known as the support vectors. If individuals lie on the wrong side of a boundary, their distance from the boundary, multiplied by a penalty, is added to the separation criterion.

The type of support vector machine to fit is specified by the SVMTYPE option, with settings:

`svc`	a multi-class support vector classifier with a range of kernels for discriminating between groups;
`svr`	support vector regression with a range of kernels for predicting the values of a y-variate as in a regression;
`nusvc`	Nu classification – a multi-class support vector classifier with a range of kernels for discriminating between groups with a parameter `NU` that controls the fraction of support vectors used;
`nusvr`	Nu regression – support vector regression with a range of kernels for predicting the values of a y-variate as in a regression with a parameter `NU` that controls the fraction of support vectors used;
`lsvc`	Fast linear classification – a fast regularized linear support vector for discriminating between groups;
`lsvr`	Fast linear regression – a fast regularized linear support vector regression for predicting the values of a y-variate as in a regression;
`lcs`	a fast linear support vector machine for discriminating between groups using the approach of Crammer & Singer (2000), where a direct method for training multi-class predictors is used, rather than dividing the multi-class classification into a set of binary classifications; and
`svm1`	Consistent group SVM – a support vector machine which attempts to identify a consistent group of observations.

The shape of the boundary is controlled by the KERNEL option which specifies the metric used to measure distance between multi-dimensional points u and v. The settings are:

`linear`	the linear function u′v;
`polynomial`	the polynomial function (γ u′v + c)^d;
`radialbasis`	the radial basis function exp(-γ |u – v|²); and
`sigmoid`	the sigmoid function tanh(γ u′v + c).

With a linear kernel, the boundaries are multi-dimensional planes. For the other types they are curved surfaces. The kernel is ignored for SVMTYPE=lsvc, lsvr and lcs as these always use a linear kernel.

The data set is supplied in a pointer of explanatory variates or factors, specified by the X parameter, and a response variate or factor specified by the Y parameter. The Y parameter need not be set if SVMTYPE=svm1, as this searches for a consistent group of individuals in the data set, ignoring the Y parameter. Explanatory factors are converted to variates, using the levels of the factor concerned. Any unit with a missing value in an explanatory variate takes a zero value for that attribute. With the default, uniform, scaling this puts them in the centre of the range of the variate concerned. Units can also be excluded from the analysis by restricting the factor or variates; any such restrictions must be consistent.

The response factor specifies the pre-defined groupings of the units from which the allocation is derived (the “training set”); the units to be allocated by the analysis have missing values for Y. A response variate supplies training values for a regression-type support vector machine. (These are requested by SVMTYPE settings svr, nusvr and lsvr.) Units to be predicted by the regression have missing values in the y-variate.

The support vector machine solutions depend on the scale of the attributes. It is usually recommended that all attributes are put on the same scale, so that they all have the same influence. This is controlled by the SCALING option, with settings:

`none`	the attributes are used as supplied, with no scaling;
`uniform`	all the attributes are centred, and scaled to have the same minimum and maximum (default); and
`given`	the variates are scaled using the `LOWER` and `UPPER` options.

The LOWER and UPPER options can be set to a scalar, to apply a uniform scaling, where all the variates are given the same minimum (LOWER) and maximum (UPPER) value; alternatively, they can be variates specifying the minimum and maximum value for each variate, respectively.
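As a minimal sketch of the `given` scaling, the calls below supply first scalar limits and then one limit per variate. They reuse the Iris data from the Example section. The limits and seeds are arbitrary illustrative values, not recommendations.

```genstat
SPLOAD  [PRINT=*] '%DATA%/Iris.gsh'
POINTER [VALUES=Sepal_Length,Sepal_Width,Petal_Length,Petal_Width] Var
" Scalar limits: every variate is scaled to run from 0 to 1."
SVMFIT  [PRINT=summary,allocations; SCALING=given; LOWER=0; UPPER=1;\
        SEED=314159] Y=Species; X=Var
" Variate limits: one lower and one upper value for each of the 4 variates."
SVMFIT  [PRINT=summary,allocations; SCALING=given; LOWER=!(0,0,-1,-1);\
        UPPER=!(1,1,1,1); SEED=314159] Y=Species; X=Var
```

Using the same seed in both calls keeps the cross-validation groups the same, so any difference in the allocations comes from the scaling alone.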

The PENALTY option defines the penalty that is applied to the sum of distances for the points on the wrong side of the boundary when calculating the optimal boundaries; default 1. Larger values apply more weight to points that are on the wrong side of the discrimination boundaries, and can be investigated to optimize performance. However, linear support vector machines are generally insensitive to the choice of the penalty. The WEIGHTS parameter can be used to change the penalty for mis-assigning a case to a particular group, and should be a variate with the same length as the number of levels in Y. The penalty for each group is then the corresponding value of PENALTY*WEIGHTS.
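The sketch below illustrates group-specific penalties, assuming the Iris data from the Example section, where Species has three levels. The weights, penalty and seed are arbitrary values chosen only for illustration.

```genstat
SPLOAD  [PRINT=*] '%DATA%/Iris.gsh'
POINTER [VALUES=Sepal_Length,Sepal_Width,Petal_Length,Petal_Width] Var
" PENALTY*WEIGHTS gives group penalties of 10, 30 and 10, so that"
" mis-assigning a unit of the second species costs three times as much."
SVMFIT  [PRINT=summary,allocations; PENALTY=10; SEED=271828]\
        Y=Species; X=Var; WEIGHTS=!(1,3,1)
```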

The GAMMA option (γ in the equations for the kernels) controls the smoothness of the boundary for non-linear kernels, with larger values giving a rougher surface.

With SVMTYPE=nusvc, nusvr and svm1, the parameter NU controls the number of support vectors used; default 0.5. NU is a lower bound on the fraction of units that become support vectors (and an upper bound on the fraction of margin errors), so smaller values of NU allow fewer support vectors, giving a sparser solution that may be more robust and thus perform better in future prediction.

With the regression cases SVMTYPE=svr and lsvr, the parameter EPSILON controls the sensitivity of the loss function being optimized; default 0.1.

A range of values for PENALTY, GAMMA, NU or EPSILON is usually tried, to optimize the discrimination between groups or the predictions of the y-variate. These options therefore also accept a variate, in which case all the values in the variate are tried and the one that minimizes the error rate is selected. Up to two of these options can be set to variates at once. A grid of error rates is then calculated using every combination of the two sets of values, and the optimal combination is selected. If three or more of these options are set to variates, a warning is given, and only the first value of each of the third and fourth variates is used.
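A minimal sketch of a two-parameter search for Nu classification, assuming the Iris data from the Example section. The candidate values of NU and GAMMA and the seed are arbitrary illustrative choices.

```genstat
SPLOAD  [PRINT=*] '%DATA%/Iris.gsh'
POINTER [VALUES=Sepal_Length,Sepal_Width,Petal_Length,Petal_Width] Var
" Try all 3 x 3 combinations of NU and GAMMA; the combination with"
" the smallest cross-validated error rate is selected."
SVMFIT  [PRINT=summary,allocations; SVMTYPE=nusvc; NU=!(0.1,0.25,0.5);\
        GAMMA=!(0.1,0.5,1); SEED=161803] Y=Species; X=Var
```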

When KERNEL=polynomial, the DEGREE option defines the degree of the polynomial (d in the equation for the polynomial kernel). The CONSTANTVALUE option gives the constant (c in the equations for the kernels), for KERNEL=polynomial and sigmoid.
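For illustration, the call below requests a quadratic polynomial kernel with a non-zero constant, again assuming the Iris data from the Example section. The values of DEGREE and CONSTANTVALUE and the seed are arbitrary and are not tuned.

```genstat
SPLOAD  [PRINT=*] '%DATA%/Iris.gsh'
POINTER [VALUES=Sepal_Length,Sepal_Width,Petal_Length,Petal_Width] Var
" Polynomial kernel with d = 2 and c = 1; GAMMA keeps its default of 1."
SVMFIT  [PRINT=summary,allocations; KERNEL=polynomial; DEGREE=2;\
        CONSTANTVALUE=1; SEED=141421] Y=Species; X=Var
```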

The TOLERANCE option supplies a small positive value that controls the precision used for the termination criterion. Decreasing this may provide a better solution, but will increase the time taken until convergence.

The NOSHRINK option controls whether unnecessary attributes are dropped from the fitting process; by default, these are dropped, thus increasing the speed to find a solution when there are many iterations (e.g. when TOLERANCE has been made smaller). If few iterations are required to find a solution, it may be faster to set NOSHRINK=yes.

The OPTMETHOD option controls the criterion that is optimized when the SVMTYPE is set to svc, svr, nusvc or nusvr, with settings:

`allocations`	for the accuracy of allocating individuals to groups; or
`probabilities`	for the sum of the probabilities of allocating an individual to the correct group.

The SVMTYPE settings lsvc, lsvr and lcs fit regularized linear support vector machines using the algorithms in the LIBLINEAR library of Fan et al. (2008). These are much faster than the default algorithm, allowing much bigger data sets to be analysed. The REGULARIZATIONMETHOD, LOSSMETHOD and DUALMETHOD options specify which LIBLINEAR algorithm is used for the SVMTYPE settings lsvc and lsvr.

The REGULARIZATIONMETHOD option allows you to create sparser sets of support vectors, with the L1 setting giving a smaller set of support vectors than L2. The LOSSMETHOD option controls the loss function being minimized: the L2 setting minimizes the sum of the squared distances of points on the wrong side of the boundary, the L1 setting minimizes the sum of the distances, and the logistic setting uses a logistic regression loss function. Setting option DUALMETHOD=yes may be faster when there are a large number of attributes. Not all combinations of REGULARIZATIONMETHOD, LOSSMETHOD and DUALMETHOD options are available.
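The following sketch selects a fast linear classifier with L2 regularization, L2 loss and the dual algorithm, assuming the Iris data from the Example section. It also assumes that this particular combination is one of the available ones, because not all combinations are. The seed is arbitrary.

```genstat
SPLOAD  [PRINT=*] '%DATA%/Iris.gsh'
POINTER [VALUES=Sepal_Length,Sepal_Width,Petal_Length,Petal_Width] Var
" Fast linear classification; the KERNEL option is ignored for lsvc."
SVMFIT  [PRINT=summary,allocations; SVMTYPE=lsvc; REGULARIZATIONMETHOD=l2;\
        LOSSMETHOD=l2; DUALMETHOD=yes; SEED=173205] Y=Species; X=Var
```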

When SVMTYPE=lsvc, you can use the BIAS option to try to improve the discrimination between groups. When BIAS is set to a non-negative value, an extra constant attribute is added to the end of the attributes for each individual. This extra attribute is given a weight that controls the position of the separating hyper-plane relative to the origin (the point where all attributes have the value 0). A BIAS of 0 forces the separating hyper-plane to go through the origin, and a non-zero value moves the plane away from the origin. The BIAS thus acts as a tuning parameter, and a range of values can be investigated to try to improve the discrimination.

Printed output is controlled by the option PRINT with settings:

`summary`	tables giving the number of units in each group with a complete set of observations;
`allocations`	tables of counts of allocations; and
`debug`	details of the parameters set when calling the libraries.

The error rate is worked out by cross-validation, which works by randomly splitting the units into a number of groups specified by the NCROSSVALIDATIONGROUPS option. It then omits each of the groups, in turn, and predicts how the omitted units are allocated to the discrimination groups.

The SEED option provides the seed for the random numbers used for allocating individuals to the cross-validation groups. The default value of 0 continues an existing sequence of random numbers. If none have been used in the current Genstat job, it initializes the seed automatically using the computer clock.
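As a sketch of controlling the cross-validation, the call below uses five groups instead of the default ten and fixes the seed so that the random split can be reproduced. It assumes the Iris data from the Example section. The seed value and the identifier CVErr are arbitrary, hypothetical choices.

```genstat
SPLOAD  [PRINT=*] '%DATA%/Iris.gsh'
POINTER [VALUES=Sepal_Length,Sepal_Width,Petal_Length,Petal_Width] Var
" 5-fold cross-validation with a fixed, non-zero seed."
SVMFIT  [PRINT=summary,allocations; NCROSSVALIDATIONGROUPS=5;\
        SEED=223606] Y=Species; X=Var; ERRORRATE=CVErr
PRINT   CVErr
```

Rerunning the call with the same non-zero seed should give the same allocation of units to the cross-validation groups, whereas SEED=0 continues the current random sequence.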

The WORKSPACE option can be set if the problem requires more memory than the default settings.

Results from the analysis can be saved using the parameters PREDICTIONS, ERRORRATE, OPTPENALTY, OPTGAMMA, OPTNU, OPTEPSILON and OPTERRORRATE. The structures specified for these parameters need not be declared in advance. If one of the options PENALTY, GAMMA, NU or EPSILON has been set to a variate, ERRORRATE will be a variate indexed by that variate. Alternatively, if two of these options have been set to variates, ERRORRATE will be a matrix with rows and columns indexed by those variates. The OPT parameters contain the parameter values that give the minimum error rate (which is returned in OPTERRORRATE).
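A minimal sketch of saving the results of a one-parameter search, assuming the Iris data from the Example section. The identifiers PSpecies, Err, BestC and MinErr are hypothetical names chosen for this illustration. The penalty values and seed are arbitrary.

```genstat
SPLOAD  [PRINT=*] '%DATA%/Iris.gsh'
POINTER [VALUES=Sepal_Length,Sepal_Width,Petal_Length,Petal_Width] Var
" PENALTY is a variate, so Err is saved as a variate of error rates."
SVMFIT  [PRINT=summary; PENALTY=!(0.1,1,10,100); SEED=577215]\
        Y=Species; X=Var; PREDICTIONS=PSpecies; ERRORRATE=Err;\
        OPTPENALTY=BestC; OPTERRORRATE=MinErr
PRINT   Err
PRINT   BestC,MinErr
```

None of the saved structures is declared beforehand, as SVMFIT defines them automatically.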

The support vector machine model can be saved in an external file, using the SAVEFILE parameter, so that it can be used later with SVMPREDICT. As the scaling on the attributes must be the same in future data sets, the scaling can be saved with the SCALE parameter. This can supply either a filename (ending in .gsh) to keep these permanently, or a pointer so that these can be applied to the attributes used in SVMPREDICT later in the same program. The file or pointer contains two variates, which give the slope and intercept (in that order) for the linear transform applied to each attribute.
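The sketch below saves the fitted model to a file and the scaling to a pointer, assuming the Iris data from the Example section. The file name 'iris-svm.model' and the identifier IrisScale are hypothetical. The seed is arbitrary.

```genstat
SPLOAD  [PRINT=*] '%DATA%/Iris.gsh'
POINTER [VALUES=Sepal_Length,Sepal_Width,Petal_Length,Petal_Width] Var
" Save the model for SVMPREDICT, and keep the scaling in a pointer."
SVMFIT  [PRINT=summary; SEED=693147] Y=Species; X=Var;\
        SCALE=IrisScale; SAVEFILE='iris-svm.model'
" Print the two variates of the pointer (slope and intercept per attribute)."
PRINT   IrisScale[]
```

Supplying a text ending in .gsh for SCALE instead of a pointer would keep the scaling permanently in a file.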

Options: PRINT, SVMTYPE, KERNEL, PENALTY, GAMMA, NU, EPSILON, BIAS, DEGREE, CONSTANTVALUE, LOWER, UPPER, SCALING, NOSHRINK, OPTMETHOD, REGULARIZATIONMETHOD, LOSSMETHOD, DUALMETHOD, NCROSSVALIDATIONGROUPS, SEED, TOLERANCE, WORKSPACE.

Parameters: Y, X, WEIGHTS, PREDICTIONS, ERRORRATE, OPTPENALTY, OPTGAMMA, OPTNU, OPTEPSILON, OPTERRORRATE, SCALE, SAVEFILE.

Method

SVMFIT provides a Genstat interface to the C++ libraries LIBSVM (Chang & Lin 2001) and LIBLINEAR (Fan et al. 2008), which have been compiled into the GenSVM dynamic link library. A user guide by Hsu et al. (2003) gives details on their use.

LIBSVM is provided subject to the following copyright notice.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

3. Neither name of copyright holders nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

This software is provided by the copyright holders and contributors “as is” and any express or implied warranties, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose are disclaimed. In no event shall the regents or contributors be liable for any direct, indirect, incidental, special, exemplary, or consequential damages (including, but not limited to, procurement of substitute goods or services; loss of use, data, or profits; or business interruption) however caused and on any theory of liability, whether in contract, strict liability, or tort (including negligence or otherwise) arising in any way out of the use of this software, even if advised of the possibility of such damage.

LIBLINEAR is provided subject to the following copyright notice.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

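1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.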

3. Neither name of copyright holders nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

Action with `RESTRICT`

The input variates and factor may be restricted. The restrictions must be identical.

References

Cortes, C. & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20, 273-297.

URL: https://link.springer.com/article/10.1007%2FBF00994018

Chang, C.C. & Lin, C.J. (2001). LIBSVM: A library for support vector machines.

URL: http://www.csie.ntu.edu.tw/~cjlin/libsvm

Crammer, K. & Singer, Y. (2000). On the learnability and design of output codes for multi-class problems. In Computational Learning Theory, 35-46.

Fan, R.E., Chang, K.W., Hsieh, C.J., Wang, X.R. & Lin, C.J. (2008). LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research, 9, 1871-1874.

URL: http://www.csie.ntu.edu.tw/~cjlin/papers/liblinear.pdf

Hsu, C.W., Chang, C.C. & Lin, C.J. (2003). A practical guide to support vector classification. (Technical report). Department of Computer Science and Information Engineering, National Taiwan University.

URL: http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf

Example

CAPTION 'SVMFIT for classification: Fisher Iris data'; STYLE=meta
SPLOAD  [PRINT=*] '%DATA%/Iris.gsh'
POINTER [VALUES=Sepal_Length,Sepal_Width,Petal_Length,Petal_Width] Var
" Default - radialbasis kernel with scaling."
SVMFIT  [PRINT=summary,allocations; SEED=726454] Y=Species; X=Var
" Unscaled with linear kernel."
SVMFIT  [PRINT=summary,allocations; KERNEL=linear; SCALING=none;\
        SEED=143038] Y=Species; X=Var

CAPTION 'SVMFIT for regression: Los Angeles Ozone data'; STYLE=meta
SPLOAD  [PRINT=*] '%DATA%/Ozone.gsh'; ISAVE=Data
SUBSET  [Ozone /= !s(*)] Data[]
POINTER [VALUES=Data[1,2,(5...10)]] OZVars
" Find optimal values for penalty and gamma."
SVMFIT  [PRINT=summary; SVMTYPE=svr; PENALTY=!(1,10,100,500,1000);\
        GAMMA=!(0.05,0.1,0.2,0.4); SEED=562011] Y=Ozone; X=OZVars;\
        PREDICTIONS=POzone
DGRAPH  [TITLE='Los Angeles Ozone levels 1976 ~{epsilon}-regression';\
        KEY=0;WIND=3] Y=POzone; X=Ozone

Updated on March 11, 2022
