Fits a radial basis function model.
Options
PRINT = string tokens | Controls fitted output (description, estimates, fittedvalues, summary); default desc, esti, summ |
---|---|
RBTYPE = string token | Type of radial basis function (linear, cubic, thinplate, gaussian, multiquadric, inversemultiquadric, cauchy); default line |
METRIC = string token | How to calculate distances for the radial basis functions (euclidean, cityblock, manhattan, pythagorean); default eucl |
SCALING = string token | Type of scaling used to compute distances (sd, mahalanobis, supplied); default sd |
ALPHA = scalar | Specifies the value for the constant α, used to calculate radial distances for RBTYPE settings multiquadric, inversemultiquadric and cauchy; default 1 |
LAMBDA = scalar | Specifies the value of the penalty constant λ |
TOLERANCE = scalar | Tolerance for setting singular values equal to zero in the singular value decomposition; default 0.000001 |
Parameters
Y = variates | Response variates |
---|---|
X = pointers | Independent variates |
CENTRES = pointers | Centres of the radial basis functions for the independent variates |
RBSCALING = scalars or variates | Scaling parameters for the radial distance calculations when SCALING=supplied; default 1 |
FITTEDVALUES = variates | Fitted values generated for each y-variate by the model |
ESTIMATES = variates | Saves the estimated model parameters |
EXIT = scalars | Saves the exit code |
SAVE = pointers | Saves details of the model and the estimated parameters for RBDISPLAY or RBPREDICT |
Description
RBFIT
estimates the parameters of a radial basis function model. The response variate is supplied by the Y
parameter, and the independent (or x-) variates are supplied in a pointer by the X
parameter.
The model assumes that the y-value on each unit is related to the vector $x$ of x-values $(x_1 \ldots x_p)$ on that unit, according to the model
$$y = f(x) + \varepsilon$$
for some unknown function $f()$ and noise $\varepsilon$ drawn at random from a Normal distribution with zero mean and unit variance. A radial basis function (RBF) model approximates the function $f()$ by a linear combination of $t$ basis functions, giving an approximate fitted value $f$ for the dependent value
$$f = \sum_{k=1}^{t} w_k h_k + w_{t+1} b$$
where $b$ is a scalar intercept term and $h_k$ is the value given by an RBF for a radial distance $z_k$ between $x$ and a centre location $c_k$ defined for the $k$th RBF.
The centre locations are supplied in a pointer by the CENTRES
parameter. This should have a variate for each x-variate, with a unit for each RBF.
The METRIC option defines how the radial distances are calculated. The default setting, euclidean, uses a scaled Euclidean distance
$$z_k = \left[ (x - c_k) S^{-1} (x - c_k)' \right]^{1/2}$$
where the form of the scaling matrix $S$ is controlled by the SCALING option (see below). The cityblock setting calculates the distance as
$$z_k = \sum_{j=1}^{p} |x_j - c_{kj}| / s_j$$
where $s_j$ is the $j$th diagonal element of the scaling matrix $S$. METRIC also has settings pythagorean and manhattan, which act as synonyms of euclidean and cityblock, respectively.
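The two distance formulas can be illustrated with a short Python sketch. It is only an illustration of the arithmetic, not the code used by RBFIT. It assumes a diagonal scaling matrix (as with SCALING=sd or SCALING=supplied), so that the Euclidean form reduces to a sum over the x-variates; the function names and the `scale` argument holding the diagonal elements are hypothetical.

```python
import math


def euclidean_distance(x, centre, scale):
    """Scaled Euclidean distance, assuming a diagonal scaling matrix S.

    With S diagonal, (x - c) S^-1 (x - c)' reduces to the sum of
    (x_j - c_j)^2 / s_j, where s_j is the jth diagonal element of S.
    """
    total = sum((xj - cj) ** 2 / sj for xj, cj, sj in zip(x, centre, scale))
    return math.sqrt(total)


def cityblock_distance(x, centre, scale):
    """City-block distance: sum over the x-variates of |x_j - c_j| / s_j."""
    return sum(abs(xj - cj) / sj for xj, cj, sj in zip(x, centre, scale))
```

For example, `euclidean_distance([3.0, 4.0], [0.0, 0.0], [1.0, 1.0])` evaluates the unscaled distance from the origin. The mahalanobis setting needs a full matrix inverse and is not covered by this sketch.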
The available forms of the scaling matrix, and corresponding settings of the SCALING
option are as follows:
sd | diagonal matrix containing the standard deviations of the x-variates (default), |
---|---|
mahalanobis | variance-covariance matrix of the data values of x-variables (to give the Mahalanobis distance), |
supplied | user-defined scaling parameters, supplied by the RBSCALING parameter. |
The mahalanobis setting is available only for the euclidean or pythagorean settings of the METRIC option. The setting of RBSCALING can be either a scalar or a variate, depending on whether the scaling parameters are the same or different over the x-variates; the values must all be greater than zero.
The form φ() of the radial basis functions is specified by the RBTYPE option, by selecting one of the following settings:
linear | $\phi(z) = z$, |
---|---|
cubic | $\phi(z) = z^3$, |
thinplate | $\phi(z) = z^2 \log_e(z)$, |
gaussian | $\phi(z) = \exp(-z^2)$, |
multiquadric | $\phi(z) = \sqrt{z^2 + \alpha^2}$, |
inversemultiquadric | $\phi(z) = 1 / \sqrt{z^2 + \alpha^2}$, |
cauchy | $\phi(z) = 1 / (z^2 + \alpha^2)$. |
The value of the constant α (which must be positive) is specified by the ALPHA
option, with a default of one.
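As a rough illustration, the seven forms listed above can be written as a single Python function. This is a sketch of the formulas only, not of the RBFIT implementation; the function name `rbf_value` is hypothetical, and returning 0 for the thin-plate form at z = 0 (its limiting value) is an assumption made here to avoid taking the logarithm of zero.

```python
import math


def rbf_value(rbtype, z, alpha=1.0):
    """Evaluate the radial basis function phi(z) for a radial distance z >= 0."""
    if rbtype == "linear":
        return z
    if rbtype == "cubic":
        return z ** 3
    if rbtype == "thinplate":
        # Assumption: use the limit z^2 log(z) -> 0 as z -> 0.
        return z ** 2 * math.log(z) if z > 0 else 0.0
    if rbtype == "gaussian":
        return math.exp(-z ** 2)
    if rbtype == "multiquadric":
        return math.sqrt(z ** 2 + alpha ** 2)
    if rbtype == "inversemultiquadric":
        return 1.0 / math.sqrt(z ** 2 + alpha ** 2)
    if rbtype == "cauchy":
        return 1.0 / (z ** 2 + alpha ** 2)
    raise ValueError("unknown RBF type: " + rbtype)
```

Only the last three forms depend on `alpha`, which matches the settings for which the ALPHA option is relevant.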
The RBF model is fitted by estimating values for the weights $w_k$. This is done by minimizing the penalized (regularized) sum of squares error function:
$$(y - f)' (y - f) + \lambda \sum_{k=1}^{t+1} w_k^2$$
where the penalty constant $\lambda$ must be specified by the LAMBDA option.
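To make the penalized criterion concrete, the following Python sketch computes the weights that minimize it for a small problem. It assumes a design matrix `h` with one row per unit, whose first t columns hold the basis function values and whose last column holds the intercept term b. It solves the ridge normal equations (H'H + λI) w = H'y by Gaussian elimination, which is a simpler technique than the singular value decomposition used by RBFIT; all names are hypothetical.

```python
def solve_linear(a, b):
    """Solve the square system a w = b by Gaussian elimination with pivoting."""
    n = len(b)
    # Build the augmented matrix [a | b] so row operations apply to both.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: move the largest entry in this column to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (m[r][n] - known) / m[r][r]
    return w


def penalized_weights(h, y, lam):
    """Weights minimizing (y - Hw)'(y - Hw) + lam * sum(w_k^2), for lam > 0."""
    n_cols = len(h[0])
    # Left-hand side H'H + lam * I.
    lhs = [[sum(row[i] * row[j] for row in h) + (lam if i == j else 0.0)
            for j in range(n_cols)] for i in range(n_cols)]
    # Right-hand side H'y.
    rhs = [sum(row[i] * yi for row, yi in zip(h, y)) for i in range(n_cols)]
    return solve_linear(lhs, rhs)
```

Larger values of `lam` shrink the weights towards zero: with a single intercept column and responses 1 and 3, the unpenalized weight would be the mean, 2, whereas `lam = 2` halves it.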
The inverse-matrix calculations required during the fit are performed using a singular value decomposition. In the calculations, singular values that are less than the largest singular value multiplied by a tolerance are treated as zero. This tolerance is specified by the TOLERANCE option; default 0.000001.
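The tolerance rule can be sketched in Python as follows. The sketch takes the singular values as already computed and returns the reciprocals that would enter an inverse. Setting the reciprocal of a discarded singular value to zero follows the usual pseudo-inverse convention, which is an assumption here rather than a documented detail of RBFIT; the function name is hypothetical.

```python
def truncated_reciprocals(singular_values, tolerance=0.000001):
    """Reciprocals of the singular values, with small ones treated as zero.

    A singular value counts as zero when it is less than the largest
    singular value multiplied by the tolerance.
    """
    threshold = max(singular_values) * tolerance
    return [0.0 if (s < threshold or s == 0.0) else 1.0 / s
            for s in singular_values]
```

With singular values 10, 0.001 and 0.000000001 and the default tolerance, the threshold is 0.00001, so only the last value is discarded.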
Printed output is controlled by the PRINT
option, with settings:
description | a description of the model, |
---|---|
estimates | estimates of the parameters, |
fittedvalues | fitted values, |
summary | summary (lack of fit etc.). |
The SAVE parameter can save full details of the RBF model; these can then be used by the RBDISPLAY directive to give further output, or by the RBPREDICT directive to form predictions. The estimated weights can be saved using the ESTIMATES parameter, and the fitted values can be saved by the FITTEDVALUES parameter.
Options: PRINT, RBTYPE, METRIC, SCALING, ALPHA, LAMBDA, TOLERANCE.
Parameters: Y, X, CENTRES, RBSCALING, FITTEDVALUES, ESTIMATES, EXIT, SAVE.
Method
RBFIT
uses the function nagdmc_rbf
from the Numerical Algorithms Group’s library of Data Mining Components (DMCs).
Action with RESTRICT
You can restrict the set of units used for the estimation by applying a restriction to the y-variate or any of the x-variates. If several of these are restricted, they must all be restricted to the same set of units.
See also
Directives: RBDISPLAY, RBPREDICT, ASRULES, NNFIT.
Procedures: KNEARESTNEIGHBOURS, RADIALSPLINE.
Commands for: Data mining.
Example
CAPTION 'RBFIT example',\
        'Predicting the grape cultivar from 13 wine attributes'; STYLE=meta,plain
SPLOAD '%Data%/WinesTrain.gsh'; ISAVE=pData
CALC NData,NUnits = NVALUES(pData,pData[1])
POINTER [VALUES=pData[2...NData]] Attributes
GROUPS Wine; FACTOR=Cultivar
TXCONST [TEXT=AttrName] Attributes
POINTER [NVALUES=AttrName] Mn
TABULATE [CLASS=Cultivar] Attributes[]; MEANS=Mn[]
VARIATE [VALUES=0.5,1,2,5,8,10,12,15,20,40] Lambda
CALC ErrorRate = !s(*)*Lambda
FOR [INDEX=i] L = #Lambda
  RBFIT [PRINT=*; RBTYPE=linear; LAMBDA=L] Y=Wine; X=Attributes; \
        CENTRES=Mn; FITTED=Fit
  "Predicted class is the closest integer 1...3"
  CALC Prediction = 1 + (Fit > 1.5) + (Fit > 2.5)
  CALC ErrorRate$[i] = 100*SUM(Cultivar /= Prediction)/NUnits
ENDFOR
PRINT Lambda,ErrorRate; DEC=3
CALC iMin = MINPOSITION(ErrorRate)
CALC BestLambda = Lambda$[iMin]
PRINT BestLambda; DEC=3
RBFIT [PRINT=*; RBTYPE=linear; LAMBDA=BestLambda] Y=Wine; X=Attributes; \
      CENTRES=Mn; FITTED=Fit; SAVE=RBSave
RBDISPLAY [PRINT=description, estimates, fittedvalues, summary] RBSave
"Show misclassified counts"
CALC Prediction = 1 + (Fit > 1.5) + (Fit > 2.5)
GROUPS Prediction; FACTOR=PCultivar
TABULATE [CLASS=Cultivar,PCultivar; PRINT=counts]
"Try other models"
FOR [INDEX=i] Model = 'cubic','thinplate','gaussian','multiquadric',\
    'inversemultiquadric','cauchy'; lambda = 1.5,1,0.1,1,0.1,0.01
  RBFIT [PRINT=*; RBTYPE=#Model; LAMBDA=lambda] Y=Wine; X=Attributes; \
        CENTRES=Mn; FITTED=Fit
  CALC Prediction = 1 + (Fit > 1.5) + (Fit > 2.5)
  CALC ErrRate = 100*SUM(Cultivar /= Prediction)/NUnits
  IF i == 1
    SKIP [FILE=output] 1
    PRINT [SQUASH=y;IPRINT=*] 'Model','Lambda','Error rate'; \
          JUST=left; FIELD=20,8,8
  ENDIF
  PRINT [SQUASH=y;IPRINT=*] Model,lambda,ErrRate; DEC=3; FIELD=20,8,8
ENDFOR
"Predictions from best linear model"
SPLOAD '%Data%/WinesPred.gsh'; ISAVE=TestAttr
RBPREDICT X=TestAttr; PREDICTIONS=TFit; SAVE=RBSave
CALC TPrediction = 1 + (TFit > 1.5) + (TFit > 2.5)
GROUPS TPrediction; TCultivar
TABULATE [CLASS=TCultivar; PRINT=counts]
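The example converts each fitted value to a predicted class by taking the closest integer in the range 1 to 3, and reports the percentage of misclassified units. The same arithmetic can be sketched in Python for readers who want to check it by hand; the function names and the sample values in the usage note are hypothetical and are not taken from the wine data.

```python
def predict_class(fit):
    """Map a fitted value to class 1, 2 or 3 using cut-points 1.5 and 2.5."""
    # Each comparison contributes 1 when true, as in the CALC statement.
    return 1 + (fit > 1.5) + (fit > 2.5)


def error_rate(actual, fitted):
    """Percentage of units whose predicted class differs from the actual class."""
    wrong = sum(a != predict_class(f) for a, f in zip(actual, fitted))
    return 100 * wrong / len(actual)
```

Calling `error_rate([1, 2, 3, 3], [0.9, 2.2, 2.4, 2.9])` misclassifies only the third unit, whose fitted value 2.4 falls below the 2.5 cut-point.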