
RBFIT directive

Fits a radial basis function model.

Options

PRINT = string tokens Controls fitted output (description, estimates, fittedvalues, summary); default desc, esti, summ
RBTYPE = string token Type of radial basis function (linear, cubic, thinplate, gaussian, multiquadric, inversemultiquadric, cauchy); default line
METRIC = string token How to calculate distances for the radial basis functions (euclidean, cityblock, manhattan, pythagorean); default eucl
SCALING = string token Type of scaling used to compute distances (sd, mahalanobis, supplied); default sd
ALPHA = scalar Specifies the value for the constant α, used to calculate radial distances for RBTYPE settings multiquadric, inversemultiquadric and cauchy; default 1
LAMBDA = scalar Specifies the value of the penalty constant λ
TOLERANCE = scalar Tolerance for setting eigenvalues equal to zero in the singular value decomposition; default 0.000001

Parameters

Y = variates Response variates
X = pointers Independent variates
CENTRES = pointers Centres of the radial basis functions for the dependent variates
RBSCALING = scalars or variates Scaling parameters for the radial distance calculations when SCALING=supplied; default 1
FITTEDVALUES = variates Fitted values generated for each y-variate by the model
ESTIMATES = variates Saves the estimated model parameters
EXIT = scalars Saves the exit code
SAVE = pointers Saves details of the model and the estimated parameters for RBDISPLAY or RBPREDICT

Description

RBFIT estimates the parameters of a radial basis function model. The response variate is supplied by the Y parameter, and the independent (or x-) variates are supplied in a pointer by the X parameter.

The model assumes that the y-value on each unit is related to the vector x of x-values $(x_1, \dots, x_p)$ on that unit, according to the model

y = f(x) + ε

for some unknown function f() and noise ε drawn at random from a Normal distribution with zero mean and unit variance. A radial basis function (RBF) model approximates the function f() by a linear combination of t basis functions, giving an approximate fitted value f for the dependent value

$f = \sum_{k=1}^{t} w_k h_k + w_{t+1} b$

where b is a scalar intercept term and $h_k$ is the value given by an RBF for a radial distance $z_k$ between x and a centre location $c_k$ defined for the kth RBF.
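
As a minimal sketch of this sum (plain Python for illustration, not Genstat's or NAG's implementation), the fitted value is the weighted sum of the basis-function values plus the weighted intercept. The function name `rbf_fitted_value`, the default `b=1.0` and the numbers are hypothetical.

```python
def rbf_fitted_value(h, w, b=1.0):
    """Return f = sum_k w[k]*h[k] + w[t]*b for t basis values h and t+1 weights w.

    b=1.0 is an assumption made for this sketch; the source only says that b
    is a scalar intercept term.
    """
    t = len(h)
    if len(w) != t + 1:
        raise ValueError("w needs one weight per basis function plus one for the intercept")
    return sum(wk * hk for wk, hk in zip(w, h)) + w[t] * b


# Illustrative values only: three basis-function values and four weights.
f = rbf_fitted_value(h=[0.5, 1.0, 2.0], w=[2.0, -1.0, 0.5, 3.0])
```

Here the three weighted basis values contribute 1.0, -1.0 and 1.0, and the last weight multiplies the intercept term.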

The centre locations are supplied in a pointer by the CENTRES parameter. This should have a variate for each x-variate, with a unit for each RBF.

The METRIC option defines how the radial distances are calculated. The default setting, euclidean, uses a scaled Euclidean distance

$z_k = \left[ (x - c_k)\, S^{-1} (x - c_k)' \right]^{1/2}$

where the form of the scaling matrix S is controlled by the SCALING option (see below). The cityblock setting calculates the distance as

$z_k = \sum_{j=1}^{p} |x_j - c_{kj}| / s_j$

where $s_j$ is the jth diagonal element of the scaling matrix S and $c_{kj}$ is the jth coordinate of the kth centre. METRIC also has settings pythagorean and manhattan, which act as synonyms of euclidean and cityblock.
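
The two metrics can be sketched in a few lines of Python. This is an illustration only, assuming a diagonal scaling matrix passed as a list `s` of its diagonal elements; the function names are hypothetical, and a full (non-diagonal) scaling matrix would need a matrix inverse that is not shown here.

```python
import math


def euclidean_distance(x, c, s):
    """Scaled Euclidean distance [(x - c) S^-1 (x - c)']^(1/2) for S = diag(s)."""
    return math.sqrt(sum((xj - cj) ** 2 / sj for xj, cj, sj in zip(x, c, s)))


def cityblock_distance(x, c, s):
    """Scaled city-block distance: sum over j of |x_j - c_j| / s_j."""
    return sum(abs(xj - cj) / sj for xj, cj, sj in zip(x, c, s))


# Illustrative point, centre and unit scaling.
x, c, s = [3.0, 4.0], [0.0, 0.0], [1.0, 1.0]
z_euclidean = euclidean_distance(x, c, s)
z_cityblock = cityblock_distance(x, c, s)
```

With unit scaling the Euclidean distance from (3, 4) to the origin is 5 and the city-block distance is 7.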

The available forms of the scaling matrix, and corresponding settings of the SCALING option are as follows:

    sd diagonal matrix containing the standard deviations of the x-variates (default),
    mahalanobis variance-covariance matrix of the data values of x-variables (to give the Mahalanobis distance),
    supplied user-defined scaling parameters, supplied by the RBSCALING parameter.

The mahalanobis setting is available only for the euclidean or pythagorean settings of the METRIC option. The setting of RBSCALING can be either a scalar or a variate, depending upon whether the scaling parameters are the same or different over the x-variates; the values must all be greater than zero.
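
For the sd setting, the diagonal of the scaling matrix holds one standard deviation per x-variate. The sketch below shows that calculation in Python; it assumes the sample standard deviation (divisor n - 1), which the text above does not confirm, and the name `sd_scaling` is hypothetical.

```python
import statistics


def sd_scaling(rows):
    """Return one standard deviation per x-variate (column) of the data rows.

    Uses the sample standard deviation (divisor n - 1); this choice is an
    assumption of the sketch, not something stated for RBFIT.
    """
    columns = zip(*rows)
    return [statistics.stdev(col) for col in columns]


# Three units measured on two x-variates (illustrative data).
s = sd_scaling([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
```

The resulting values play the role of the scaling parameters, which must all be greater than zero.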

The form φ() of the radial basis functions is specified by the RBTYPE option, by selecting one of the following settings:

    linear $\phi(z) = z$,
    cubic $\phi(z) = z^3$,
    thinplate $\phi(z) = z^2 \log_e(z)$,
    gaussian $\phi(z) = \exp(-z^2)$,
    multiquadric $\phi(z) = \sqrt{z^2 + \alpha^2}$,
    inversemultiquadric $\phi(z) = 1 / \sqrt{z^2 + \alpha^2}$,
    cauchy $\phi(z) = 1 / (z^2 + \alpha^2)$.

The value of the constant α (which must be positive) is specified by the ALPHA option, with a default of one.
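
The seven forms can be written as one small Python function. This is a sketch for illustration, not the code used by RBFIT; the function name `rbf` is hypothetical, and returning 0 for the thin-plate form at z = 0 (its limiting value) is an assumption.

```python
import math


def rbf(z, rbtype="linear", alpha=1.0):
    """Evaluate phi(z) for one of the basis-function forms listed above."""
    if rbtype == "linear":
        return z
    if rbtype == "cubic":
        return z ** 3
    if rbtype == "thinplate":
        # z^2 log(z) tends to 0 as z -> 0; using 0 at z = 0 is an assumption.
        return 0.0 if z == 0 else z ** 2 * math.log(z)
    if rbtype == "gaussian":
        return math.exp(-(z ** 2))
    if rbtype == "multiquadric":
        return math.sqrt(z ** 2 + alpha ** 2)
    if rbtype == "inversemultiquadric":
        return 1.0 / math.sqrt(z ** 2 + alpha ** 2)
    if rbtype == "cauchy":
        return 1.0 / (z ** 2 + alpha ** 2)
    raise ValueError(f"unknown rbtype: {rbtype}")


# Evaluate every form at the same radial distance (illustrative value).
forms = ["linear", "cubic", "thinplate", "gaussian",
         "multiquadric", "inversemultiquadric", "cauchy"]
values = {name: rbf(2.0, name, alpha=1.0) for name in forms}
```

Only the last three forms depend on `alpha`; the others ignore it.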

The RBF model is fitted by estimating values for the weights $w_k$. This is done by minimizing the penalized (regularized) sum of squares error function:

$(y - f)'(y - f) + \lambda \sum_{k=1}^{t+1} w_k^2$

where the penalty constant λ must be specified by the LAMBDA option.
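
To make the criterion concrete, the sketch below minimizes it for a small design matrix H (one row per unit, one column per basis function, plus a final intercept column) by solving the equations (H'H + λI) w = H'y. Note the swap: this uses Gaussian elimination on those equations rather than the singular value decomposition that RBFIT uses, and it assumes λ > 0 so that the system is non-singular. The name `ridge_weights` and the data are hypothetical.

```python
def ridge_weights(H, y, lam):
    """Solve (H'H + lam*I) w = H'y by Gaussian elimination with partial pivoting."""
    n, m = len(H), len(H[0])
    # Augmented matrix [H'H + lam*I | H'y].
    A = [[sum(H[i][r] * H[i][c] for i in range(n)) + (lam if r == c else 0.0)
          for c in range(m)]
         + [sum(H[i][r] * y[i] for i in range(n))]
         for r in range(m)]
    # Forward elimination.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, m):
            factor = A[r][col] / A[col][col]
            for c in range(col, m + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution.
    w = [0.0] * m
    for r in range(m - 1, -1, -1):
        w[r] = (A[r][m] - sum(A[r][c] * w[c] for c in range(r + 1, m))) / A[r][r]
    return w


# Illustrative 2 x 2 case: H is the identity, so w = y / (1 + lam).
H = [[1.0, 0.0], [0.0, 1.0]]
w = ridge_weights(H, y=[2.0, 4.0], lam=1.0)
```

In this toy case the penalty halves each weight relative to the unpenalized solution, which shows how larger values of λ shrink the weights towards zero.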

The inverse-matrix calculations required during the fit are formed using a singular value decomposition. In the calculations, singular values that are less than the largest singular value multiplied by a tolerance are treated as zero. This tolerance is specified by the TOLERANCE option; default 0.000001.
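
The thresholding rule can be illustrated with a short Python sketch. It only shows how reciprocals of singular values would be formed under this rule; how nagdmc_rbf applies the rule internally is not described here, and the name `truncated_reciprocals` is hypothetical.

```python
def truncated_reciprocals(singular_values, tolerance=0.000001):
    """Invert singular values, treating those below tolerance * largest as zero."""
    if not singular_values:
        return []
    cutoff = tolerance * max(singular_values)
    # Exact zeros are also mapped to zero to avoid dividing by zero.
    return [0.0 if (s < cutoff or s == 0.0) else 1.0 / s for s in singular_values]


# Illustrative values: the third singular value falls below 4.0 * 0.000001.
inverted = truncated_reciprocals([4.0, 2.0, 1e-9])
```

Dropping the near-zero singular value keeps its very large reciprocal from dominating the fitted weights.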

Printed output is controlled by the PRINT option, with settings:

    description a description of the model,
    estimates estimates of the parameters,
    fittedvalues fitted values,
    summary summary (lack of fit etc.).

The SAVE parameter can save full details of the RBF model; these can then be used by the RBDISPLAY directive to give further output, or by the RBPREDICT directive to form predictions. The estimated weights can be saved using the ESTIMATES parameter, and the fitted values can be saved using the FITTEDVALUES parameter.

Options: PRINT, RBTYPE, METRIC, SCALING, ALPHA, LAMBDA, TOLERANCE.

Parameters: Y, X, CENTRES, RBSCALING, FITTEDVALUES, ESTIMATES, EXIT, SAVE.

Method

RBFIT uses the function nagdmc_rbf from the Numerical Algorithms Group’s library of Data Mining Components (DMCs).

Action with RESTRICT

You can restrict the set of units used for the estimation by applying a restriction to the y-variate or any of the x-variates. If several of these are restricted, they must all be restricted to the same set of units.

See also

Directives: RBDISPLAY, RBPREDICT, ASRULES, NNFIT.

Procedures: KNEARESTNEIGHBOURS, RADIALSPLINE.

Commands for: Data mining.

Example

CAPTION 'RBFIT example',\
   'Predicting the grape cultivar from 13 wine attributes'; STYLE=meta,plain
SPLOAD   '%Data%/WinesTrain.gsh'; ISAVE=pData
CALC     NData,NUnits = NVALUES(pData,pData[1])
POINTER  [VALUES=pData[2...NData]] Attributes
GROUPS   Wine; FACTOR=Cultivar
TXCONST  [TEXT=AttrName] Attributes
POINTER  [NVALUES=AttrName] Mn
TABULATE [CLASS=Cultivar] Attributes[]; MEANS=Mn[]

VARIATE [VALUES=0.5,1,2,5,8,10,12,15,20,40] Lambda
CALC    ErrorRate = !s(*)*Lambda

FOR [INDEX=i] L = #Lambda
   RBFIT [PRINT=*; RBTYPE=linear; LAMBDA=L] Y=Wine; X=Attributes; \
         CENTRES=Mn; FITTED=Fit
   "Predicted class is the closest integer 1...3"
   CALC  Prediction = 1 + (Fit > 1.5) + (Fit > 2.5)
   CALC  ErrorRate$[i] = 100*SUM(Cultivar /= Prediction)/NUnits
ENDFOR

PRINT Lambda,ErrorRate; DEC=3
CALC  iMin = MINPOSITION(ErrorRate)
CALC  BestLambda = Lambda$[iMin]
PRINT BestLambda; DEC=3

RBFIT     [PRINT=*; RBTYPE=linear; LAMBDA=BestLambda] Y=Wine; X=Attributes; \
          CENTRES=Mn; FITTED=Fit; SAVE=RBSave
RBDISPLAY [PRINT=description, estimates, fittedvalues, summary] RBSave

"Show misclassified counts"
CALC     Prediction = 1 + (Fit > 1.5) + (Fit > 2.5)
GROUPS   Prediction; FACTOR=PCultivar
TABULATE [CLASS=Cultivar,PCultivar; PRINT=counts]

"Try other models"
FOR [INDEX=i] Model = 'cubic','thinplate','gaussian','multiquadric',\
    'inversemultiquadric','cauchy'; lambda = 1.5,1,0.1,1,0.1,0.01
   RBFIT [PRINT=*; RBTYPE=#Model; LAMBDA=lambda] Y=Wine; X=Attributes; \
         CENTRES=Mn; FITTED=Fit
   CALC  Prediction = 1 + (Fit > 1.5) + (Fit > 2.5)
   CALC  ErrRate = 100*SUM(Cultivar /= Prediction)/NUnits
   IF i == 1
      SKIP  [FILE=output] 1
      PRINT [SQUASH=y;IPRINT=*] 'Model','Lambda','Error rate'; \
            JUST=left; FIELD=20,8,8
   ENDIF
   PRINT [SQUASH=y;IPRINT=*] Model,lambda,ErrRate; DEC=3; FIELD=20,8,8
ENDFOR

"Predictions from best linear model"
SPLOAD    '%Data%/WinesPred.gsh'; ISAVE=TestAttr
RBPREDICT X=TestAttr; PREDICTIONS=TFit; SAVE=RBSave
CALC      TPrediction = 1 + (TFit > 1.5) + (TFit > 2.5)
GROUPS    TPrediction; TCultivar
TABULATE  [CLASS=TCultivar; PRINT=counts]
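
The class-assignment step used throughout the example, `1 + (Fit > 1.5) + (Fit > 2.5)`, rounds each fitted value to the nearest class label 1, 2 or 3. A Python sketch of that step and of the percentage error rate is given below for readers who want to check the logic outside Genstat; the function names and data are hypothetical.

```python
def predict_class(fit):
    """Map a fitted value to class 1, 2 or 3 using cut-points 1.5 and 2.5."""
    return 1 + (fit > 1.5) + (fit > 2.5)


def error_rate(actual, fitted):
    """Percentage of units whose predicted class differs from the actual class."""
    wrong = sum(a != predict_class(f) for a, f in zip(actual, fitted))
    return 100.0 * wrong / len(actual)


# Illustrative data: the third unit (actual class 3, fitted 2.4) is misclassified.
rate = error_rate(actual=[1, 2, 3, 3], fitted=[0.9, 2.2, 2.4, 3.1])
```
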
Updated on March 11, 2022
