RBFIT directive

Fits a radial basis function model.

Options

`PRINT` = string tokens	Controls fitted output (`description`, `estimates`, `fittedvalues`, `summary`); default `desc`, `esti`, `summ`
`RBTYPE` = string token	Type of radial basis function (`linear`, `cubic`, `thinplate`, `gaussian`, `multiquadric`, `inversemultiquadric`, `cauchy`); default `line`
`METRIC` = string token	How to calculate distances for the radial basis functions (`euclidean`, `cityblock`, `manhattan`, `pythagorean`); default `eucl`
`SCALING` = string token	Type of scaling used to compute distances (`sd`, `mahalanobis`, `supplied`); default `sd`
`ALPHA` = scalar	Specifies the value for the constant α, used to calculate radial distances for `RBTYPE` settings `multiquadric`, `inversemultiquadric` and `cauchy`; default 1
`LAMBDA` = scalar	Specifies the value of the penalty constant λ
`TOLERANCE` = scalar	Tolerance for setting eigenvalues equal to zero in the singular value decomposition; default 0.000001

Parameters

`Y` = variates	Response variates
`X` = pointers	Independent variates
`CENTRES` = pointers	Centres of the radial basis functions for the independent variates
`RBSCALING` = scalars or variates	Scaling parameters for the radial distance calculations when `SCALING=supplied`; default 1
`FITTEDVALUES` = variates	Fitted values generated for each y-variate by the model
`ESTIMATES` = variates	Saves the estimated model parameters
`EXIT` = scalars	Saves the exit code
`SAVE` = pointers	Saves details of the model and the estimated parameters for `RBDISPLAY` or `RBPREDICT`

Description

RBFIT estimates the parameters of a radial basis function model. The response variate is supplied by the Y parameter, and the independent (or x-) variates are supplied in a pointer by the X parameter.

The model assumes that the y-value on each unit is related to the vector x of x-values (x₁ … x_p) on that unit, according to the model

y = f(x) + ε

for some unknown function f() and noise ε drawn at random from a Normal distribution with zero mean and unit variance. A radial basis function (RBF) model approximates the function f() by a linear combination of t basis functions, giving an approximate fitted value f for the dependent value

f = ∑_k=1…t w_k h_k + w_t+1 b

where b is a scalar intercept term and h_k is the value given by an RBF for a radial distance z_k between x and a centre location c_k defined for the kth RBF.

The centre locations are supplied in a pointer by the CENTRES parameter. This should have a variate for each x-variate, with a unit for each RBF.

The METRIC option defines how the radial distances are calculated. The default setting, euclidean, uses a scaled Euclidean distance

z_k = [(x – c_k) S^-1 (x – c_k)′]^1/2

where the form of the scaling matrix S is controlled by the SCALING option (see below). The cityblock setting calculates the distance as

z_k = ∑_j=1…p |x_j – c_kj| / s_j

where s_j is the jth diagonal element of the scaling matrix S. METRIC also has settings pythagorean and manhattan which act as synonyms of euclidean and cityblock.
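The two distance formulas can be illustrated with a minimal Python sketch. It is not Genstat code and not the library's implementation: the function names and the `s_diag` argument (the diagonal elements of S) are hypothetical, S is assumed to be diagonal, and the Euclidean formula is applied exactly as written above.

```python
import math


def euclidean_distance(x, centre, s_diag):
    """Scaled Euclidean distance [(x - c) S^-1 (x - c)']^(1/2).

    Assumes S is diagonal, with diagonal elements given in s_diag.
    """
    total = sum((xj - cj) ** 2 / sj for xj, cj, sj in zip(x, centre, s_diag))
    return math.sqrt(total)


def cityblock_distance(x, centre, s_diag):
    """Scaled city-block distance: sum over j of |x_j - c_kj| / s_j."""
    return sum(abs(xj - cj) / sj for xj, cj, sj in zip(x, centre, s_diag))


# One unit with p = 2 x-values, one centre, and unit scaling.
z_euclidean = euclidean_distance([3.0, 4.0], [0.0, 0.0], [1.0, 1.0])
z_cityblock = cityblock_distance([3.0, 4.0], [0.0, 0.0], [1.0, 2.0])
```

Both functions return one distance z_k for a single unit and a single centre; a full fit would evaluate them for every unit and every centre.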

The available forms of the scaling matrix, and corresponding settings of the SCALING option are as follows:

`sd`	diagonal matrix containing the standard deviations of the x-variates (default),
`mahalanobis`	variance-covariance matrix of the data values of x-variables (to give the Mahalanobis distance),
`supplied`	user-defined scaling parameters, supplied by the `RBSCALING` parameter.

The mahalanobis setting is available only for the euclidean or pythagorean settings of the METRIC option. The setting of RBSCALING can be either a scalar or a variate, depending upon whether the parameters are the same or different over the x-variates; the values must all be greater than zero.

The form φ() of the radial basis functions, which gives the value h_k = φ(z_k) for a radial distance z_k, is specified by the RBTYPE option, by selecting one of the following settings:

`linear`	φ(z) = z,
`cubic`	φ(z) = z³,
`thinplate`	φ(z) = z² log_e(z),
`gaussian`	φ(z) = exp(-z²),
`multiquadric`	φ(z) = √{z²+ α²},
`inversemultiquadric`	φ(z) = 1 / √{z²+ α²},
`cauchy`	φ(z) = 1 / (z²+ α²).

The value of the constant α (which must be positive) is specified by the ALPHA option, with a default of one.
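The forms listed above can be written out as a short Python sketch, purely to make the formulas concrete. The function name `rbf_value` and its arguments are hypothetical, and the value returned for `thinplate` at z = 0 (the limit of z² log_e(z), which is zero) is an assumption rather than documented behaviour.

```python
import math


def rbf_value(z, rbtype="linear", alpha=1.0):
    """Evaluate phi(z) for one of the RBTYPE settings described above."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if rbtype == "linear":
        return z
    if rbtype == "cubic":
        return z ** 3
    if rbtype == "thinplate":
        # Assumption: use the limiting value 0 at z = 0, where log is undefined.
        return z * z * math.log(z) if z > 0 else 0.0
    if rbtype == "gaussian":
        return math.exp(-z * z)
    if rbtype == "multiquadric":
        return math.sqrt(z * z + alpha * alpha)
    if rbtype == "inversemultiquadric":
        return 1.0 / math.sqrt(z * z + alpha * alpha)
    if rbtype == "cauchy":
        return 1.0 / (z * z + alpha * alpha)
    raise ValueError("unknown rbtype: " + rbtype)
```

Note that α enters only the last three forms, which matches the description of the ALPHA option.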

The RBF model is fitted by estimating values for the weights w_k. This is done by minimizing the penalized (regularized) sum of squares error function:

(y – f)′ (y – f) + λ ∑_k=1…t+1 w_k²

where the penalty constant λ must be specified by the LAMBDA option.
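As an illustration of this criterion, the sketch below finds the weights that minimize it for a small design matrix H, whose rows are units and whose columns hold the values h_k followed by the intercept column. It is a hedged example, not the method used by RBFIT: it solves the normal equations (H′H + λI) w = H′y by Gaussian elimination instead of a singular value decomposition, and the function names are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left unchanged.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (m[r][n] - tail) / m[r][r]
    return w


def fit_penalized_weights(h, y, lam):
    """Minimize (y - Hw)'(y - Hw) + lam * sum(w_k^2) over all weights."""
    n_cols = len(h[0])
    # H'H with lam added to the diagonal (every weight is penalized).
    gram = [
        [
            sum(row[i] * row[j] for row in h) + (lam if i == j else 0.0)
            for j in range(n_cols)
        ]
        for i in range(n_cols)
    ]
    rhs = [sum(row[i] * yi for row, yi in zip(h, y)) for i in range(n_cols)]
    return solve_linear_system(gram, rhs)


# Two units and a single column of ones: the weight shrinks as lam grows.
w_unpenalized = fit_penalized_weights([[1.0], [1.0]], [2.0, 4.0], 0.0)
w_penalized = fit_penalized_weights([[1.0], [1.0]], [2.0, 4.0], 1.0)
```

In this toy case the unpenalized weight is the mean of y, and λ = 1 shrinks it towards zero, which is the effect the penalty term is intended to have.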

The inverse-matrix calculations required during the fit are formed using a singular value decomposition. In the calculations, singular values that are less than the largest singular value multiplied by a tolerance are treated as zero. This tolerance is specified by the TOLERANCE option; default 0.000001.
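The tolerance rule can be sketched in a few lines of Python. This only illustrates the thresholding step described above, applied to a list of singular values that is assumed to be already available; the function name is hypothetical.

```python
def truncate_singular_values(singular_values, tolerance=0.000001):
    """Set to zero any singular value below tolerance times the largest one."""
    cutoff = max(singular_values) * tolerance
    return [s if s >= cutoff else 0.0 for s in singular_values]


# With the default tolerance the cutoff here is 10.0 * 0.000001 = 0.00001.
kept = truncate_singular_values([10.0, 1.0, 0.000001])
```

A larger TOLERANCE therefore discards more of the small singular values, and so more of the near-collinear directions among the basis functions.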

Printed output is controlled by the PRINT option, with settings:

`description`	a description of the model,
`estimates`	estimates of the parameters,
`fittedvalues`	fitted values,
`summary`	summary (lack of fit etc.).

The SAVE parameter can save full details of the RBF model; these can then be used by the RBDISPLAY directive to give further output, or by the RBPREDICT directive to form predictions. The estimated weights can be saved using the ESTIMATES parameter, and the fitted values can be saved using the FITTEDVALUES parameter.

Options: PRINT, RBTYPE, METRIC, SCALING, ALPHA, LAMBDA, TOLERANCE.

Parameters: Y, X, CENTRES, RBSCALING, FITTEDVALUES, ESTIMATES, EXIT, SAVE.

Method

RBFIT uses the function nagdmc_rbf from the Numerical Algorithms Group’s library of Data Mining Components (DMCs).

Action with `RESTRICT`

You can restrict the set of units used for the estimation by applying a restriction to the y-variate or any of the x-variates. If several of these are restricted, they must all be restricted to the same set of units.

Example

CAPTION 'RBFIT example',\
   'Predicting the grape cultivar from 13 wine attributes'; STYLE=meta,plain
SPLOAD   '%Data%/WinesTrain.gsh'; ISAVE=pData
CALC     NData,NUnits = NVALUES(pData,pData[1])
POINTER  [VALUES=pData[2...NData]] Attributes
GROUPS   Wine; FACTOR=Cultivar
TXCONST  [TEXT=AttrName] Attributes
POINTER  [NVALUES=AttrName] Mn
TABULATE [CLASS=Cultivar] Attributes[]; MEANS=Mn[]

VARIATE [VALUES=0.5,1,2,5,8,10,12,15,20,40] Lambda
CALC    ErrorRate = !s(*)*Lambda

FOR [INDEX=i] L = #Lambda
   RBFIT [PRINT=*; RBTYPE=linear; LAMBDA=L] Y=Wine; X=Attributes; \
         CENTRES=Mn; FITTED=Fit
   "Predicted class is the closest integer 1...3"
   CALC  Prediction = 1 + (Fit > 1.5) + (Fit > 2.5)
   CALC  ErrorRate$[i] = 100*SUM(Cultivar /= Prediction)/NUnits
ENDFOR

PRINT Lambda,ErrorRate; DEC=3
CALC  iMin = MINPOSITION(ErrorRate)
CALC  BestLambda = Lambda$[iMin]
PRINT BestLambda; DEC=3

RBFIT     [PRINT=*; RBTYPE=linear; LAMBDA=BestLambda] Y=Wine; X=Attributes; \
          CENTRES=Mn; FITTED=Fit; SAVE=RBSave
RBDISPLAY [PRINT=description, estimates, fittedvalues, summary] RBSave

"Show misclassified counts"
CALC     Prediction = 1 + (Fit > 1.5) + (Fit > 2.5)
GROUPS   Prediction; FACTOR=PCultivar
TABULATE [CLASS=Cultivar,PCultivar; PRINT=counts]

"Try other models"
FOR [INDEX=i] Model = 'cubic','thinplate','gaussian','multiquadric',\
    'inversemultiquadric','cauchy'; lambda = 1.5,1,0.1,1,0.1,0.01
   RBFIT [PRINT=*; RBTYPE=#Model; LAMBDA=lambda] Y=Wine; X=Attributes; \
         CENTRES=Mn; FITTED=Fit
   CALC  Prediction = 1 + (Fit > 1.5) + (Fit > 2.5)
   CALC  ErrRate = 100*SUM(Cultivar /= Prediction)/NUnits
   IF i == 1
      SKIP  [FILE=output] 1
      PRINT [SQUASH=y;IPRINT=*] 'Model','Lambda','Error rate'; \
            JUST=left; FIELD=20,8,8
   ENDIF
   PRINT [SQUASH=y;IPRINT=*] Model,lambda,ErrRate; DEC=3; FIELD=20,8,8
ENDFOR

"Predictions from best linear model"
SPLOAD    '%Data%/WinesPred.gsh'; ISAVE=TestAttr
RBPREDICT X=TestAttr; PREDICTIONS=TFit; SAVE=RBSave
CALC      TPrediction = 1 + (TFit > 1.5) + (TFit > 2.5)
GROUPS    TPrediction; TCultivar
TABULATE  [CLASS=TCultivar; PRINT=counts]

Updated on March 11, 2022
