Fits a radial basis function model.

### Options

| Option | Description |
|---|---|
| `PRINT` = string tokens | Controls fitted output (`description`, `estimates`, `fittedvalues`, `summary`); default `desc`, `esti`, `summ` |
| `RBTYPE` = string token | Type of radial basis function (`linear`, `cubic`, `thinplate`, `gaussian`, `multiquadric`, `inversemultiquadric`, `cauchy`); default `line` |
| `METRIC` = string token | How to calculate distances for the radial basis functions (`euclidean`, `cityblock`, `manhattan`, `pythagorean`); default `eucl` |
| `SCALING` = string token | Type of scaling used to compute distances (`sd`, `mahalanobis`, `supplied`); default `sd` |
| `ALPHA` = scalar | Specifies the value for the constant α, used to calculate radial distances for `RBTYPE` settings `multiquadric`, `inversemultiquadric` and `cauchy`; default 1 |
| `LAMBDA` = scalar | Specifies the value of the penalty constant λ |
| `TOLERANCE` = scalar | Tolerance for setting eigenvalues equal to zero in the singular value decomposition; default 0.000001 |

### Parameters

| Parameter | Description |
|---|---|
| `Y` = variates | Response variates |
| `X` = pointers | Independent variates |
| `CENTRES` = pointers | Centres of the radial basis functions for the dependent variates |
| `RBSCALING` = scalars or variates | Scaling parameters for the radial distance calculations when `SCALING=supplied`; default 1 |
| `FITTEDVALUES` = variates | Fitted values generated for each y-variate by the model |
| `ESTIMATES` = variates | Saves the estimated model parameters |
| `EXIT` = scalars | Saves the exit code |
| `SAVE` = pointers | Saves details of the model and the estimated parameters for `RBDISPLAY` or `RBPREDICT` |

### Description

`RBFIT` estimates the parameters of a radial basis function model. The response variate is supplied by the `Y` parameter, and the independent (or x-) variates are supplied in a pointer by the `X` parameter.

The model assumes that the y-value on each unit is related to the vector x of x-values ($x_1, \dots, x_p$) on that unit, according to the model

*y* = f(x) + ε

for some unknown function f() and noise ε drawn at random from a Normal distribution with zero mean and unit variance. A radial basis function (RBF) model approximates the function f() by a linear combination of *t* basis functions, giving an approximate fitted value *f* for the dependent value

$$f = \sum_{k=1}^{t} w_k h_k + w_{t+1} b$$

where $b$ is a scalar intercept term and $h_k$ is the value given by an RBF for a radial distance $z_k$ between x and a centre location $c_k$ defined for the $k$th RBF.
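As a rough illustration of the linear combination above, the following Python sketch evaluates a fitted value from a set of weights and basis-function values. The function name `rbf_fitted_value` and the numbers are hypothetical and are not part of Genstat; the intercept term is assumed to be 1.

```python
def rbf_fitted_value(weights, basis_values, intercept=1.0):
    """Return f = sum_k w_k * h_k + w_{t+1} * b.

    `weights` holds t + 1 values: one per basis function, followed by
    the weight applied to the intercept term b.
    """
    if len(weights) != len(basis_values) + 1:
        raise ValueError("expected one weight per basis value plus an intercept weight")
    *basis_weights, intercept_weight = weights
    weighted_sum = sum(w * h for w, h in zip(basis_weights, basis_values))
    return weighted_sum + intercept_weight * intercept


# Hypothetical values: t = 2 basis functions and b = 1 (an assumption).
f = rbf_fitted_value([0.5, -0.25, 2.0], [1.0, 4.0])
```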

The centre locations are supplied in a pointer by the `CENTRES` parameter. This should have a variate for each x-variate, with a unit for each RBF.

The `METRIC` option defines how the radial distances are calculated. The default setting, `euclidean`, uses a scaled Euclidean distance

$$z_k = \left[ (x - c_k)\, S^{-1} (x - c_k)' \right]^{1/2}$$

where the form of the scaling matrix $S$ is controlled by the `SCALING` option (see below). The `cityblock` setting calculates the distance as

$$z_k = \sum_{j=1}^{p} \frac{|x_j - c_{kj}|}{s_j}$$

where the sum runs over the $p$ x-variates and $s_j$ is the $j$th diagonal element of the scaling matrix $S$.
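The two distance formulas can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, and the scaling matrix S is assumed to be diagonal, so that it can be passed as a list of its diagonal elements.

```python
import math


def euclidean_distance(x, centre, scale):
    """Scaled Euclidean distance [(x - c) S^-1 (x - c)']^(1/2) for diagonal S."""
    return math.sqrt(sum((xj - cj) ** 2 / sj for xj, cj, sj in zip(x, centre, scale)))


def cityblock_distance(x, centre, scale):
    """City-block distance: sum over x-variates of |x_j - c_kj| / s_j."""
    return sum(abs(xj - cj) / sj for xj, cj, sj in zip(x, centre, scale))


# Hypothetical unit with p = 2 x-values and a centre at the origin.
x = [3.0, 4.0]
centre = [0.0, 0.0]
z_euclid = euclidean_distance(x, centre, [1.0, 1.0])
z_city = cityblock_distance(x, centre, [1.0, 1.0])
z_city_scaled = cityblock_distance(x, centre, [3.0, 2.0])
```

With unit scaling the two metrics give 5 and 7 for this point; dividing by larger scaling values shrinks the distance accordingly.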

`METRIC` also has settings `pythagorean` and `manhattan` which act as synonyms of `euclidean` and `cityblock`.

The available forms of the scaling matrix, and corresponding settings of the `SCALING` option, are as follows:

| Setting | Scaling matrix |
|---|---|
| `sd` | diagonal matrix containing the standard deviations of the x-variates (default), |
| `mahalanobis` | variance-covariance matrix of the data values of x-variables (to give the Mahalanobis distance), |
| `supplied` | user-defined scaling parameters, supplied by the `RBSCALING` parameter. |

The `mahalanobis` setting is available only for the `euclidean` or `pythagorean` settings of the `METRIC` option. The setting of `RBSCALING` can be either a scalar or a variate, depending on whether the scaling parameters are the same or different over the x-variates; the values must all be greater than zero.

The form φ() of the radial basis functions is specified by the `RBTYPE` option, by selecting one of the following settings:

| Setting | Form |
|---|---|
| `linear` | $\phi(z) = z$, |
| `cubic` | $\phi(z) = z^3$, |
| `thinplate` | $\phi(z) = z^2 \log_e(z)$, |
| `gaussian` | $\phi(z) = \exp(-z^2)$, |
| `multiquadric` | $\phi(z) = \sqrt{z^2 + \alpha^2}$, |
| `inversemultiquadric` | $\phi(z) = 1 / \sqrt{z^2 + \alpha^2}$, |
| `cauchy` | $\phi(z) = 1 / (z^2 + \alpha^2)$. |

The value of the constant α (which must be positive) is specified by the `ALPHA` option, with a default of one.
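The seven forms can be written out directly, as in the Python sketch below. The helper `make_rbf` is hypothetical and only mirrors the formulas listed for the `RBTYPE` settings. Returning 0 for `thinplate` at z = 0 (the limiting value of z² log z) is an assumption of this sketch, not documented behaviour of `RBFIT`.

```python
import math


def make_rbf(rbtype, alpha=1.0):
    """Return phi(z) for one of the documented RBTYPE settings."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    forms = {
        "linear": lambda z: z,
        "cubic": lambda z: z ** 3,
        # Assumption: use the limit 0 at z = 0, where log(z) is undefined.
        "thinplate": lambda z: z * z * math.log(z) if z > 0 else 0.0,
        "gaussian": lambda z: math.exp(-z * z),
        "multiquadric": lambda z: math.sqrt(z * z + alpha * alpha),
        "inversemultiquadric": lambda z: 1.0 / math.sqrt(z * z + alpha * alpha),
        "cauchy": lambda z: 1.0 / (z * z + alpha * alpha),
    }
    return forms[rbtype]


# Evaluate two forms at hypothetical radial distances.
cubic_value = make_rbf("cubic")(2.0)
cauchy_value = make_rbf("cauchy", alpha=1.0)(1.0)
```

Only the last three forms depend on α, which matches the description of the `ALPHA` option.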

The RBF model is fitted by estimating values for the weights $w_k$. This is done by minimizing the penalized (regularized) sum of squares error function:

$$(y - f)'(y - f) + \lambda \sum_{k=1}^{t+1} w_k^2$$

where the penalty constant λ must be specified by the `LAMBDA` option.
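To make the role of λ concrete: writing the fitted values as f = Hw, where H has one column per basis function plus a column for the intercept term, the penalized criterion is minimized by the solution of (H′H + λI)w = H′y. The Python sketch below solves these equations by Gaussian elimination for a tiny hypothetical data set. It is an illustration only and does not reproduce the way `RBFIT` computes the fit. The function names and data are assumptions, and no check is made for a singular matrix.

```python
def solve_linear_system(a, b):
    """Solve a w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (m[i][n] - tail) / m[i][i]
    return w


def penalized_weights(h, y, lam):
    """Minimise (y - Hw)'(y - Hw) + lam * w'w via (H'H + lam I) w = H'y."""
    k = len(h[0])
    hth = [[sum(row[i] * row[j] for row in h) + (lam if i == j else 0.0)
            for j in range(k)] for i in range(k)]
    hty = [sum(row[i] * yi for row, yi in zip(h, y)) for i in range(k)]
    return solve_linear_system(hth, hty)


# Hypothetical design: one basis value per unit plus an intercept column of 1s.
h = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [1.0, 3.0, 5.0]
w_unpenalized = penalized_weights(h, y, 0.0)
w_penalized = penalized_weights(h, y, 1.0)
```

In this example the penalty shrinks the first weight from 2 to 5/3, showing how a larger λ pulls the weights towards zero at the cost of a larger residual sum of squares.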

The inverse-matrix calculations required during the fit are formed using a singular value decomposition. In the calculations, singular values that are less than the largest singular value multiplied by a tolerance are treated as zero. This tolerance is specified by the `TOLERANCE` option; default 0.000001.
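The thresholding rule can be sketched in Python as follows. The function names are hypothetical, and the singular values are assumed to have been computed elsewhere; the sketch shows only how the tolerance decides which of them are treated as zero, and how such values would then be left out of an inverse.

```python
def apply_tolerance(singular_values, tolerance=0.000001):
    """Set to zero any singular value below tolerance * largest singular value."""
    if not singular_values:
        return []
    cutoff = tolerance * max(singular_values)
    return [s if s >= cutoff else 0.0 for s in singular_values]


def inverted_singular_values(singular_values, tolerance=0.000001):
    """Reciprocals for an inverse, leaving values treated as zero at zero."""
    kept = apply_tolerance(singular_values, tolerance)
    return [1.0 / s if s > 0.0 else 0.0 for s in kept]


# Hypothetical singular values; the last one falls below the cutoff.
kept = apply_tolerance([10.0, 1.0, 1e-9])
inverse = inverted_singular_values([10.0, 1.0, 1e-9])
```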

Printed output is controlled by the `PRINT` option, with settings:

| Setting | Output |
|---|---|
| `description` | a description of the model, |
| `estimates` | estimates of the parameters, |
| `fittedvalues` | fitted values, |
| `summary` | summary (lack of fit etc.). |

The `SAVE` parameter can save full details of the RBF model; these can then be used by the `RBDISPLAY` directive to give further output, or by the `RBPREDICT` directive to form predictions. The estimated weights can be saved using the `ESTIMATES` parameter, and the fitted values can be saved by the `FITTEDVALUES` parameter.

Options: `PRINT`, `RBTYPE`, `METRIC`, `SCALING`, `ALPHA`, `LAMBDA`, `TOLERANCE`.

Parameters: `Y`, `X`, `CENTRES`, `RBSCALING`, `FITTEDVALUES`, `ESTIMATES`, `EXIT`, `SAVE`.

### Method

`RBFIT` uses the function `nagdmc_rbf` from the Numerical Algorithms Group’s library of Data Mining Components (DMCs).

### Action with `RESTRICT`

You can restrict the set of units used for the estimation by applying a restriction to the y-variate or any of the x-variates. If several of these are restricted, they must all be restricted to the same set of units.

### See also

Directives: `RBDISPLAY`, `RBPREDICT`, `ASRULES`, `NNFIT`.

Procedures: `KNEARESTNEIGHBOURS`, `RADIALSPLINE`.

Commands for: Data mining.

### Example

    CAPTION 'RBFIT example',\
            'Predicting the grape cultivar from 13 wine attributes'; STYLE=meta,plain
    SPLOAD '%Data%/WinesTrain.gsh'; ISAVE=pData
    CALC NData,NUnits = NVALUES(pData,pData[1])
    POINTER [VALUES=pData[2...NData]] Attributes
    GROUPS Wine; FACTOR=Cultivar
    TXCONST [TEXT=AttrName] Attributes
    POINTER [NVALUES=AttrName] Mn
    TABULATE [CLASS=Cultivar] Attributes[]; MEANS=Mn[]
    VARIATE [VALUES=0.5,1,2,5,8,10,12,15,20,40] Lambda
    CALC ErrorRate = !s(*)*Lambda
    FOR [INDEX=i] L = #Lambda
      RBFIT [PRINT=*; RBTYPE=linear; LAMBDA=L] Y=Wine; X=Attributes; \
            CENTRES=Mn; FITTED=Fit
      "Predicted class is the closest integer 1...3"
      CALC Prediction = 1 + (Fit > 1.5) + (Fit > 2.5)
      CALC ErrorRate$[i] = 100*SUM(Cultivar /= Prediction)/NUnits
    ENDFOR
    PRINT Lambda,ErrorRate; DEC=3
    CALC iMin = MINPOSITION(ErrorRate)
    CALC BestLambda = Lambda$[iMin]
    PRINT BestLambda; DEC=3
    RBFIT [PRINT=*; RBTYPE=linear; LAMBDA=BestLambda] Y=Wine; X=Attributes; \
          CENTRES=Mn; FITTED=Fit; SAVE=RBSave
    RBDISPLAY [PRINT=description, estimates, fittedvalues, summary] RBSave
    "Show misclassified counts"
    CALC Prediction = 1 + (Fit > 1.5) + (Fit > 2.5)
    GROUPS Prediction; FACTOR=PCultivar
    TABULATE [CLASS=Cultivar,PCultivar; PRINT=counts]
    "Try other models"
    FOR [INDEX=i] Model = 'cubic','thinplate','gaussian','multiquadric',\
        'inversemultiquadric','cauchy'; lambda = 1.5,1,0.1,1,0.1,0.01
      RBFIT [PRINT=*; RBTYPE=#Model; LAMBDA=lambda] Y=Wine; X=Attributes; \
            CENTRES=Mn; FITTED=Fit
      CALC Prediction = 1 + (Fit > 1.5) + (Fit > 2.5)
      CALC ErrRate = 100*SUM(Cultivar /= Prediction)/NUnits
      IF i == 1
        SKIP [FILE=output] 1
        PRINT [SQUASH=y;IPRINT=*] 'Model','Lambda','Error rate'; \
              JUST=left; FIELD=20,8,8
      ENDIF
      PRINT [SQUASH=y;IPRINT=*] Model,lambda,ErrRate; DEC=3; FIELD=20,8,8
    ENDFOR
    "Predictions from best linear model"
    SPLOAD '%Data%/WinesPred.gsh'; ISAVE=TestAttr
    RBPREDICT X=TestAttr; PREDICTIONS=TFit; SAVE=RBSave
    CALC TPrediction = 1 + (TFit > 1.5) + (TFit > 2.5)
    GROUPS TPrediction; TCultivar
    TABULATE [CLASS=TCultivar; PRINT=counts]