Fits a multi-layer perceptron neural network.
Options

PRINT              Controls printed output (description, estimates, summary)
NHIDDEN            Number of functions in the hidden layer; no default, must be set
HIDDENMETHOD       Type of activation function in the hidden layer (linear, logistic, hyperbolictangent)
OUTPUTMETHOD       Type of activation function in the output layer (linear, logistic, hyperbolictangent)
GAIN               Multiplicative constant to use in the activation functions; default 1
NTRIES             Number of times to search for a good initial starting point for the optimization; default 5
NSTARTITERATIONS   Number of iterations to use to find a good starting point for the optimization; default 30
VALIDATIONOPTIONS  Variate containing three integers to control validation for early stopping; default !(10,4,16)
SEED               Seed for random numbers to generate initial values for the free parameters; default 0
MAXCYCLE           Maximum number of iterations of the conjugate-gradient algorithm; default 50
Parameters

Y                  Response variate to be modelled
X                  Pointer containing the input (independent) variates
YVALIDATION        Validation data for the dependent variates
XVALIDATION        Validation data for the independent variates
FITTEDVALUES       Fitted values generated for each y-variate by the neural network
OBJECTIVEFUNCTION  Value of the sum of squares objective function at the end of the optimization
NITERATIONS        Number of completed iterations of the conjugate-gradient algorithm
EXIT               Saves the exit code
SAVE               Saves details of the network and the estimated parameters
A neural network is a method for describing a nonlinear relationship between a response variate, supplied here by the Y parameter, and a set of input variates, supplied here in a pointer by the X parameter. The type of neural network fitted by NNFIT is a fully-connected feed-forward multi-layer perceptron with a single hidden layer. The network starts with a row of nodes, one for each input variable (i.e. x-variate), which are all connected to every node in the hidden layer. The nodes in the hidden layer are then all connected to the single node in the final, output layer. The number of nodes in the hidden layer is specified by the NHIDDEN option.
The output value y is given by
y = ψ( Σk=1…m wk φ( Σj=1…d wjk xj − θ ) − η )
where
  d     is the number of input nodes (i.e. x-variates),
  m     is the number of hidden nodes (specified by the NHIDDEN option),
  xj    is the value of the jth x-variate,
  wjk   are the weight parameters in the connections between the nodes in the input and hidden layers,
  wk    are the weight parameters in the connections between the nodes in the hidden and output layers,
  θ     is the threshold value subtracted at the hidden layer,
  η     is the threshold value subtracted at the single node in the output layer,
  φ(.)  is the activation function applied at the hidden layer,
  ψ(.)  is the activation function applied at the output layer.
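As an illustration of this formula, the output for a single data record can be computed as below. This is a hypothetical NumPy sketch with invented names, not Genstat code or the routine NNFIT actually uses:

```python
import numpy as np

def mlp_output(x, w_hidden, w_out, theta, eta, phi, psi):
    """Output of a single-hidden-layer perceptron, following the formula
    above.  x is the (d,) vector of x-variate values, w_hidden the (d, m)
    matrix of weights wjk, w_out the (m,) vector of weights wk, theta and
    eta the hidden- and output-layer thresholds, and phi and psi the
    hidden- and output-layer activation functions."""
    hidden = phi(x @ w_hidden - theta)   # inner sum over j, then phi
    return psi(hidden @ w_out - eta)     # outer sum over k, then psi

# With identity activations and zero thresholds this reduces to a
# linear model: y = sum_k wk * sum_j wjk * xj.
identity = lambda z: z
y = mlp_output(np.array([1.0, 2.0]),
               np.eye(2),               # wjk = identity matrix
               np.array([1.0, 1.0]),    # wk
               0.0, 0.0, identity, identity)
```

With nonlinear φ the same structure lets the network represent smooth nonlinear response surfaces.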
The activation functions for the hidden and output layers are specified by the HIDDENMETHOD and OUTPUTMETHOD options, respectively, with settings:

  linear             φ(z) = z,
  logistic           φ(z) = 1 / (1 + exp(−γz)),
  hyperbolictangent  φ(z) = tanh(γz),

where the parameter γ is specified by the GAIN option; the default setting is 1.
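In Python terms the three settings and the role of the GAIN parameter γ look as follows (an illustrative sketch only; the function names mirror the Genstat setting names but are defined here):

```python
import math

def linear(z, gain=1.0):
    # linear setting: phi(z) = z (gain has no effect)
    return z

def logistic(z, gain=1.0):
    # logistic setting: phi(z) = 1 / (1 + exp(-gain * z))
    return 1.0 / (1.0 + math.exp(-gain * z))

def hyperbolictangent(z, gain=1.0):
    # hyperbolictangent setting: phi(z) = tanh(gain * z)
    return math.tanh(gain * z)
```

Larger values of γ make the logistic and hyperbolic-tangent curves steeper around z = 0.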
Values for the free parameters in the multi-layer perceptron model are optimized using a preconditioned, limited-memory quasi-Newton conjugate-gradient method to minimize the objective (sum of squares) function, equal to 0.5 times the mean squared deviation of the estimated y-values from the observed y-values.
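The objective function itself is simple; as a plain-Python sketch (names invented here):

```python
def objective(fitted, observed):
    """0.5 times the mean squared deviation of the fitted y-values
    from the observed y-values."""
    n = len(observed)
    return 0.5 * sum((f - o) ** 2 for f, o in zip(fitted, observed)) / n
```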
Printed output is controlled by the PRINT option, with settings:

  description  a description of the network (number of input variables, nodes etc.),
  estimates    estimates of the free parameters,
  summary      a summary of the fit (numbers of iterations, objective function etc.).
The NTRIES option defines the number of times to search for a good initial starting point for the optimization (default 5). The
NSTARTITERATIONS option defines the number of iterations to use to find a good starting point for the optimization (default 30).
The SEED option supplies a seed for the random numbers used to generate initial values for the free parameters. The default of zero continues the existing sequence of random numbers if any have already been used in the current Genstat job. If none have yet been used, Genstat picks a seed at random.
The MAXCYCLE option sets a limit on the number of iterations of the conjugate-gradient algorithm to use for the estimation (default 50).
To improve the accuracy of the neural-network approximations to new data records, it is usually desirable to stop the optimization before the value of the objective function reaches a global minimum on the training set. This method, known as early stopping, can be performed by using a validation set of data records, specified by the YVALIDATION and XVALIDATION parameters. The optimization is then halted when the sum of squares error function achieves a minimum over the validation set of data records, which has not been used to estimate the values of the free parameters in the model. The
VALIDATIONOPTIONS option specifies a variate containing three integers to control validation for early stopping. The first integer defines the number of iterations of the optimizing function to complete before beginning validation; default 10. The second integer defines the number of iterations between consecutive validations; default 4. The third integer defines the number of iterations to continue validating beyond the current minimum of the objective function before stopping; default 16. This is to try to avoid the possibility of getting stuck at a local minimum. The variates in the
XVALIDATION pointer must be in the same order as the corresponding variates in the X pointer.
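The role of the three integers can be sketched as follows. This is illustrative Python only; Genstat's actual scheduling inside the optimizer may differ in detail:

```python
def early_stopping(val_error, start=10, step=4, patience=16):
    """Illustrative early-stopping rule driven by the three
    VALIDATIONOPTIONS integers (defaults 10, 4, 16).

    val_error(i) returns the validation sum of squares after i
    iterations.  Validation begins at iteration `start`, is repeated
    every `step` iterations, and the search stops once `patience`
    further iterations have passed without improving on the current
    minimum; the best iteration found is then returned."""
    best_iter, best_err = start, val_error(start)
    i = start + step
    while i - best_iter <= patience:
        err = val_error(i)
        if err < best_err:
            best_iter, best_err = i, err
        i += step
    return best_iter, best_err
```

Continuing for `patience` iterations past the current minimum is what guards against stopping at a shallow local dip in the validation error.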
The results of the fit, together with details about the design of the neural network, can be saved using the SAVE parameter. This can then be used in the NNDISPLAY directive to display further output, or in the NNPREDICT directive to form predictions.
NNFIT uses the function
nagdmc_mlp from the Numerical Algorithms Group’s library of Data Mining Components (DMCs), which estimates the free parameters using a conjugate gradient method.
You can restrict the set of units used for the estimation by applying a restriction to the y-variate or any of the x-variates. If several of these are restricted, they must all be restricted to the same set of units. Similarly, you can restrict the set of units used for the validation by applying a restriction to the
YVALIDATION variate or any of the XVALIDATION variates.
Commands for: Data mining.
" Example NNFI-1: Fitting a multi-layer perceptron neural network."
" This example fits a multi-layer perceptron neural network with five
  nodes in the hidden layer, a hyperbolic-tangent activation function
  in the hidden layer and a linear activation function in the output
  layer."
" The data are in a file called iris.GSH and contain the data from
  Fisher's Iris data set."
SPLOAD [PRINT=*] '%GENDIR%/Data/iris.GSH'
POINTER [VALUES=Sepal_Length,Sepal_Width,Petal_Length,Petal_Width] Measures
CALC yval = NEWLEVELS(Species)
NNFIT [PRINT=description,estimates,summary; NHIDDEN=5;\
      HIDDENMETHOD=hyperbolictangent; OUTPUTMETHOD=linear; SEED=12]\
      Y=yval; X=Measures