Forms values for nodes of a classification tree (R.W. Payne).
Options
GROUPS = factor | Groupings of the observations in the data set |
---|---|
TREE = tree | Tree for which predictions and accuracy values are to be formed |
REPLACE = string token | Whether to replace the values stored in the tree (yes, no); default no |
PREDICTION = pointer | New predictions for the nodes of the tree |
ACCURACY = pointer | New accuracy values for the nodes of the tree |
REPLICATION = pointer | New replication tables for the nodes of the tree |
Parameter
X = factors or variates | Values of the factors or variates used in the tree for the new data set |
---|---|
Description
When pruning a classification tree, it is best to use “accuracy” figures derived from a different data set (or sets) from the one used to construct the tree. BCVALUES allows these to be calculated, together with new predictions for the nodes of the tree.
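As a rough illustration of what forming node values from a new data set involves, the Python sketch below passes each new observation down a fixed tree and tallies the groups reaching every node. It is not Genstat's implementation: the `Node` layout (binary splits on whether a 0/1 variable equals 1), the function name `node_values` and the string node labels are all hypothetical. The accuracy here is assumed to be the proportion of observations at a node that belong to its predicted group; Genstat's own accuracy measure is the one defined for BCLASSIFICATION and may differ.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Optional, Sequence, Tuple


@dataclass
class Node:
    """Hypothetical tree node: an internal node tests whether variable `var`
    equals 1 and sends the observation to `yes` or `no`; a terminal node
    has var=None."""
    label: str
    var: Optional[int] = None
    yes: Optional["Node"] = None
    no: Optional["Node"] = None


def node_values(
    root: Node, X: Sequence[Sequence[int]], groups: Sequence[str]
) -> Dict[str, Tuple[str, float, Dict[str, int]]]:
    """Return {node label: (prediction, accuracy, replication)} from new data.

    Replication: count of each group among observations reaching the node.
    Prediction: the most frequent group at the node.
    Accuracy (assumed definition): proportion of those observations that
    belong to the predicted group.
    Nodes that no new observation reaches are absent from the result.
    """
    counts: Dict[str, Counter] = {}
    for x, g in zip(X, groups):
        node: Optional[Node] = root
        # Follow the tree's existing splits; the tree itself is not changed.
        while node is not None:
            counts.setdefault(node.label, Counter())[g] += 1
            if node.var is None:
                break
            node = node.yes if x[node.var] == 1 else node.no
    values = {}
    for label, rep in counts.items():
        prediction, n_match = rep.most_common(1)[0]
        values[label] = (prediction, n_match / sum(rep.values()), dict(rep))
    return values


# Usage: node "1" splits on variable 0; nodes "2" and "3" are terminal.
tree = Node("1", var=0, yes=Node("2"), no=Node("3"))
print(node_values(tree, [[1], [1], [1], [0], [0]], ["a", "a", "b", "b", "b"]))
```

Only the counts depend on the new data; the splits are those of the original tree, which is why the new predictions can differ from the ones stored when the tree was built.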
The TREE option specifies the tree for which the values are to be formed. The GROUPS option specifies a factor defining the groupings of the observations in the new data set, and the X parameter supplies their values for the factors or variates used to construct the tree. You can set option REPLACE=yes to use the new values to replace those already stored in the tree. Alternatively, you can use the PREDICTION option to save the predictions in a pointer. This has an element for each node of the tree (with the same suffix as that node), pointing to a scalar that stores the prediction for the node. Similarly, the ACCURACY option saves the accuracies in a pointer to a set of scalars, and the REPLICATION option saves the replications of the groups at each node in a pointer to a set of tables classified by the GROUPS factor. You can use these later to replace the prediction, accuracy and replication values in the original tree by
CALCULATE Tree[]['accuracy'] = ACCURACY[]
& Tree[]['prediction'] = PREDICTION[]
& Tree[]['replication'] = REPLICATION[]
Alternatively, you may want to combine them first with other estimates, for example to form bootstrapped estimates.
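One simple way to combine several sets of node values, for example accuracies obtained from a number of resampled data sets, is to average them node by node. The Python sketch below does this with a plain mean; the function name `combine_accuracies` and the dictionary layout (node label to accuracy) are hypothetical, and this is only one possible combination, not a prescribed bootstrap procedure.

```python
from statistics import mean
from typing import Dict, Iterable, List


def combine_accuracies(estimates: Iterable[Dict[str, float]]) -> Dict[str, float]:
    """Average per-node accuracy values over several sets of estimates.

    Each element of `estimates` maps a node label to an accuracy value
    (hypothetical layout). A node missing from some sets is averaged over
    the sets in which it appears.
    """
    pooled: Dict[str, List[float]] = {}
    for est in estimates:
        for node, acc in est.items():
            pooled.setdefault(node, []).append(acc)
    return {node: mean(accs) for node, accs in pooled.items()}


# Usage: two resamples; node "3" was reached only in the first.
print(combine_accuracies([{"1": 0.6, "3": 0.9}, {"1": 0.8}]))
```

In Genstat, the combined values would then be placed in the tree with CALCULATE statements like those above.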
Options: GROUPS, TREE, REPLACE, PREDICTION, ACCURACY, REPLICATION.
Parameter: X.
Method
BCVALUES uses the standard Genstat tree functions to obtain the necessary information about the tree.
Action with RESTRICT
BCVALUES takes account of any restrictions on the X vectors or on GROUPS.
See also
Procedures: BCLASSIFICATION, BPRUNE.
Commands for: Multivariate and cluster analysis.
Example
CAPTION 'BCVALUES example',\
  !t('Calculator digit recognition problem as in Breiman et al.',\
  '(1984, p.44); for more details see the BCLASSIFICATION example.');\
  STYLE=meta,plain
VARIATE xdefn[1...7]
READ [PRINT=error] xdefn[1...7]
0 0 1 0 0 1 0
1 0 1 1 1 0 1
1 0 1 1 0 1 1
0 1 1 1 0 1 0
1 1 0 1 0 1 1
1 1 0 1 1 1 1
1 0 1 0 0 1 0
1 1 1 1 1 1 1
1 1 1 1 0 1 1
1 1 1 0 1 1 1 :
"generate a set of random observations"
SCALAR nsamples,seed; VALUE=50,876083
VARIATE [NVALUES=nsamples] light[1...7],truelight[1...7],error[1...7],rdigit
CALC rdigit = MOD( INTEGER( URAND(seed; nsamples) * 10); 10) + 1
& truelight[] = ELEMENTS(xdefn[]; rdigit)
GRANDOM [DISTRIBUTION=binomial; PROBABILITY=0.1; NVALUES=nsamples] error[1...7]
CALC light[] = MOD(truelight[] + error[]; 2)
FACTOR [LEVELS=!(0...9)] digit; VALUES=MOD(rdigit; 10); DECIMALS=0
FACTOR [LEVELS=!(0,1)] x1,x2,x3,x4,x5,x6,x7; VALUES=light[]; DECIMALS=0
"check the data"
CAPTION 'Check the data: mean values of each light for each digit.'
FOR i=0...9
  RESTRICT light[]; digit==i
  PRINT 'Digit',i; FIELD=6; DECIMALS=0; JUST=left
  & [ORIENT=across; SQUASH=yes] MEAN(light[]); FIELD=10
ENDFOR
"number of each digit in the data set"
RESTRICT light[]
TABULATE [CLASS=digit; PRINT=count]
CAPTION 'Mean error rate for each light.'
PRINT [ORIENT=across] MEAN(error[]); DECIMALS=4
"form the classification tree"
BCLASSIFICATION [PRINT=labelled; GROUPS=digit; TREE=tree]\
  x1,x2,x3,x4,x5,x6,x7
CAPTION 'Prediction and accuracy values stored with the tree.'
FOR pred=tree[]['prediction']; acc=tree[]['accuracy']
  PRINT [SQUASH=yes] pred,acc; FIELD=25
ENDFOR
"generate another set of random observations"
SCALAR nsamples,seed; VALUE=500,728342
VARIATE [NVALUES=nsamples] light[1...7],truelight[1...7],error[1...7],rdigit
CALC rdigit = MOD( INTEGER( URAND(seed; nsamples) * 10); 10) + 1
& truelight[] = ELEMENTS(xdefn[]; rdigit)
GRANDOM [DISTRIBUTION=binomial; PROBABILITY=0.1; NVALUES=nsamples] error[1...7]
CALC light[] = MOD(truelight[] + error[]; 2)
FACTOR [LEVELS=!(0...9)] digit; VALUES=MOD(rdigit; 10); DECIMALS=0
FACTOR [LEVELS=!(0,1)] x1,x2,x3,x4,x5,x6,x7; VALUES=light[]; DECIMALS=0
"form new prediction and accuracy values"
POINTER [NVALUES=!t(y,x); VALUES=digit,!p(x1,x2,x3,x4,x5,x6,x7)] data
BCVALUES [GROUPS=digit; TREE=tree; PREDICTION=prediction; ACCURACY=accuracy]\
  x1,x2,x3,x4,x5,x6,x7
CAPTION 'New prediction and accuracy values (from another data set).'
FOR pred=prediction[]; acc=accuracy[]
  PRINT [SQUASH=yes] pred,acc; FIELD=15
ENDFOR
"prune the tree"
BPRUNE [PRINT=table] tree; ACCURACY=accuracy; NEWTREE=pruned
"use the 5th tree - renumber nodes"
BCUT [RENUMBER=yes] pruned[5]; NEWTREE=tree
"display the tree"
BCDISPLAY [PRINT=summary,labelled] tree