
BCVALUES procedure

Forms values for nodes of a classification tree (R.W. Payne).

Options

GROUPS = factor Groupings of the observations in the new data set
TREE = tree Tree for which predictions and accuracy values are to be formed
REPLACE = string token Whether to replace the values stored in the tree (yes, no); default no
PREDICTION = pointer New predictions for the nodes of the tree
ACCURACY = pointer New accuracy values for the nodes of the tree
REPLICATION = pointer New replication tables for the nodes of the tree

Parameter

X = factors or variates Values of the factors or variates used in the tree for the new data set

Description

When pruning a classification tree, it is best to use “accuracy” figures derived from a different data set (or sets) from the one used to construct the tree. BCVALUES allows these to be calculated, together with new predictions for the nodes of the tree.

The TREE option specifies the tree for which the values are to be formed. The GROUPS option specifies a factor defining the groupings of the observations in the new data set, and the X parameter supplies their values for the factors or variates that were used to construct the tree. You can set option REPLACE=yes to replace the values already stored in the tree by the new values. Alternatively, you can use the PREDICTION option to save the predictions in a pointer. This has an element for each node of the tree (with the same suffix as that node), pointing to a scalar that stores the prediction for the node. Similarly, the ACCURACY option saves the accuracies in a pointer to a set of scalars, and the REPLICATION option saves the replications of the groups at each node in a pointer to a set of tables classified by the GROUPS factor. You can later use these to replace the prediction, accuracy and replication values in the original tree by

CALCULATE Tree[]['accuracy'] = ACCURACY[]
&         Tree[]['prediction'] = PREDICTION[]
&         Tree[]['replication'] = REPLICATION[]
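To make the description above more concrete, the Python sketch below shows one way values like these could be formed for every node from a new data set. It is illustrative only: the tree layout (a dictionary mapping a node number to a split on a two-level factor, or to None for a terminal node) and the function name `node_values` are hypothetical, and so are the choices of prediction (the most frequent group at the node) and accuracy (the proportion of the node's observations belonging to that group). Genstat's own definitions are those used by BCLASSIFICATION. Each observation is passed down from the root and counted at every node it reaches, which gives the replication of each group at each node.

```python
from collections import Counter

# Hypothetical tree layout (not Genstat's internal structure): an internal
# node maps to (variable index, node for level 0, node for level 1); a
# terminal node maps to None. Node numbers play the role of pointer suffixes.
EXAMPLE_TREE = {
    1: (0, 2, 3),  # split on the first factor
    2: None,
    3: (1, 4, 5),  # split on the second factor
    4: None,
    5: None,
}


def node_values(tree, groups, xs, root=1):
    """Return (replication, prediction, accuracy) dicts keyed by node number."""
    replication = {node: Counter() for node in tree}
    for group, x in zip(groups, xs):
        node = root
        while True:
            replication[node][group] += 1  # count the unit at every node it reaches
            split = tree[node]
            if split is None:  # terminal node: stop descending
                break
            var, node_if_0, node_if_1 = split
            node = node_if_0 if x[var] == 0 else node_if_1

    prediction, accuracy = {}, {}
    for node, counts in replication.items():
        total = sum(counts.values())
        if total == 0:  # no new observations reached this node
            prediction[node] = accuracy[node] = None
            continue
        group, n_in_group = counts.most_common(1)[0]
        prediction[node] = group  # assumed: most frequent group at the node
        accuracy[node] = n_in_group / total  # assumed accuracy measure
    return replication, prediction, accuracy


groups = ["a", "a", "b", "b", "b"]
xs = [(0, 0), (0, 1), (1, 0), (1, 1), (1, 1)]
replication, prediction, accuracy = node_values(EXAMPLE_TREE, groups, xs)
print(prediction[1], accuracy[1])  # root node: b 0.6
```

Because every observation is counted at each node on its path, the root always sees the whole data set, and a node that no new observation reaches gets no prediction at all.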

Alternatively, you may want to combine them first with other estimates, for example to form bootstrapped estimates.
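As a simple illustration of combining estimates, the Python sketch below averages each node's accuracy over several data sets, keyed by node number in the same way as the elements of the ACCURACY pointer. The function `combine_accuracies` is hypothetical, and plain averaging is only a stand-in: a genuine bootstrap estimate would need its own weighting or bias correction. Predictions are group labels rather than numbers, so they cannot be averaged this way.

```python
def combine_accuracies(estimates):
    """Average each node's accuracy over several sets of estimates.

    Each estimate is a dict keyed by node number. A value of None (no
    observations reached the node in that data set) is skipped, and a node
    that is None in every estimate is left out of the result.
    """
    totals, counts = {}, {}
    for estimate in estimates:
        for node, value in estimate.items():
            if value is None:
                continue
            totals[node] = totals.get(node, 0.0) + value
            counts[node] = counts.get(node, 0) + 1
    return {node: totals[node] / counts[node] for node in totals}


combined = combine_accuracies([{1: 0.5, 2: 1.0}, {1: 1.0, 2: None}])
print(combined)  # {1: 0.75, 2: 1.0}
```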

Options: GROUPS, TREE, REPLACE, PREDICTION, ACCURACY, REPLICATION.

Parameter: X.

Method

BCVALUES uses the standard Genstat tree functions to obtain the necessary information about the tree.

Action with RESTRICT

BCVALUES takes account of any restrictions on the X vectors or on GROUPS.

See also

Procedures: BCLASSIFICATION, BPRUNE.

Commands for: Multivariate and cluster analysis.

Example

CAPTION  'BCVALUES example',\
         !t('Calculator digit recognition problem as in Breiman et al.',\
         '(1984, p.44); for more details see the BCLASSIFICATION example.');\
         STYLE=meta,plain
VARIATE  xdefn[1...7]
READ     [PRINT=error] xdefn[1...7]
0 0 1 0 0 1 0
1 0 1 1 1 0 1
1 0 1 1 0 1 1
0 1 1 1 0 1 0
1 1 0 1 0 1 1
1 1 0 1 1 1 1
1 0 1 0 0 1 0
1 1 1 1 1 1 1
1 1 1 1 0 1 1
1 1 1 0 1 1 1 :

"generate a set of random observations"
SCALAR  nsamples,seed; VALUE=50,876083
VARIATE [NVALUES=nsamples] light[1...7],truelight[1...7],error[1...7],rdigit
CALC    rdigit = MOD( INTEGER( URAND(seed; nsamples) * 10); 10) + 1
&       truelight[] = ELEMENTS(xdefn[]; rdigit)
GRANDOM [DISTRIBUTION=binomial; PROBABILITY=0.1; NVALUES=nsamples] error[1...7]
CALC    light[] = MOD(truelight[] + error[]; 2)
FACTOR  [LEVELS=!(0...9)] digit; VALUES=MOD(rdigit; 10); DECIMALS=0
FACTOR  [LEVELS=!(0,1)] x1,x2,x3,x4,x5,x6,x7; VALUES=light[]; DECIMALS=0

"check the data"
CAPTION    'Check the data: mean values of each light for each digit.'
FOR i=0...9
  RESTRICT light[]; digit==i
  PRINT    'Digit',i; FIELD=6; DECIMALS=0; JUST=left
  &        [ORIENT=across; SQUASH=yes] MEAN(light[]); FIELD=10
ENDFOR
"number of each digit in the data set"
RESTRICT   light[]
TABULATE   [CLASS=digit; PRINT=count]
CAPTION    'Mean error rate for each light.'
PRINT      [ORIENT=across] MEAN(error[]); DECIMALS=4

"form the classification tree"
BCLASSIFICATION [PRINT=labelled; GROUPS=digit; TREE=tree]\
                x1,x2,x3,x4,x5,x6,x7
CAPTION 'Prediction and accuracy values stored with the tree.'
FOR pred=tree[]['prediction']; acc=tree[]['accuracy']
  PRINT [SQUASH=yes] pred,acc; FIELD=25
ENDFOR

"generate another set of random observations"
SCALAR  nsamples,seed; VALUE=500,728342
VARIATE [NVALUES=nsamples] light[1...7],truelight[1...7],error[1...7],rdigit
CALC    rdigit = MOD( INTEGER( URAND(seed; nsamples) * 10); 10) + 1
&       truelight[] = ELEMENTS(xdefn[]; rdigit)
GRANDOM [DISTRIBUTION=binomial; PROBABILITY=0.1; NVALUES=nsamples] error[1...7]
CALC    light[] = MOD(truelight[] + error[]; 2)
FACTOR  [LEVELS=!(0...9)] digit; VALUES=MOD(rdigit; 10); DECIMALS=0
FACTOR  [LEVELS=!(0,1)] x1,x2,x3,x4,x5,x6,x7; VALUES=light[]; DECIMALS=0

"form new prediction and accuracy values"
POINTER   [NVALUES=!t(y,x); VALUES=digit,!p(x1,x2,x3,x4,x5,x6,x7)] data
BCVALUES  [GROUPS=digit; TREE=tree; PREDICTION=prediction; ACCURACY=accuracy]\
          x1,x2,x3,x4,x5,x6,x7
CAPTION   'New prediction and accuracy values (from another data set).'
FOR pred=prediction[]; acc=accuracy[]
  PRINT   [SQUASH=yes] pred,acc; FIELD=15
ENDFOR

"prune the tree"
BPRUNE    [PRINT=table] tree; ACCURACY=accuracy; NEWTREE=pruned
"use the 5th tree - renumber nodes"
BCUT      [RENUMBER=yes] pruned[5]; NEWTREE=tree
"display the tree"
BCDISPLAY [PRINT=summary,labelled] tree
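For readers who want to experiment outside Genstat, the Python sketch below mirrors the way the example simulates its data: a row of xdefn is chosen at random (row 10 standing for digit 0), and each of the seven lights is then wrong independently with probability 0.1. The function name `generate` is hypothetical, and Python's random-number generator differs from Genstat's URAND and GRANDOM, so the same seeds will not reproduce the Genstat data sets.

```python
import random

# Segment patterns in the same row order as xdefn in the example:
# rows 1-9 are digits 1-9 and row 10 is digit 0.
XDEFN = [
    (0, 0, 1, 0, 0, 1, 0),
    (1, 0, 1, 1, 1, 0, 1),
    (1, 0, 1, 1, 0, 1, 1),
    (0, 1, 1, 1, 0, 1, 0),
    (1, 1, 0, 1, 0, 1, 1),
    (1, 1, 0, 1, 1, 1, 1),
    (1, 0, 1, 0, 0, 1, 0),
    (1, 1, 1, 1, 1, 1, 1),
    (1, 1, 1, 1, 0, 1, 1),
    (1, 1, 1, 0, 1, 1, 1),
]


def generate(nsamples, seed, flip_prob=0.1):
    """Return (digits, lights) for nsamples noisy seven-segment displays."""
    rng = random.Random(seed)
    digits, lights = [], []
    for _ in range(nsamples):
        rdigit = rng.randrange(10) + 1  # row number 1..10, like rdigit
        true = XDEFN[rdigit - 1]
        # each light is independently wrong with probability flip_prob
        errors = [1 if rng.random() < flip_prob else 0 for _ in range(7)]
        lights.append(tuple((t + e) % 2 for t, e in zip(true, errors)))
        digits.append(rdigit % 10)  # row 10 becomes digit 0
    return digits, lights


digits, lights = generate(50, 876083)
```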
Updated on March 8, 2019
