1. Home
2. BPRUNE procedure

# BPRUNE procedure

Prunes a tree using minimal cost complexity (R.W. Payne).

### Option

`PRINT` = string tokens Controls printed output (`graph`, `table`, `monitoring`); default `tabl`

### Parameters

`TREE` = trees Trees to be pruned Accuracy values for the nodes of each tree; default is to use those stored with the tree Saves the trees generated during the pruning of each tree Accuracy of the pruned trees of each tree Number of terminal nodes in the pruned trees of each tree

### Description

The construction of a classification tree or a regression tree generally results in over fitting, that is it continues to extend the branches of the tree beyond the point that can be justified statistically. The solution is to prune the tree to remove the uninformative sub-branches.

The tree to be pruned is specified by the `TREE` parameter. `BPRUNE` assumes that there is an accuracy figure R(t) available for each node t of the tree. By default this is assumed to be stored with the tree itself, but you can specify other values using the `ACCURACY` parameter. This should be set to a pointer whose suffixes are the same as the numbers of the nodes in the tree, and whose elements are scalars storing the relevant accuracy values.

For a classification tree the accuracy measures the impurity of the subset of individuals at that node (how far it is from being homogeneous i.e. with individuals from a single group). For a regression tree it is the average squared distance of the values of the response variate from their mean for the subset of observations at that node. The accuracy R(T) of the whole tree T is the sum of the accuracies of its terminal nodes.

`BPRUNE` uses the principle of minimal cost complexity (Breiman et al. 1984, Chapter 3) to produce a sequence of pruned trees. At each stage it prunes at the node which is the weakest link. Define R(Tt) to be the accuracy of the subtree with root at node t, and nterm(t) to be its number of terminal nodes. The weakest link is then the node for which

(R(t) – R(Tt)) / (nterm(t) – 1)

is a minimum. The pruned trees can be saved, in a pointer, using the `NEWTREES` parameter. Their accuracies can be saved (in a variate) using the `RTPRUNED` parameter, and their numbers of terminal nodes can be saved (also in a variate) using the `NTERMINAL` parameter.

Printed output is controlled by the `PRINT` option, with settings:

    `graph` plots `RTPRUNED` against `NTERMINAL`; prints a table with `RTPRUNED` and `NTERMINAL`; provides monitoring information during the pruning.

The plot of `RTPRUNED` against `NTERMINAL` demonstates the trade-off between accuracy and complexity (number of terminal nodes). It should show an initial rapid decrease, followed by a long flat region, and then often a gradual increase. The aim is to select a tree that is accurate but not over-complex. One possibility is to take the tree at the point where the graph levels off. However, `RTPRUNED` contains only an estimate of the accuracy of the trees. So Breiman et al. (1984) recommend taking a tree a little above that (in fact at one standard error of `RTPRUNED` above the minimium point in the graph: see Chapters 3 and 11). In practice though a small amount of over-fitting should not be a problem, so the exact choice of pruned tree should not be crucial.

Option: `PRINT`.

Parameter: `TREE`, `ACCURACY`, `NEWTREES`, `RTPRUNED`, `NTERMINAL`.

### Method

`BPRUNE` uses the `BSCAN` function to move around the tree, function `BBELOW` to obtain the numbers of the nodes below each node, and directive `BCUT` to perform each pruning operation.

### Reference

Breiman, L., Friedman, J.H., Olshen, R.A. & Stone, C.J. (1984). Classification and Regression Trees. Wadsworth, Monterey.

Procedures: `BCLASSIFICATION`, `BCVALUES`, `BREGRESSION`, `BRVALUES`.

### Example

```CAPTION 'BPRUNE example',!t('Water usage data (Draper & Smith 1981,',\
'Applied Regression Analysis, Wiley, New York).'); STYLE=meta,plain
58.8  7.107 21 129 3.067
65.2  6.373 22 141 2.828
70.9  6.796 22 153 2.891
77.4  9.208 20 166 2.994
79.3 14.792 25 193 3.082
81.0 14.564 23 189 3.898
71.9 11.964 20 175 3.502
63.9 13.526 23 186 3.060
54.5 12.656 20 190 3.211
39.5 14.119 20 187 3.286
44.5 16.691 22 195 3.542
43.6 14.571 19 206 3.125
56.0 13.619 22 198 3.022
64.7 14.575 22 192 2.922
73.0 14.556 21 191 3.950
78.9 18.573 21 200 4.488
79.4 15.618 22 200 3.295 :
"form the regression tree"
BREGRESSION [PRINT=indented; Y=water; TREE=tree]\
employ,opdays,product,temp
"prune the tree"
BPRUNE      [PRINT=table,graph] tree; NEWTREES=pruned
"use tree 6 - renumber nodes"
BCUT        [RENUMBER=yes] pruned; NEWTREE=tree
"display the tree"
BRDISPLAY   [PRINT=indented] tree
```
Updated on March 8, 2019