Identifies specimens using a classification tree (R.W. Payne).
|PRINT|Controls printed output|
|TREE|Specifies the tree|
|IDENTIFICATION|Saves the identification of each specimen|
|TERMINALNODES|Saves the numbers of the terminal nodes reached by each specimen|
|PROBABILITIES|Specimen × group matrix giving the probability that the specimens belong to each group|
|MVINCLUDE|Whether to provide identifications for specimens with missing or unavailable values of the x-variables|
|VALUES|Values to use for the explanatory variables; if these are unset for any variable, its existing values are used|
BCIDENTIFY identifies specimens using a classification tree, as constructed by the BCLASSIFICATION procedure. The tree can be saved from BCLASSIFICATION (using the TREE option of BCLASSIFICATION), and specified for BCIDENTIFY using its own TREE option. Alternatively, BCIDENTIFY will ask you for the identifier of the tree if you do not specify TREE when running interactively.
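As a minimal sketch of this hand-over, the two calls below reuse the identifiers from the example on this page (the grouping factor digit, the factors x1 to x7 and the tree identifier tree); it assumes those structures already exist and hold suitable values.

```genstat
" build the classification tree and save it in the identifier tree "
BCLASSIFICATION [PRINT=*; GROUPS=digit; TREE=tree] x1,x2,x3,x4,x5,x6,x7
" pass the saved tree to BCIDENTIFY through its own TREE option "
BCIDENTIFY [TREE=tree] x1,x2,x3,x4,x5,x6,x7
```

Because PRINT is not set in the second call, a batch run prints the identifications by default.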
The characteristics of the specimens can be specified in the variates or factors listed by the X parameter. These must have identical names (and levels) to those used originally to construct the tree. You can use the VALUES parameter to supply new values, if those stored in any of the variates or factors are unsuitable.
If you do not set X when running interactively, BCIDENTIFY will ask you to supply the relevant characteristics in turn, as required by the tree. Otherwise, if an x-variable in the tree is not specified in the X parameter list, its values are assumed to be unavailable (i.e. missing).
By default, when the x-variable required at a node in the tree is unavailable or contains a missing value, BCIDENTIFY will follow all the branches from that node, and form a combined conclusion. You can set option MVINCLUDE=* if you would prefer the identification to be missing.
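The two ways of handling an unavailable x-variable can be compared with a sketch like the one below. It assumes the tree and factors from the example on this page; idall and idstrict are hypothetical identifiers introduced only for illustration. Leaving x7 out of the X list makes its values unavailable, as described above.

```genstat
" x7 is omitted from the X list, so its values are treated as unavailable "
" default: follow every branch at nodes that need x7 and combine the results "
BCIDENTIFY [PRINT=*; TREE=tree; IDENTIFICATION=idall] x1,x2,x3,x4,x5,x6
" MVINCLUDE=*: give a missing identification instead "
BCIDENTIFY [PRINT=*; TREE=tree; MVINCLUDE=*; IDENTIFICATION=idstrict]\
           x1,x2,x3,x4,x5,x6
PRINT idall,idstrict; FIELD=15
```

The two texts should differ only for specimens whose path through the tree reaches a node that tests x7; if the pruned tree never uses x7, both calls would be expected to give the same identifications.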
The PRINT option controls the printed output. You can print the identifications obtained using the tree, and the observed characteristics when these are supplied in response to questions in an interactive run. If you do not set PRINT when running interactively, BCIDENTIFY will ask what you would like to print. In batch, the default is to print the identifications.
The IDENTIFICATION option allows you to save the identifications (in a text). The TERMINALNODES option allows you to save a pointer, with an element for each specimen, containing the numbers of the terminal nodes reached in the tree to provide its identification. Each element will be a scalar if the identification was derived from a single node, or a variate if it involved more than one (because several branches have been taken, as the result of a missing x-value). Finally, the PROBABILITIES option can save a specimen-by-group matrix giving the probability that the specimens belong to each group.
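A single call can save all three results. The sketch below again assumes the tree and factors from the example on this page; ident, nodes and prob are hypothetical identifiers chosen for illustration.

```genstat
" suppress printing and save the results instead "
BCIDENTIFY [PRINT=*; TREE=tree; IDENTIFICATION=ident; TERMINALNODES=nodes;\
           PROBABILITIES=prob] x1,x2,x3,x4,x5,x6,x7
" ident is a text with one identification per specimen "
PRINT ident; FIELD=15
" prob is the specimen-by-group matrix of probabilities "
PRINT prob
```

The pointer nodes is left unprinted here, because its elements may be a mixture of scalars and variates when some specimens have followed several branches.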
BCIDENTIFY uses BIDENTIFY to find the terminal nodes of the tree that correspond to the values of the explanatory variables.
Restrictions are ignored.
Commands for: Multivariate and cluster analysis.
CAPTION  'BCIDENTIFY example',!t(\
         'Calculator digit recognition problem as in Breiman et al.',\
         '(1984, p.44). The assumption is that the digits of a calculator',\
         'are made up of 7 lines (as shown below), which may be missing for',\
         'any particular digit with probability 0.1:'); STYLE=meta,plain
SCALAR   Chan
ENQUIRE  Chan; FILETYPE=output; OUTSTYLE=Style
OUTPUT   [STYLE=plain]
PRINT    !t(' -1- ','| |','2 3','| |',' -4- ',\
         '| |','5 6','| |',' -7- '); FIELD=20
OUTPUT   [STYLE=#Style]
VARIATE  xdefn[1...7]
READ     [PRINT=error] xdefn[1...7]
0 0 1 0 0 1 0
1 0 1 1 1 0 1
1 0 1 1 0 1 1
0 1 1 1 0 1 0
1 1 0 1 0 1 1
1 1 0 1 1 1 1
1 0 1 0 0 1 0
1 1 1 1 1 1 1
1 1 1 1 0 1 1
1 1 1 0 1 1 1 :
"generate a set of random observations"
SCALAR   nsamples,seed; VALUE=50,876083
VARIATE  [NVALUES=nsamples] light[1...7],truelight[1...7],error[1...7],rdigit
CALC     rdigit = MOD( INTEGER( URAND(seed; nsamples) * 10); 10) + 1
&        truelight = ELEMENTS(xdefn; rdigit)
GRANDOM  [DISTRIBUTION=binomial; PROBABILITY=0.1; NVALUES=nsamples] error[1...7]
CALC     light = MOD(truelight + error; 2)
FACTOR   [LEVELS=!(0...9)] digit; VALUES=MOD(rdigit; 10); DECIMALS=0
FACTOR   [LEVELS=!(0,1)] x1,x2,x3,x4,x5,x6,x7; VALUES=light; DECIMALS=0
"form the classification tree"
BCLASSIFICATION [PRINT=*; GROUPS=digit; TREE=tree]\
         x1,x2,x3,x4,x5,x6,x7
"prune the tree"
BPRUNE   [PRINT=table] tree; NEWTREE=pruned
"use the 5th tree - renumber nodes"
BCUT     [RENUMBER=yes] pruned; NEWTREE=tree
"display the tree"
BCDISPLAY [PRINT=labelled] tree
PRINT    'Check identification of the true representations of the digits.'
FACTOR   [LEVELS=!(0,1); NVALUES=10] x1,x2,x3,x4,x5,x6,x7; VALUES=xdefn
BCIDENTIFY [PRINT=*; TREE=tree; IDENTIFICATION=identification]\
         x1,x2,x3,x4,x5,x6,x7
TEXT     [VALUES='Digit 1:','Digit 2:','Digit 3:','Digit 4:','Digit 5:',\
         'Digit 6:','Digit 7:','Digit 8:','Digit 9:','Digit 0:'] name
PRINT    name,identification; FIELD=15