BCIDENTIFY procedure

Identifies specimens using a classification tree (R.W. Payne).

Options

PRINT = string tokens Controls printed output (identification, transcript); if PRINT is unset, BCIDENTIFY will ask what you want to print in an interactive run, whereas in a batch run the default is iden
TREE = tree Specifies the tree
IDENTIFICATION = text Saves the identification of each specimen
TERMINALNODES = pointer Saves the numbers of the terminal nodes reached by each specimen
PROBABILITIES = matrix Saves a specimen × group matrix giving the probability that each specimen belongs to each group
MVINCLUDE = string token Whether to provide identifications for specimens with missing or unavailable values of the x-variables (explanatory); default expl

Parameters

X = variates or factors Explanatory variables
VALUES = scalars, variates or texts Values to use for the explanatory variables; if these are unset for any variable, its existing values are used

Description

BCIDENTIFY identifies specimens using a classification tree, as constructed by the BCLASSIFICATION procedure. The tree can be saved from BCLASSIFICATION (using the TREE option of BCLASSIFICATION), and specified for BCIDENTIFY using its own TREE option. Alternatively, BCIDENTIFY will ask you for the identifier of the tree if you do not specify TREE when running interactively.
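As a minimal sketch of this workflow, the commands below save a tree from BCLASSIFICATION and pass it to BCIDENTIFY. The identifiers digit, tree and x1...x7 are those of the Example at the end of this page and are assumed to hold suitable data already.

```genstat
"construct the classification tree and save it in the identifier tree"
BCLASSIFICATION [PRINT=*; GROUPS=digit; TREE=tree]\
                x1,x2,x3,x4,x5,x6,x7
"identify the specimens stored in x1...x7 using the saved tree"
BCIDENTIFY      [PRINT=identification; TREE=tree]\
                x1,x2,x3,x4,x5,x6,x7
```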

The characteristics of the specimens can be specified in the variates or factors listed by the X parameter. These must have the same names (and levels) as those used originally to construct the tree. You can use the VALUES parameter to supply new values, if those stored in any of the variates or factors are unsuitable.
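The following sketch shows one way that VALUES might be used. It assumes the tree and the factors x1...x7 from the Example at the end of this page, and it assumes that unnamed variates are acceptable settings for VALUES. The two hypothetical specimens have the line patterns that the Example defines for digits 2 and 7.

```genstat
"specimen 1: lines 1,3,4,5,7 lit; specimen 2: lines 1,3,6 lit"
BCIDENTIFY [PRINT=identification; TREE=tree]\
           x1,x2,x3,x4,x5,x6,x7;\
           VALUES=!(1,1),!(0,0),!(1,1),!(1,0),!(1,0),!(0,1),!(1,0)
```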

If you do not set X when running interactively, BCIDENTIFY will ask you to supply the relevant characteristics in turn, as required by the tree. Otherwise, if an x-variable in the tree is not specified in the X parameter list, its values are assumed to be unavailable (i.e. missing).

By default, when the x-variable required at a node in the tree is unavailable or contains a missing value, BCIDENTIFY will follow all the branches from that node, and form a combined conclusion. If you would prefer the identification to be missing in that case, you can set option MVINCLUDE=*.
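The two behaviours can be compared with a sketch like the one below. It again assumes the tree and factors from the Example at the end of this page. Leaving x5 out of the X list is purely illustrative; the results depend on whether the tree actually uses x5.

```genstat
"x5 is not listed, so its values are treated as unavailable;"
"by default all branches are followed at any node that needs x5"
BCIDENTIFY [PRINT=identification; TREE=tree]\
           x1,x2,x3,x4,x6,x7
"with MVINCLUDE=* the identification is missing instead"
BCIDENTIFY [PRINT=identification; TREE=tree; MVINCLUDE=*]\
           x1,x2,x3,x4,x6,x7
```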

The PRINT option controls printed output, with settings:

    identification prints the identifications obtained using the tree;
    transcript prints the observed characteristics when supplied in response to questions in an interactive run.

If you do not set PRINT in an interactive run, BCIDENTIFY will ask what you would like to print. In batch, the default is to print the identifications.

The IDENTIFICATION option allows you to save the identifications (in a text). The TERMINALNODES option allows you to save a pointer, with an element for each specimen, containing the numbers of the terminal nodes reached in the tree to provide its identification. Each element will be a scalar if the identification was derived from a single node, or a variate if it involved more than one (because several branches have been taken, as the result of a missing x-value). Finally, the PROBABILITIES option can save a specimen-by-group matrix giving the probability that each specimen belongs to each group.
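A minimal sketch of saving all three results is shown below. The identifiers ident, nodes and probs are hypothetical names chosen for illustration, and the tree and factors are assumed to be those of the Example at the end of this page.

```genstat
"save the results without printing anything from BCIDENTIFY itself"
BCIDENTIFY [PRINT=*; TREE=tree; IDENTIFICATION=ident;\
           TERMINALNODES=nodes; PROBABILITIES=probs]\
           x1,x2,x3,x4,x5,x6,x7
"ident is a text, probs a specimen-by-group matrix"
PRINT      ident
PRINT      probs
"terminal node(s) reached by the first specimen (scalar or variate)"
PRINT      nodes[1]
```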

Options: PRINT, TREE, IDENTIFICATION, TERMINALNODES, PROBABILITIES, MVINCLUDE.

Parameters: X, VALUES.

Method

BCIDENTIFY uses BIDENTIFY to find the terminal nodes of the tree that correspond to the values of the explanatory variables.

Action with RESTRICT

Restrictions are ignored.

See also

Procedures: BCLASSIFICATION, BCDISPLAY, BCKEEP.

Commands for: Multivariate and cluster analysis.

Example

CAPTION 'BCIDENTIFY example',!t(\
        'Calculator digit recognition problem as in Breiman et al.',\
        '(1984, p.44). The assumption is that the digits of a calculator',\
        'are made up of 7 lines (as shown below), which may be missing for',\
        'any particular digit with probability 0.1:'); STYLE=meta,plain
SCALAR  Chan
ENQUIRE Chan; FILETYPE=output; OUTSTYLE=Style
OUTPUT  [STYLE=plain]
PRINT   !t(' -1- ','|   |','2   3','|   |',' -4- ',\
           '|   |','5   6','|   |',' -7- '); FIELD=20
OUTPUT  [STYLE=#Style]
VARIATE xdefn[1...7]
READ    [PRINT=error] xdefn[1...7]
0 0 1 0 0 1 0
1 0 1 1 1 0 1
1 0 1 1 0 1 1
0 1 1 1 0 1 0
1 1 0 1 0 1 1
1 1 0 1 1 1 1
1 0 1 0 0 1 0
1 1 1 1 1 1 1
1 1 1 1 0 1 1
1 1 1 0 1 1 1 :

"generate a set of random observations"
SCALAR  nsamples,seed; VALUE=50,876083
VARIATE [NVALUES=nsamples] light[1...7],truelight[1...7],error[1...7],rdigit
CALC    rdigit = MOD( INTEGER( URAND(seed; nsamples) * 10); 10) + 1
&       truelight[] = ELEMENTS(xdefn[]; rdigit)
GRANDOM [DISTRIBUTION=binomial; PROBABILITY=0.1; NVALUES=nsamples] error[1...7]
CALC    light[] = MOD(truelight[] + error[]; 2)
FACTOR  [LEVELS=!(0...9)] digit; VALUES=MOD(rdigit; 10); DECIMALS=0
FACTOR  [LEVELS=!(0,1)] x1,x2,x3,x4,x5,x6,x7; VALUES=light[]; DECIMALS=0

"form the classification tree"
BCLASSIFICATION [PRINT=*; GROUPS=digit; TREE=tree]\
                x1,x2,x3,x4,x5,x6,x7
"prune the tree"
BPRUNE    [PRINT=table] tree; NEWTREE=pruned
"use the 5th tree - renumber nodes"
BCUT      [RENUMBER=yes] pruned[5]; NEWTREE=tree

"display the tree"
BCDISPLAY [PRINT=labelled] tree

PRINT      'Check identification of the true representations of the digits.'
FACTOR     [LEVELS=!(0,1); NVALUES=10] x1,x2,x3,x4,x5,x6,x7; VALUES=xdefn[]
BCIDENTIFY [PRINT=*; TREE=tree; IDENTIFICATION=identification]\
           x1,x2,x3,x4,x5,x6,x7
TEXT       [VALUES='Digit 1:','Digit 2:','Digit 3:','Digit 4:','Digit 5:',\
           'Digit 6:','Digit 7:','Digit 8:','Digit 9:','Digit 0:'] name
PRINT      name,identification; FIELD=15
Updated on June 20, 2019
