DSEPARATIONPLOT procedure

Creates a separation plot for visualising the fit of a model with a dichotomous (i.e. binary) or polytomous (i.e. multi-categorical) outcome (V.M. Cave).

Options

`METHOD` = string token	Method used to plot the predicted probabilities (`rectangles`, `lines`, `rbands`, `lbands`); default `rect`
`PLOT` = string tokens	Information to be plotted on the graph (`key`, `traceline`, `expectednumber`); default `key`, `trac`, `expe` when `METHOD=rectangles` or `lines`, and `key` when `METHOD=rbands` or `lbands`
`SUCCESSLEVEL` = string token	Specifies which level corresponds to success when `GROUPS` supplies a factor with 2 levels (`first`, `second`); default `seco`
`LINEORDER` = string token	If `METHOD=lines`, whether the failures or successes are plotted first (`failurefirst`, `successfirst`); default `fail`
`NGROUPS` = scalar	Number of discrete bands used to group the predicted probabilities when `METHOD=rbands` or `lbands`; default 10
`TIES` = string token	How tied data values in `PROBABILITIES` are handled when `METHOD=rectangles` or `lines` (`permute`, `same`); default `perm`
`SEED` = scalar	Seed for random number generator used to permute the tied data; default 0
`COLOURS` = variate or text	The two colours used to plot the predicted probabilities
`THICKNESS` = scalar	Thickness of the line for plotting the predicted probabilities when `METHOD=lines` or `lbands`; default 1
`BACKGROUND` = scalar or text	Colour of the background when `METHOD=lines` or `lbands`; default `ligh`
`BORDER` = string token	Whether to draw borders around the rectangles when `METHOD=rectangles` or `rbands` (`yes`, `no`); default `no`
`USEPENS` = string token	Whether to use the current pen definitions of pens 2 and 3 for plotting the `traceline` and `expectednumber`, respectively (`yes`, `no`); default `no`
`SAVE` = rsave or pointer	Regression or HGLM save structure to provide the data if `PROBABILITIES`, `GROUPS`, `NSUCCESSES` and `NBINOMIAL` are not specified

Parameters

`PROBABILITIES` = variate, matrix or pointer	Variate containing probabilities of success for a binary outcome (i.e. for binary or binomial data), or, for a polytomous outcome, a matrix containing probabilities of membership in each group
`GROUPS` = variate or factor	Actual outcome, when `NSUCCESSES` and `NBINOMIAL` are not supplied
`NSUCCESSES` = variate	Number of successes when `PROBABILITIES` supplies predicted probabilities from binomial data
`NBINOMIAL` = variate or scalar	Number of trials when `PROBABILITIES` supplies predicted probabilities from binomial data
`TITLE` = text	Title for the plot; default generates the title automatically
`XTITLE` = text	Title for the x-axis; default * i.e. none

Description

The DSEPARATIONPLOT procedure creates a separation plot, which is a graphical approach for assessing the fit of a model with a dichotomous (i.e. binary) or polytomous (i.e. multi-categorical) outcome. A separation plot provides a visualisation of a model’s ability to predict occurrences of the event of interest (i.e. successes) with high probability, and non-occurrences (i.e. failures) with low probability. The procedure can accommodate models for binary, binomial and polytomous data.

The predicted probabilities are supplied using the PROBABILITIES parameter. For models for binary or binomial data, the predicted probabilities of success are supplied in a variate. For models for polytomous data, the predicted probabilities of membership to each group can be supplied in a matrix or a pointer to variates.

The actual outcome is defined using the GROUPS parameter for binary and polytomous data, and the NSUCCESSES and NBINOMIAL parameters for binomial data. For models for binary data, GROUPS must supply either a binary variate (i.e. a variate containing only zeros or ones) or a factor with two levels. If a binary variate is supplied, one corresponds to success in relation to PROBABILITIES. Alternatively, if a factor is supplied the default is that the second level corresponds to success. You can set option SUCCESSLEVEL=first to specify that the first level corresponds to success instead.
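As a minimal sketch of the SUCCESSLEVEL option, the commands below assume that `fittedFree` and `Free` have been formed as in the binary-data example on this page. The identifiers `code` and `Outcome` and the labels `yes` and `no` are hypothetical, introduced only to build a factor whose first level marks success.

```genstat
"Hypothetical recoding: 1 (first level) where Free=1, 2 (second level) where Free=0"
CALCULATE code = 2 - Free
FACTOR [LEVELS=2; LABELS=!t(yes,no)] Outcome; VALUES=code
"Tell the procedure that the first level of Outcome is the success"
DSEPARATIONPLOT [SUCCESSLEVEL=first] PROBABILITIES=fittedFree; GROUPS=Outcome
```

Without SUCCESSLEVEL=first, the second level (`no`) would be treated as the success, which would not match the probabilities in `fittedFree`.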

You can use the SAVE option to supply a save structure, from a regression or an HGLM analysis, to provide the data if the PROBABILITIES, GROUPS, NSUCCESSES and NBINOMIAL parameters are not specified. The analysis must involve either a generalized linear model with a binomial distribution or an HGLM with a binomial distribution for the mean model. If neither those parameters nor SAVE are specified, the data are taken from the most recent regression analysis.
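The following sketch shows one way the SAVE option might be used, assuming the data of the binary-data example on this page have already been read. The name `freesave` is hypothetical; it is set with the SAVE option of MODEL.

```genstat
"Name the regression save structure when defining the model (freesave is hypothetical)"
MODEL [DISTRIBUTION=binomial; LINK=logit; SAVE=freesave] Free; NBINOMIAL=1
FIT [PRINT=*] Sex+Aop+Li
"Take the probabilities and outcomes from the save structure"
DSEPARATIONPLOT [METHOD=lines; SAVE=freesave]
```

Omitting SAVE as well should give the same plot here, since the data are then taken from the most recent regression analysis.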

For models for polytomous data, GROUPS must supply a factor with the same number of levels as there are columns in the matrix supplied by PROBABILITIES. The first level of the GROUPS factor then corresponds to the first column of the matrix, the second level to the second column, and so on (i.e. the predicted probabilities of membership of the group that corresponds to the ith level of the factor are in the ith column of the matrix supplied by PROBABILITIES).
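The layout for a polytomous outcome can be sketched as below. The identifiers `probs` and `outcome` and all the values are made up purely to illustrate the structure (six units, three groups); they do not come from a fitted model.

```genstat
"Made-up probabilities: one row per unit, one column per group, filled row by row"
MATRIX [ROWS=6; COLUMNS=3; VALUES=0.7,0.2,0.1, 0.6,0.3,0.1, 0.2,0.5,0.3, \
       0.1,0.7,0.2, 0.2,0.2,0.6, 0.1,0.3,0.6] probs
"Made-up outcomes: level i of the factor matches column i of probs"
FACTOR [LEVELS=3; VALUES=1,1,2,2,3,3] outcome
DSEPARATIONPLOT [METHOD=rbands] PROBABILITIES=probs; GROUPS=outcome
```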

For models for binomial data, NSUCCESSES must supply a variate giving the number of successes, and NBINOMIAL must supply either a scalar or a variate giving the number of trials. The GROUPS parameter is then ignored.

The predicted probabilities can be plotted as rectangles, lines or in banded groups. This is specified using the METHOD option with the following settings.

`rectangles`	the predicted probabilities, ordered from smallest to largest, are plotted as rectangles that are coloured according to whether or not the observation corresponds to a success (i.e. an actual occurrence of the event of interest); this is the default.
`lines`	this is similar to `rectangles`, except that line segments are plotted instead of rectangles.
`rbands`	a separate graph is drawn for each actual outcome (i.e. success/failure for dichotomous data or each group for polytomous data) with the predicted probabilities of that outcome ordered from smallest to largest, and plotted as rectangles. The rectangles are coloured using a graduated band of colours formed by grouping the predicted probabilities into distinct bands.
`lbands`	this is similar to `rbands`, except that line segments are plotted instead of rectangles.

The COLOURS option defines the colours that are used to plot the predicted probabilities. It must supply two colours, either in a variate (containing two numbers defining the colours using the RGB system) or in a text (containing the names of two of Genstat’s pre-defined colours; see PEN for details). When METHOD=rectangles or lines, the first colour corresponds to failures (i.e. non-occurrences of the event of interest) and the second to successes (i.e. occurrences of the event of interest); defaults are a shade of pink (RGB value = 12917629) and a shade of green (RGB value = 5083681). When METHOD=rbands or lbands, the two colours define the start and end colour values used by DCOLOURS to form a linear band of graduated colours, with the first colour corresponding to the lowest probability band, and the second to the highest probability band; defaults are a pale shade of yellow (RGB value = 16777011) and a dark shade of red (RGB value = 15073280).

The number of discrete bands (and therefore colours) used to group the predicted probabilities is specified using the NGROUPS option. By default the predicted probabilities are grouped into 10 distinct bands: [0,0.1), [0.1,0.2), [0.2,0.3), [0.3,0.4), [0.4,0.5), [0.5,0.6), [0.6,0.7), [0.7,0.8), [0.8,0.9), [0.9,1]. (Note: the highest probability band is always a closed interval. All other probability bands are right half-open intervals.)
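As an illustration of the two ways of supplying COLOURS, the sketch below assumes `fittedFree` and `Free` from the binary-data example on this page. The colour names `blue` and `red` are assumed to be among Genstat's pre-defined colours (see PEN).

```genstat
"Named colours: the first is used for failures, the second for successes"
DSEPARATIONPLOT [METHOD=rectangles; COLOURS=!t(blue,red)] \
                PROBABILITIES=fittedFree; GROUPS=Free
"RGB values (here the documented defaults) as the start and end of five colour bands"
DSEPARATIONPLOT [METHOD=rbands; NGROUPS=5; COLOURS=!(16777011,15073280)] \
                PROBABILITIES=fittedFree; GROUPS=Free
```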

With large data sets, the lines on a separation plot may overlap. The THICKNESS option can be used to modify the thickness of the lines plotted when METHOD=lines or lbands, by specifying a value by which the standard thickness is to be multiplied; default 1.

When METHOD=lines, the default is to plot the failures (i.e. non-occurrences) before the successes (i.e. occurrences of the event of interest). The success lines may then overlap and obscure the failure lines. Alternatively, you can set option LINEORDER=successfirst to plot the success lines first. The failure lines may then obscure the success lines.
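A minimal sketch combining these two options, again assuming `fittedFree` and `Free` from the binary-data example on this page; the multiplier 0.5 is an arbitrary illustrative choice.

```genstat
"Halve the standard line thickness and draw the success lines before the failures"
DSEPARATIONPLOT [METHOD=lines; LINEORDER=successfirst; THICKNESS=0.5] \
                PROBABILITIES=fittedFree; GROUPS=Free
```

Comparing this plot with the one drawn using LINEORDER=failurefirst is one way to check whether overlapping lines are hiding part of the pattern.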

The BACKGROUND option specifies the background colour when METHOD=lines or lbands; default lightgray. Either a scalar (defining the colour using the RGB system) or a text (containing the name of a pre-defined colour; see PEN for details) may be supplied.

By default, borders are not drawn around the rectangles when METHOD=rectangles or rbands. However, you can include borders by setting option BORDER=yes. Their appearance can be modified by altering the settings of pen -7 (see PEN for details).
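For example, the background and border settings might be changed as sketched below, assuming `fittedFree` and `Free` from the binary-data example on this page and assuming that `white` is one of the pre-defined colour names.

```genstat
"White background behind the banded lines"
DSEPARATIONPLOT [METHOD=lbands; BACKGROUND='white'] \
                PROBABILITIES=fittedFree; GROUPS=Free
"Borders around each rectangle"
DSEPARATIONPLOT [METHOD=rectangles; BORDER=yes] \
                PROBABILITIES=fittedFree; GROUPS=Free
```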

With METHOD=rectangles or lines, the individual predicted probabilities are plotted in order from smallest to largest. The TIES option controls how tied probabilities are handled. The default, TIES=permute, randomly permutes the order in which the tied values are plotted, thereby breaking up any pre-existing patterns that may distort the appearance of the separation plot. Alternatively, TIES=same plots the tied values in the same order as they appear in PROBABILITIES.

The SEED option specifies the seed for the random-number generator, used by RANDOMIZE, to make the permutations when TIES=permute. The default of zero continues the sequence of random numbers from a previous generation or, if this is the first use of the generator in this run of Genstat, it initializes the seed automatically. If you use the same (non-zero) seed more than once, the tied values will be permuted in the same way, and hence you will get the same separation plot.
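The handling of ties can be sketched as follows, assuming `fittedFree` and `Free` from the binary-data example on this page (where the categorical predictors produce many tied fitted probabilities). The seed value is arbitrary.

```genstat
"Permute tied probabilities reproducibly by fixing a non-zero seed"
DSEPARATIONPLOT [METHOD=rectangles; TIES=permute; SEED=271828] \
                PROBABILITIES=fittedFree; GROUPS=Free
"Keep tied probabilities in the order in which they appear in PROBABILITIES"
DSEPARATIONPLOT [METHOD=rectangles; TIES=same] \
                PROBABILITIES=fittedFree; GROUPS=Free
```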

The PLOT option controls what additional information is plotted on the graph, with the following settings.

`key`	adds a key to the graph.
`traceline`	adds a line graph of the ordered predicted probabilities when `METHOD=rectangles` or `lines`.
`expectednumber`	adds a symbol (default star) denoting the expected number of successes when `METHOD=rectangles` or `lines`. This is calculated as the sum of the predicted probabilities for the occurrence of the event of interest (i.e. the sum of the predicted probabilities of success).

By default, the key is plotted. Also, when METHOD=rectangles or lines, the traceline and the expectednumber are plotted by default. You can suppress any additional information by setting PLOT=*.

You can set option USEPENS=yes to use the settings of pens 2 and 3 for the line drawn by PLOT=traceline and for the symbol added by PLOT=expectednumber, respectively. You can thus modify their appearance by modifying the settings of these pens prior to using DSEPARATIONPLOT. (See PEN for details.)
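A sketch of how the pens might be set up before the call, assuming `fittedFree` and `Free` from the binary-data example on this page. The colours, thickness and size are arbitrary illustrative choices, and the colour names are assumed to be pre-defined.

```genstat
"Pen 2 controls the traceline, pen 3 the expected-number symbol"
PEN 2; COLOUR='blue'; THICKNESS=2
PEN 3; COLOUR='red'; SIZE=1.5
"Omit the key, and draw the traceline and symbol with the pens defined above"
DSEPARATIONPLOT [PLOT=traceline,expectednumber; USEPENS=yes] \
                PROBABILITIES=fittedFree; GROUPS=Free
```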

The TITLE and XTITLE parameters can supply an overall title and an x-axis title for the separation plot, respectively. If no overall title is supplied, a suitable title is generated automatically. To omit the title, a blank string can be supplied, i.e. TITLE=' '. By default, the x-axis title is not displayed.
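For instance, titles might be supplied as in the sketch below, which assumes `fittedFree` and `Free` from the binary-data example on this page; the title strings are illustrative only.

```genstat
"Supply both an overall title and an x-axis title"
DSEPARATIONPLOT PROBABILITIES=fittedFree; GROUPS=Free; \
                TITLE='Separation plot for the logistic model'; \
                XTITLE='Observations ordered by predicted probability'
"Suppress the automatically generated title with a blank string"
DSEPARATIONPLOT PROBABILITIES=fittedFree; GROUPS=Free; TITLE=' '
```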

Options: METHOD, PLOT, SUCCESSLEVEL, LINEORDER, NGROUPS, TIES, SEED, COLOURS, THICKNESS, BACKGROUND, BORDER, USEPENS, SAVE.
Parameters: PROBABILITIES, GROUPS, NSUCCESSES, NBINOMIAL, TITLE, XTITLE.

Method

DSEPARATIONPLOT uses the methods described by Greenhill et al. (2011).

Action with `RESTRICT`

DSEPARATIONPLOT will work with restricted PROBABILITIES, NSUCCESSES or NBINOMIAL variates and a restricted GROUPS factor or variate. However, if more than one is restricted, they must be restricted in the same way. Note that the unrestricted length of all of the data variates and factors must be the same.
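As a sketch of plotting a subset, the commands below assume `fittedFree`, `Free` and the factor `Sex` from the binary-data example on this page. Both data structures are restricted in the same way, as required.

```genstat
"Restrict both structures to the units in the first level of Sex"
RESTRICT fittedFree,Free; CONDITION=Sex.EQ.1
DSEPARATIONPLOT PROBABILITIES=fittedFree; GROUPS=Free
"Remove the restrictions afterwards"
RESTRICT fittedFree,Free
```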

References

Greenhill, B., Ward, M.B. & Sacks, A. (2011). The separation plot: a visual method for evaluating the fit of binary models. American Journal of Political Science 55, 990-1002.

Example

CAPTION 'DSEPARATIONPLOT example'; STYLE=meta

CAPTION 'Binary data','Goorin et al. (1987)',\
         !T('(The Guide to the Genstat Command Language, Part 2: Statistics',\
              'Example 3.5.2)'); \
         STYLE=major,plain,plain

FACTOR Li,Sex,Aop
READ Li,Sex,Aop,Free
1 1 1 1  2 1 1 1  2 2 1 1  2 2 2 0
1 1 1 1  2 1 1 1  2 2 1 0  2 2 2 0
1 1 1 1  2 1 1 1  2 2 1 0  2 2 2 0
1 1 2 1  2 1 2 1  2 2 1 0  2 2 2 0
1 1 2 1  2 1 2 1  2 2 1 0  2 2 2 0
1 2 1 1  2 1 2 1  2 2 2 1  2 2 2 0
1 2 1 1  2 1 2 0  2 2 2 1  2 2 2 0
1 2 1 1  2 1 2 0  2 2 2 1  2 2 2 0
1 2 1 1  2 2 1 1  2 2 2 1  2 2 2 0
1 2 2 1  2 2 1 1  2 2 2 1  2 2 2 0
2 1 1 1  2 2 1 1  2 2 2 1     
2 1 1 1  2 2 1 1  2 2 2 0 :

MODEL [DISTRIBUTION=binomial; LINK=logit] Free; NBINOMIAL=1
TERMS [FACT=9]  Sex+Aop+Li
FIT Sex+Aop+Li
RKEEP FITTEDVALUES=fittedFree
DSEPARATIONPLOT [METHOD=rect] PROBABILITIES=fittedFree; GROUPS=Free
DSEPARATIONPLOT [METHOD=lines] PROBABILITIES=fittedFree; GROUPS=Free
DSEPARATIONPLOT [METHOD=lbands] PROBABILITIES=fittedFree; GROUPS=Free
DSEPARATIONPLOT [METHOD=rbands] PROBABILITIES=fittedFree; GROUPS=Free

CAPTION 'Binomial data','Finney (1971) analgesic drug data',\
         !T('(A Guide to Regression, Nonlinear and Generalized Linear Models in',\
            'Genstat, Section 3.4)'); \
         STYLE=major,plain,plain

SPLOAD   [PRINT=*] '%gendir%/data/Drug.gsh'

CALCULATE LogDose = LOG(Dose)
MODEL [DISTRIBUTION=binomial; LINK=probit; DISPERSION=1] R; NBINOMIAL=N
TERMS [FACT=9] LogDose*Drug
FIT [PRINT=*] LogDose*Drug
RKEEP FITTEDVALUES=fittedR
CALCULATE estprob = fittedR/N
DSEPARATIONPLOT [METHOD=rect] PROBABILITIES=estprob; NSUCCESSES=R; NBINOMIAL=N
DSEPARATIONPLOT [METHOD=lines] PROBABILITIES=estprob; NSUCCESSES=R; NBINOMIAL=N
DSEPARATIONPLOT [METHOD=lbands; THICKNESS=0.001] PROBABILITIES=estprob; \
                NSUCCESSES=R; NBINOMIAL=N
DSEPARATIONPLOT [METHOD=rbands] PROBABILITIES=estprob; NSUCCESSES=R; NBINOMIAL=N

Updated on February 7, 2023
