Creates a separation plot for visualising the fit of a model with a dichotomous (i.e. binary) or polytomous (i.e. multi-categorical) outcome (V.M. Cave).
Options
METHOD = string token |
Method used to plot the predicted probabilities (rectangles, lines, rbands, lbands); default rect |
PLOT = string tokens |
Information to be plotted on the graph (key, traceline, expectednumber); default key, trac, expe when METHOD=rectangles or lines, and key when METHOD=rbands or lbands |
SUCCESSLEVEL = string token |
Specifies which level corresponds to success when GROUPS supplies a factor with 2 levels (first, second); default seco |
LINEORDER = string token |
If METHOD=lines, whether the failures or successes are plotted first (failurefirst, successfirst); default fail |
NGROUPS = scalar |
Number of discrete bands used to group the predicted probabilities when METHOD=rbands or lbands; default 10 |
TIES = string token |
How tied data values in PROBABILITIES are handled when METHOD=rectangles or lines (permute, same); default perm |
SEED = scalar |
Seed for random number generator used to permute the tied data; default 0 |
COLOURS = variate or text |
The two colours used to plot the predicted probabilities |
THICKNESS = scalar |
Thickness of the line for plotting the predicted probabilities when METHOD=lines or lbands; default 1 |
BACKGROUND = scalar or text |
Colour of the background when METHOD=lines or lbands; default ligh |
BORDER = string token |
Whether to draw borders around the rectangles when METHOD=rectangles or rbands (yes, no); default no |
USEPENS = string token |
Whether to use the current pen definitions of pens 2 and 3 for plotting the traceline and expectednumber, respectively (yes, no); default no |
SAVE = rsave or pointer |
Regression or HGLM save structure to provide the data if PROBABILITIES, GROUPS, NSUCCESSES and NBINOMIAL are not specified |
Parameters
PROBABILITIES = variate, matrix or pointer |
Variate containing probabilities of success for a binary outcome (i.e. for binary or binomial data), or, for a polytomous outcome, a matrix containing probabilities of membership in each group |
GROUPS = variate or factor |
Actual outcome, when NSUCCESSES and NBINOMIAL are not supplied |
NSUCCESSES = variate |
Number of successes when PROBABILITIES supplies predicted probabilities from binomial data |
NBINOMIAL = variate or scalar |
Number of trials when PROBABILITIES supplies predicted probabilities from binomial data |
TITLE = text |
Title for the plot; default generates the title automatically |
XTITLE = text |
Title for the x-axis; default * i.e. none |
Description
The DSEPARATIONPLOT procedure creates a separation plot, which is a graphical approach for assessing the fit of a model with a dichotomous (i.e. binary) or polytomous (i.e. multi-categorical) outcome. A separation plot provides a visualisation of a model’s ability to predict occurrences of the event of interest (i.e. successes) with high probability, and non-occurrences (i.e. failures) with low probability. The procedure can accommodate models for binary, binomial and polytomous data.
The predicted probabilities are supplied using the PROBABILITIES parameter. For models for binary or binomial data, the predicted probabilities of success are supplied in a variate. For models for polytomous data, the predicted probabilities of membership in each group can be supplied in a matrix or a pointer to variates.
The actual outcome is defined using the GROUPS parameter for binary and polytomous data, and the NSUCCESSES and NBINOMIAL parameters for binomial data. For models for binary data, GROUPS must supply either a binary variate (i.e. a variate containing only zeros and ones) or a factor with two levels. If a binary variate is supplied, the value one corresponds to success in relation to PROBABILITIES. Alternatively, if a factor is supplied, the default is that the second level corresponds to success. You can set option SUCCESSLEVEL=first to specify that the first level corresponds to success instead.
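A minimal sketch of the two ways of supplying a binary outcome. The data values and the identifiers prob, y and outcome are hypothetical, made up purely for illustration.

```genstat
" Hypothetical predicted probabilities of success for six observations "
VARIATE [VALUES=0.1,0.3,0.35,0.6,0.8,0.9] prob
" Outcome as a binary variate: the value one denotes success "
VARIATE [VALUES=0,0,1,0,1,1] y
DSEPARATIONPLOT PROBABILITIES=prob; GROUPS=y
" Same outcome as a two-level factor whose first level is the success "
FACTOR [LABELS=!T(event,noevent); VALUES=2,2,1,2,1,1] outcome
DSEPARATIONPLOT [SUCCESSLEVEL=first] PROBABILITIES=prob; GROUPS=outcome
```

Both calls should describe the same successes, because level 1 of the factor (event) occurs exactly where y is one.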
You can use the SAVE option to supply a save structure, from a regression or an HGLM analysis, to provide the data if the PROBABILITIES, GROUPS, NSUCCESSES and NBINOMIAL parameters are not specified. The analysis must involve either a generalized linear model with a binomial distribution or an HGLM with a binomial distribution for the mean model. If neither those parameters nor SAVE are specified, the data are taken from the most recent regression analysis.
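As a sketch of the default behaviour, the procedure can be called with no parameters straight after fitting a binomial generalized linear model. The variates x and y and their values are hypothetical.

```genstat
" Hypothetical binary response y and explanatory variate x "
VARIATE [VALUES=1,2,3,4,5,6,7,8] x
VARIATE [VALUES=0,0,1,0,1,0,1,1] y
MODEL [DISTRIBUTION=binomial; LINK=logit] y; NBINOMIAL=1
FIT [PRINT=*] x
" No parameters and no SAVE: data come from the most recent regression "
DSEPARATIONPLOT [METHOD=lines]
```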
For models for polytomous data, GROUPS must supply a factor with the same number of levels as the columns in the matrix supplied by PROBABILITIES. The first level of the GROUPS factor then corresponds to the first column of the matrix, the second level to the second column, and so on (i.e. the predicted probabilities of membership in the group that corresponds to the ith level of the factor are in the ith column of the matrix supplied by PROBABILITIES).
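The following sketch supplies polytomous probabilities as a pointer to variates rather than a matrix. It assumes that the variates in the pointer are matched to the factor levels in the same order as matrix columns would be. All identifiers and values are hypothetical, and each observation's three probabilities sum to one.

```genstat
" Hypothetical membership probabilities for three groups "
VARIATE [VALUES=0.7,0.2,0.1,0.3,0.5,0.2] pA
VARIATE [VALUES=0.2,0.6,0.3,0.4,0.3,0.1] pB
VARIATE [VALUES=0.1,0.2,0.6,0.3,0.2,0.7] pC
POINTER [VALUES=pA,pB,pC] probs
" Observed group: level 1 pairs with pA, level 2 with pB, level 3 with pC "
FACTOR [LEVELS=3; VALUES=1,2,3,2,1,3] group
DSEPARATIONPLOT [METHOD=rbands] PROBABILITIES=probs; GROUPS=group
```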
For models for binomial data, NSUCCESSES must supply a variate giving the number of successes, and NBINOMIAL must supply either a scalar or a variate giving the number of trials. The GROUPS parameter is then ignored.
The predicted probabilities can be plotted as rectangles, lines or in banded groups. This is specified using the METHOD option, with the following settings.
rectangles |
the predicted probabilities, ordered from smallest to largest, are plotted as rectangles that are coloured according to whether or not the observation corresponds to a success (i.e. an actual occurrence of the event of interest); this is the default. |
lines |
this is similar to rectangles, except that line segments are plotted instead of rectangles. |
rbands |
a separate graph is drawn for each actual outcome (i.e. success/failure for dichotomous data or each group for polytomous data) with the predicted probabilities of that outcome ordered from smallest to largest, and plotted as rectangles. The rectangles are coloured using a graduated band of colours formed by grouping the predicted probabilities into distinct bands. |
lbands |
this is similar to rbands, except that line segments are plotted instead of rectangles. |
The COLOURS option defines the colours that are used to plot the predicted probabilities. It must supply two colours, either in a variate (containing two numbers defining the colours using the RGB system) or in a text (containing the names of two of Genstat’s pre-defined colours; see PEN for details). When METHOD=rectangles or lines, the first colour corresponds to failures (i.e. non-occurrences of the event of interest) and the second to successes (i.e. occurrences of the event of interest); the defaults are a shade of pink (RGB value = 12917629) and a shade of green (RGB value = 5083681). When METHOD=rbands or lbands, the two colours define the start and end colour values used by DCOLOURS to form a linear band of graduated colours, with the first colour corresponding to the lowest probability band, and the second to the highest probability band; the defaults are a pale shade of yellow (RGB value = 16777011) and a dark shade of red (RGB value = 15073280). The number of discrete bands (and therefore colours) used to group the predicted probabilities is specified using the NGROUPS option. By default the predicted probabilities are grouped into 10 distinct bands: [0,0.1), [0.1,0.2), [0.2,0.3), [0.3,0.4), [0.4,0.5), [0.5,0.6), [0.6,0.7), [0.7,0.8), [0.8,0.9), [0.9,1]. (Note: the highest probability band is always a closed interval. All other probability bands are right half-open intervals.)
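A short sketch of a banded plot with non-default colours and fewer bands. The data are hypothetical, and the colour names yellow and red are assumed to be among the pre-defined colours listed under PEN. With NGROUPS=5 the bands are presumably of width 0.2, by analogy with the default of 10 bands, although this is not stated explicitly above.

```genstat
" Hypothetical predicted probabilities and binary outcome "
VARIATE [VALUES=0.05,0.15,0.3,0.45,0.55,0.7,0.85,0.95] prob
VARIATE [VALUES=0,0,0,1,0,1,1,1] y
" Five bands shaded from yellow (lowest band) to red (highest band) "
DSEPARATIONPLOT [METHOD=rbands; NGROUPS=5; COLOURS=!T(yellow,red)] \
   PROBABILITIES=prob; GROUPS=y
```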
With large data sets, the lines on a separation plot may overlap. The THICKNESS option can be used to modify the thickness of the lines plotted when METHOD=lines or lbands, by specifying a value by which the standard thickness is to be multiplied; default 1.
When METHOD=lines, the default is to plot the failures (i.e. non-occurrences of the event of interest) before the successes (i.e. occurrences). The success lines may then overlap and obscure the failure lines. Alternatively, you can set option LINEORDER=successfirst to plot the success lines first. The failure lines may then obscure the success lines.
The BACKGROUND option specifies the background colour when METHOD=lines or lbands; default lightgray. Either a scalar (defining the colour using the RGB system) or a text (containing the name of a pre-defined colour; see PEN for details) may be supplied.
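The line-related options can be combined in a single call, as in this sketch. The data are hypothetical, and white is assumed to be one of the pre-defined colour names.

```genstat
" Hypothetical predicted probabilities and binary outcome "
VARIATE [VALUES=0.1,0.3,0.35,0.6,0.8,0.9] prob
VARIATE [VALUES=0,0,1,0,1,1] y
" Half-thickness lines, successes drawn first, on a white background "
DSEPARATIONPLOT [METHOD=lines; THICKNESS=0.5; LINEORDER=successfirst; \
   BACKGROUND='white'] PROBABILITIES=prob; GROUPS=y
```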
By default, borders are not drawn around the rectangles when METHOD=rectangles or rbands. However, you can include borders by setting option BORDER=yes. Their appearance can be modified by altering the settings of pen -7 (see PEN for details).
With METHOD=rectangles or lines, the individual predicted probabilities are plotted in order from smallest to largest. The TIES option controls how tied probabilities are handled. The default, TIES=permute, randomly permutes the order in which the tied values are plotted, thereby breaking up any pre-existing patterns that may distort the appearance of the separation plot. Alternatively, TIES=same plots the tied values in the same order as they appear in PROBABILITIES.
The SEED option specifies the seed for the random-number generator, used by RANDOMIZE, to make the permutations when TIES=permute. The default of zero continues the sequence of random numbers from a previous generation or, if this is the first use of the generator in this run of Genstat, it initializes the seed automatically. If you use the same (non-zero) seed more than once, the tied values will be permuted in the same way, and hence you will get the same separation plot.
The PLOT option controls what additional information is plotted on the graph, with the following settings.
key |
adds a key to the graph. |
traceline |
adds a line graph of the ordered predicted probabilities when METHOD=rectangles or lines. |
expectednumber |
adds a symbol (default star) denoting the expected number of successes when METHOD=rectangles or lines. This is calculated as the sum of the predicted probabilities for the occurrence of the event of interest (i.e. the sum of the predicted probabilities of success). |
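The expected number of successes can be reproduced by hand as the sum of the predicted probabilities. In this sketch the variate prob is hypothetical; its six values sum to 3.05.

```genstat
" Hypothetical predicted probabilities of success "
VARIATE [VALUES=0.1,0.3,0.35,0.6,0.8,0.9] prob
" Expected number of successes = sum of the predicted probabilities "
CALCULATE expected = SUM(prob)
PRINT expected
```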
By default, the key is plotted. Also, when METHOD=rectangles or lines, the traceline and the expectednumber are plotted by default. You can suppress all the additional information by setting PLOT=*.
You can set option USEPENS=yes to use the settings of pens 2 and 3 for the line drawn by PLOT=traceline and for the symbol added by PLOT=expectednumber, respectively. You can thus modify their appearance by modifying the settings of these pens prior to using DSEPARATIONPLOT. (See PEN for details.)
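A sketch of restyling the traceline and the expected-number symbol through pens 2 and 3. The data and the particular pen settings are hypothetical choices for illustration; see PEN for the full list of parameters.

```genstat
" Hypothetical predicted probabilities and binary outcome "
VARIATE [VALUES=0.1,0.3,0.35,0.6,0.8,0.9] prob
VARIATE [VALUES=0,0,1,0,1,1] y
" Pen 2 styles the traceline, pen 3 the expected-number symbol "
PEN 2; COLOUR='blue'; THICKNESS=2
PEN 3; COLOUR='red'; SIZE=1.5
DSEPARATIONPLOT [USEPENS=yes] PROBABILITIES=prob; GROUPS=y
```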
The TITLE and XTITLE parameters can supply an overall title and an x-axis title for the separation plot, respectively. If no overall title is supplied, a suitable title is generated automatically. To omit the title, a blank string can be supplied, i.e. TITLE=' '. By default, the x-axis title is not displayed.
Options: METHOD, PLOT, SUCCESSLEVEL, LINEORDER, NGROUPS, TIES, SEED, COLOURS, THICKNESS, BACKGROUND, BORDER, USEPENS, SAVE.
Parameters: PROBABILITIES, GROUPS, NSUCCESSES, NBINOMIAL, TITLE, XTITLE.
Method
DSEPARATIONPLOT uses the methods described by Greenhill et al. (2011).
Action with RESTRICT
DSEPARATIONPLOT will work with restricted PROBABILITIES, NSUCCESSES or NBINOMIAL variates and a restricted GROUPS factor or variate. However, if more than one is restricted, they must be restricted in the same way. Note that the unrestricted length of all of the data variates and factors must be the same.
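A sketch of plotting a subset of the observations by applying the same restriction to both structures. The variate site and all the data values are hypothetical.

```genstat
" Hypothetical data; site identifies the subset to be plotted "
VARIATE [VALUES=0.1,0.3,0.35,0.6,0.8,0.9] prob
VARIATE [VALUES=0,0,1,0,1,1] y
VARIATE [VALUES=1,1,2,1,2,1] site
" Apply the same restriction to both structures "
RESTRICT prob,y; CONDITION=site.EQ.1
DSEPARATIONPLOT PROBABILITIES=prob; GROUPS=y
" Remove the restriction afterwards "
RESTRICT prob,y
```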
References
Greenhill, B., Ward, M.D. & Sacks, A. (2011). The separation plot: a new visual method for evaluating the fit of binary models. American Journal of Political Science, 55, 991-1002.
See also
Directive: MODEL
Commands for: Regression analysis.
Example
CAPTION 'DSEPARATION example'; STYLE=meta
CAPTION 'Binary data','Goorin et al. (1987)',\
        !T('(The Guide to the Genstat Command Language, Part 2: Statistics',\
        'Example 3.5.2)'); \
        STYLE=major,plain,plain
FACTOR  Li,Sex,Aop
READ    Li,Sex,Aop,Free
1 1 1 1  2 1 1 1  2 2 1 1  2 2 2 0
1 1 1 1  2 1 1 1  2 2 1 0  2 2 2 0
1 1 1 1  2 1 1 1  2 2 1 0  2 2 2 0
1 1 2 1  2 1 2 1  2 2 1 0  2 2 2 0
1 1 2 1  2 1 2 1  2 2 1 0  2 2 2 0
1 2 1 1  2 1 2 1  2 2 2 1  2 2 2 0
1 2 1 1  2 1 2 0  2 2 2 1  2 2 2 0
1 2 1 1  2 1 2 0  2 2 2 1  2 2 2 0
1 2 1 1  2 2 1 1  2 2 2 1  2 2 2 0
1 2 2 1  2 2 1 1  2 2 2 1  2 2 2 0
2 1 1 1  2 2 1 1  2 2 2 1
2 1 1 1  2 2 1 1  2 2 2 0 :
MODEL   [DISTRIBUTION=binomial; LINK=logit] Free; NBINOMIAL=1
TERMS   [FACT=9] Sex+Aop+Li
FIT     Sex+Aop+Li
RKEEP   FITTEDVALUES=fittedFree
DSEPARATIONPLOT [METHOD=rect] PROBABILITIES=fittedFree; GROUPS=Free
DSEPARATIONPLOT [METHOD=lines] PROBABILITIES=fittedFree; GROUPS=Free
DSEPARATIONPLOT [METHOD=lbands] PROBABILITIES=fittedFree; GROUPS=Free
DSEPARATIONPLOT [METHOD=rbands] PROBABILITIES=fittedFree; GROUPS=Free
CAPTION 'Binomial data','Finney (1971) analgesic drug data',\
        !T('(A Guide to Regression, Nonlinear and Generalized Linear Models in',\
        'Genstat, Section 3.4)'); \
        STYLE=major,plain,plain
SPLOAD  [PRINT=*] '%gendir%/data/Drug.gsh'
CALCULATE LogDose = LOG(Dose)
MODEL   [DISTRIBUTION=binomial; LINK=probit; DISPERSION=1] R; NBINOMIAL=N
TERMS   [FACT=9] LogDose*Drug
FIT     [PRINT=*] LogDose*Drug
RKEEP   FITTEDVALUES=fittedR
CALCULATE estprob = fittedR/N
DSEPARATIONPLOT [METHOD=rect] PROBABILITIES=estprob; NSUCCESSES=R; NBINOMIAL=N
DSEPARATIONPLOT [METHOD=lines] PROBABILITIES=estprob; NSUCCESSES=R; NBINOMIAL=N
DSEPARATIONPLOT [METHOD=lbands; THICKNESS=0.001] PROBABILITIES=estprob; \
        NSUCCESSES=R; NBINOMIAL=N
DSEPARATIONPLOT [METHOD=rbands] PROBABILITIES=estprob; NSUCCESSES=R; NBINOMIAL=N