Creates a separation plot for visualising the fit of a model with a dichotomous (i.e. binary) or polytomous (i.e. multi-categorical) outcome (V.M. Cave).
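|METHOD|Method used to plot the predicted probabilities (rectangles, lines, rbands, lbands); default rectangles|
|PLOT|Information to be plotted on the graph (key, traceline, expectednumber)|
|SUCCESSLEVEL|Specifies which level corresponds to success when GROUPS is a factor with two levels (first, second); default second|
|NGROUPS|Number of discrete bands used to group the predicted probabilities when METHOD=rbands or lbands; default 10|
|TIES|How tied data values in the predicted probabilities are handled (permute, same); default permute|
|SEED|Seed for random number generator used to permute the tied data; default 0|
|COLOURS|The two colours used to plot the predicted probabilities|
|THICKNESS|Thickness of the line for plotting the predicted probabilities when METHOD=lines or lbands; default 1|
|BACKGROUND|Colour of the background, where one is drawn; default lightgray|
|BORDER|Whether to draw borders around the rectangles when METHOD=rectangles or rbands (yes, no); default no|
|USEPENS|Whether to use the current pen definitions of pens 2 and 3 for plotting the trace line and the expected-number symbol (yes, no)|
|SAVE|Regression or HGLM save structure to provide the data if the data parameters are not specified|
|PROBABILITIES|Variate containing probabilities of success for a binary outcome (i.e. for binary or binomial data), or, for a polytomous outcome, a matrix containing probabilities of membership in each group|
|GROUPS|Actual outcome, when the data are binary or polytomous|
|NSUCCESSES|Number of successes when the data are binomial|
|NBINOMIAL|Number of trials when the data are binomial|
|TITLE|Title for the plot; default generates the title automatically|
|XTITLE|Title for the x-axis; default * i.e. none|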
DSEPARATIONPLOT procedure creates a separation plot, which is a graphical approach for assessing the fit of a model with a dichotomous (i.e. binary) or polytomous (i.e. multi-categorical) outcome. A separation plot provides a visualisation of a model’s ability to predict occurrences of the event of interest (i.e. successes) with high probability, and non-occurrences (i.e. failures) with low probability. The procedure can accommodate models for binary, binomial and polytomous data.
The predicted probabilities are supplied using the PROBABILITIES parameter. For models for binary or binomial data, the predicted probabilities of success are supplied in a variate. For models for polytomous data, the predicted probabilities of membership in each group can be supplied in a matrix or a pointer to variates.
The actual outcome is defined using the GROUPS parameter for binary and polytomous data, and the NSUCCESSES and NBINOMIAL parameters for binomial data. For models for binary data, GROUPS must supply either a binary variate (i.e. a variate containing only zeros and ones) or a factor with two levels. If a binary variate is supplied, the value one corresponds to success in relation to PROBABILITIES. Alternatively, if a factor is supplied, the default is that the second level corresponds to success. You can set option SUCCESSLEVEL=first to specify that the first level corresponds to success instead.
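As a minimal sketch of the factor case, the statements below use made-up probabilities and a hypothetical two-level factor called outcome, whose first level is the event of interest. The identifiers phat and outcome, the labels and all the values are illustrative assumptions, not part of the original example.

```genstat
" Hypothetical data: predicted probabilities of the event for five units "
VARIATE [VALUES=0.82,0.15,0.64,0.30,0.91] phat
" Two-level factor: level 1 (event) is the success in this sketch "
FACTOR  [LABELS=!T(event,nonevent); VALUES=1,2,1,2,1] outcome
" Override the default, under which the second level would be the success "
DSEPARATIONPLOT [SUCCESSLEVEL=first] PROBABILITIES=phat; GROUPS=outcome
```

If the factor were coded with the event as its second level, the SUCCESSLEVEL option could simply be left at its default.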
You can use the SAVE option to supply a save structure, from a regression or an HGLM analysis, to provide the data if the data parameters (including NBINOMIAL) are not specified. The analysis must involve either a generalized linear model with a binomial distribution or an HGLM with a binomial distribution for the mean model. If neither those parameters nor SAVE are specified, the data are taken from the most recent regression analysis.
For models for polytomous data, GROUPS must supply a factor with the same number of levels as the columns in the matrix supplied by PROBABILITIES. The first level of the GROUPS factor then corresponds to the first column of the matrix, the second level to the second column, and so on (i.e. the predicted probabilities of membership in the group that corresponds to the ith level of the factor are in the ith column of the matrix supplied by PROBABILITIES).
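The correspondence between factor levels and matrix columns can be sketched as follows. The matrix pmat, the factor group and their values are hypothetical and chosen only so that each row of probabilities sums to one.

```genstat
" Hypothetical 3-group example: one row per unit, one column per group "
MATRIX [ROWS=4; COLUMNS=3; VALUES=0.7,0.2,0.1, 0.1,0.6,0.3, \
        0.2,0.2,0.6, 0.5,0.4,0.1] pmat
" Observed group of each unit: level i of the factor matches column i of pmat "
FACTOR [LEVELS=3; VALUES=1,2,3,2] group
DSEPARATIONPLOT PROBABILITIES=pmat; GROUPS=group
```

In this sketch the fourth unit belongs to group 2 although its largest predicted probability (0.5) is for group 1, the kind of misfit that a separation plot is meant to expose.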
For models for binomial data, NSUCCESSES must supply a variate giving the number of successes, and NBINOMIAL must supply either a scalar or a variate giving the number of trials. The GROUPS parameter is then ignored.
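A minimal sketch of the binomial case with a common number of trials is given below. The variates phat and r and the choice of 20 trials per unit are assumptions made for illustration.

```genstat
" Hypothetical binomial data: successes out of 20 trials on each of four units "
VARIATE [VALUES=3,7,12,18] r
" Predicted probability of success for each unit "
VARIATE [VALUES=0.18,0.33,0.61,0.88] phat
" A scalar NBINOMIAL applies the same number of trials to every unit "
DSEPARATIONPLOT PROBABILITIES=phat; NSUCCESSES=r; NBINOMIAL=20
```

When the number of trials differs between units, a variate of the same length as r would be supplied for NBINOMIAL instead of the scalar.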
The predicted probabilities can be plotted as rectangles, lines or in banded groups. This is specified using the METHOD option with the following settings.
|rectangles|the predicted probabilities, ordered from smallest to largest, are plotted as rectangles that are coloured according to whether or not the observation corresponds to a success (i.e. an actual occurrence of the event of interest); this is the default.|
|lines|this is similar to rectangles, except that the predicted probabilities are plotted as lines.|
|rbands|a separate graph is drawn for each actual outcome (i.e. success/failure for dichotomous data or each group for polytomous data) with the predicted probabilities of that outcome ordered from smallest to largest, and plotted as rectangles. The rectangles are coloured using a graduated band of colours formed by grouping the predicted probabilities into distinct bands.|
|lbands|this is similar to rbands, except that the predicted probabilities are plotted as lines.|
The COLOURS option defines the colours that are used to plot the predicted probabilities. It must supply two colours, either in a variate (containing two numbers defining the colours using the RGB system) or in a text (containing the names of two of Genstat’s pre-defined colours; see PEN for details). When METHOD=rectangles or lines, the first colour corresponds to failures (i.e. non-occurrences of the event of interest) and the second to successes (i.e. occurrences of the event of interest); defaults are a shade of pink (RGB value = 12917629) and a shade of green (RGB value = 5083681). When METHOD=rbands or lbands, the two colours define the start and end colour values used by DCOLOURS to form a linear band of graduated colours, with the first colour corresponding to the lowest probability band and the second to the highest probability band; defaults are a pale shade of yellow (RGB value = 16777011) and a dark shade of red (RGB value = 15073280). The number of discrete bands (and therefore colours) used to group the predicted probabilities is specified using the NGROUPS option. By default the predicted probabilities are grouped into 10 distinct bands: [0,0.1), [0.1,0.2), [0.2,0.3), [0.3,0.4), [0.4,0.5), [0.5,0.6), [0.6,0.7), [0.7,0.8), [0.8,0.9), [0.9,1]. (Note: the highest probability band is always a closed interval. All other probability bands are right half-open intervals.)
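The following sketch combines the banded method with a non-default number of bands and named colours. The data are made up, and it is assumed here that yellow and red are among Genstat's pre-defined colour names and that NGROUPS=5 splits the unit interval into five equal-width bands in the same way as the default ten.

```genstat
" Hypothetical binary data "
VARIATE [VALUES=0.05,0.22,0.47,0.68,0.93,0.31] phat
VARIATE [VALUES=0,0,1,1,1,0] y
" Five bands instead of ten, graduated from yellow (lowest band) to red (highest) "
DSEPARATIONPLOT [METHOD=rbands; NGROUPS=5; COLOURS=!T(yellow,red)] \
                PROBABILITIES=phat; GROUPS=y
```

Fewer bands give a coarser but easier-to-read colour scale, which can help with small data sets such as this one.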
With large data sets, the lines on a separation plot may overlap. The THICKNESS option can be used to modify the thickness of the lines plotted when METHOD=lines or lbands, by specifying a value by which the standard thickness is to be multiplied; default 1.
When METHOD=lines, the default is to plot the failures (i.e. non-occurrences) before the successes (i.e. occurrences of the event of interest). The success lines may then overlap and obscure the failure lines. Alternatively, you can set option LINEORDER=success to plot the success lines first. The failure lines may then obscure the success lines.
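As a rough illustration of these two options together, the sketch below halves the line thickness and draws the success lines first. The data and the multiplier 0.5 are arbitrary assumptions; a suitable value depends on the number of observations.

```genstat
" Hypothetical binary data "
VARIATE [VALUES=0.12,0.35,0.41,0.58,0.77,0.90] phat
VARIATE [VALUES=0,0,1,0,1,1] y
" Thinner lines, with the successes drawn first so that failures stay visible "
DSEPARATIONPLOT [METHOD=lines; THICKNESS=0.5; LINEORDER=success] \
                PROBABILITIES=phat; GROUPS=y
```

Comparing this plot with one drawn in the default order is a simple way to check whether overlapping lines are hiding either outcome.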
BACKGROUND specifies the background colour, where one is drawn; the default is lightgray. Either a scalar (defining the colour using the RGB system) or a text (containing the name of a pre-defined colour; see PEN for details) may be supplied.
By default, borders are not drawn around the rectangles when METHOD=rectangles or rbands. However, you can include borders by setting option BORDER=yes. Their appearance can be modified by altering the settings of pen -7 (see PEN for details).
When METHOD=rectangles or lines, the individual predicted probabilities are plotted in order from smallest to largest. The TIES option controls how tied probabilities are handled. The default, TIES=permute, randomly permutes the order in which the tied values are plotted, thereby breaking up any pre-existing patterns that may distort the appearance of the separation plot. Alternatively, TIES=same plots the tied values in the same order as they appear in the data.
The SEED option specifies the seed for the random-number generator, used by RANDOMIZE, to make the permutations when TIES=permute. The default of zero continues the sequence of random numbers from a previous generation or, if this is the first use of the generator in this run of Genstat, it initializes the seed automatically. If you use the same (non-zero) seed more than once, the tied values will be permuted in the same way, and hence you will get the same separation plot.
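The handling of ties might be exercised as in the sketch below, which uses made-up data containing tied probabilities. The seed value 431765 is arbitrary; any non-zero value should make the permutation repeatable.

```genstat
" Hypothetical data with tied predicted probabilities (three at 0.2, two at 0.5) "
VARIATE [VALUES=0.2,0.2,0.2,0.5,0.5,0.8] phat
VARIATE [VALUES=1,0,0,0,1,1] y
" The same non-zero seed used twice should give the same plot both times "
DSEPARATIONPLOT [TIES=permute; SEED=431765] PROBABILITIES=phat; GROUPS=y
DSEPARATIONPLOT [TIES=permute; SEED=431765] PROBABILITIES=phat; GROUPS=y
" Keep the tied values in their original order instead of permuting them "
DSEPARATIONPLOT [TIES=same] PROBABILITIES=phat; GROUPS=y
```

Fixing the seed is mainly useful when a plot has to be reproduced exactly, for example in a report.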
The PLOT option controls what additional information is plotted on the graph, with the following settings.
|key|adds a key to the graph.|
|traceline|adds a line graph of the ordered predicted probabilities (where applicable).|
|expectednumber|adds a symbol (default star) denoting the expected number of successes (where applicable).|
By default, the key is plotted. Also, where they apply, the traceline and the expectednumber are plotted by default. You can suppress any of this additional information by omitting the corresponding setting from PLOT.
You can set option USEPENS=yes to use the settings of pens 2 and 3 for the line drawn by PLOT=traceline and for the symbol added by PLOT=expectednumber, respectively. You can thus modify their appearance by modifying the settings of these pens prior to using DSEPARATIONPLOT. (See PEN for details.)
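A possible way of customising the trace line and the expected-number symbol is sketched below. It assumes that the COLOUR and THICKNESS parameters of the PEN directive behave as described in the PEN documentation; the colours chosen and the data are illustrative only.

```genstat
" Hypothetical binary data "
VARIATE [VALUES=0.12,0.35,0.41,0.58,0.77,0.90] phat
VARIATE [VALUES=0,0,1,0,1,1] y
" Pen 2 is used for the trace line: make it blue and twice the usual thickness "
PEN 2; COLOUR='blue'; THICKNESS=2
" Pen 3 is used for the expected-number symbol: make it black "
PEN 3; COLOUR='black'
" Ask the procedure to use the current definitions of pens 2 and 3 "
DSEPARATIONPLOT [USEPENS=yes] PROBABILITIES=phat; GROUPS=y
```

The pen settings remain in force after the plot is drawn, so later graphs that use pens 2 and 3 will inherit them unless the pens are redefined.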
The TITLE and XTITLE parameters can supply an overall title and an x-axis title for the separation plot, respectively. If no overall title is supplied, a suitable title is generated automatically. To omit the title, a blank string can be supplied, i.e. TITLE=' '. By default, the x-axis title is not displayed.
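For instance, titles could be supplied or suppressed as in this sketch; the title strings and the data are hypothetical.

```genstat
" Hypothetical binary data "
VARIATE [VALUES=0.12,0.35,0.41,0.58,0.77,0.90] phat
VARIATE [VALUES=0,0,1,0,1,1] y
" Supply both an overall title and an x-axis title "
DSEPARATIONPLOT PROBABILITIES=phat; GROUPS=y; \
                TITLE='Separation plot for the logistic model'; \
                XTITLE='Observations ordered by predicted probability'
" Suppress the automatically generated title with a blank string "
DSEPARATIONPLOT PROBABILITIES=phat; GROUPS=y; TITLE=' '
```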
DSEPARATIONPLOT uses the methods described by Greenhill et al. (2011).
DSEPARATIONPLOT will work with restricted data variates (such as NBINOMIAL) and a restricted GROUPS factor or variate. However, if more than one is restricted, they must be restricted in the same way. Note that the unrestricted length of all of the data variates and factors must be the same.
Greenhill, B., Ward, M.B. & Sacks, A. (2011). The separation plot: a visual method for evaluating the fit of binary models. American Journal of Political Science 55, 990-1002.
CAPTION 'DSEPARATION example'; STYLE=meta
CAPTION 'Binary data','Goorin et al. (1987)',\
        !T('(The Guide to the Genstat Command Language, Part 2: Statistics',\
        'Example 3.5.2)'); \
        STYLE=major,plain,plain
FACTOR  Li,Sex,Aop
READ    Li,Sex,Aop,Free
1 1 1 1  2 1 1 1  2 2 1 1  2 2 2 0
1 1 1 1  2 1 1 1  2 2 1 0  2 2 2 0
1 1 1 1  2 1 1 1  2 2 1 0  2 2 2 0
1 1 2 1  2 1 2 1  2 2 1 0  2 2 2 0
1 1 2 1  2 1 2 1  2 2 1 0  2 2 2 0
1 2 1 1  2 1 2 1  2 2 2 1  2 2 2 0
1 2 1 1  2 1 2 0  2 2 2 1  2 2 2 0
1 2 1 1  2 1 2 0  2 2 2 1  2 2 2 0
1 2 1 1  2 2 1 1  2 2 2 1  2 2 2 0
1 2 2 1  2 2 1 1  2 2 2 1  2 2 2 0
2 1 1 1  2 2 1 1  2 2 2 1
2 1 1 1  2 2 1 1  2 2 2 0 :
MODEL   [DISTRIBUTION=binomial; LINK=logit] Free; NBINOMIAL=1
TERMS   [FACT=9] Sex+Aop+Li
FIT     Sex+Aop+Li
RKEEP   FITTEDVALUES=fittedFree
DSEPARATIONPLOT [METHOD=rect] PROBABILITIES=fittedFree; GROUPS=Free
DSEPARATIONPLOT [METHOD=lines] PROBABILITIES=fittedFree; GROUPS=Free
DSEPARATIONPLOT [METHOD=lbands] PROBABILITIES=fittedFree; GROUPS=Free
DSEPARATIONPLOT [METHOD=rbands] PROBABILITIES=fittedFree; GROUPS=Free

CAPTION 'Binomial data','Finney (1971) analgesic drug data',\
        !T('(A Guide to Regression, Nonlinear and Generalized Linear Models in',\
        'Genstat, Section 3.4)'); \
        STYLE=major,plain,plain
SPLOAD  [PRINT=*] '%gendir%/data/Drug.gsh'
CALCULATE LogDose = LOG(Dose)
MODEL   [DISTRIBUTION=binomial; LINK=probit; DISPERSION=1] R; NBINOMIAL=N
TERMS   [FACT=9] LogDose*Drug
FIT     [PRINT=*] LogDose*Drug
RKEEP   FITTEDVALUES=fittedR
CALCULATE estprob = fittedR/N
DSEPARATIONPLOT [METHOD=rect] PROBABILITIES=estprob; NSUCCESSES=R; NBINOMIAL=N
DSEPARATIONPLOT [METHOD=lines] PROBABILITIES=estprob; NSUCCESSES=R; NBINOMIAL=N
DSEPARATIONPLOT [METHOD=lbands; THICKNESS=0.001] PROBABILITIES=estprob; \
                NSUCCESSES=R; NBINOMIAL=N
DSEPARATIONPLOT [METHOD=rbands] PROBABILITIES=estprob; NSUCCESSES=R; NBINOMIAL=N