Produces density plots for large data sets (D. B. Baird).
Options
PLOT = string tokens | How to plot the density (pointplot, shadeplot, contourplot, histogram, surface); default poin
---|---
NGROUPS = scalar | Number of sections into which to divide each axis (4-400); default 50
METHOD = string token | Method to use to smooth the density (thinplate, radialspline, tensorspline, kernel); default * i.e. none
DF = scalar | Degrees of freedom for smoothing methods (2-50); default 12
BANDWIDTH = scalar | Bandwidth for kernel smoothing (0-1); default 0.2
MEANFIT = string tokens | What smooth regression fits to the means to plot (yx, xy); default * i.e. none
NCONTOURS = scalar | Number of contours in the contour plot; default 9
SYMBOL = string token | Symbol to use in a point plot (circle, square); default circ
COLOURS = text, variate or scalar | Colour to use to draw the symbols, shades, contours or surface; default !t(red, blue, black)
XTRANSFORM = string token | Transformed scale for the x-axis (identity, log, log10, logit, probit, cloglog, square, exp, exp10, ilogit, iprobit, icloglog, root); default iden
YTRANSFORM = string token | Transformed scale for the y-axis (identity, log, log10, logit, probit, cloglog, square, exp, exp10, ilogit, iprobit, icloglog, root); default iden
ZTRANSFORM = string token | Transformed scale for the z-axis (identity, percentile, root); default iden
WINDOW = scalar | Window number for the graphs; default 3
SCREEN = string token | Whether to clear the screen before plotting or to continue plotting on the old screen (clear, keep, resize); default clea
Parameters
Y = variate or factor | Y-coordinates of the data
---|---
X = variate or factor | X-coordinates of the data
TITLE = text | Title for graph; default uses the names of the data and type of plot
Description
Procedure DXYDENSITY produces a density plot of two variables, using high-resolution graphics. A density plot provides a better visual representation of the 2-dimensional spread of points than a scatter plot if there are a large number of points or many points overlap each other, and is quicker to plot. A density plot displays the number of points in small regions of the x-y plane, using various methods to plot the density.
The x and y axes are divided into equally spaced sections, to give a grid of rectangular cells covering the x-y plane. The density is calculated as the number of points that falls into each cell. The number of sections is specified by the NGROUPS option, as a scalar if the same number is required in each direction, or as a variate with two values to specify different numbers for the y-axis (first value) and the x-axis (second value). Having a large number of cells preserves more detail, but increases the time required to create and plot the graph.
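As a hedged illustration of the two forms of NGROUPS, the sketch below bins the same artificial data first with a single scalar and then with a different number of sections on each axis. The variate names x and y, the seed and the sample size are assumptions made for this sketch; they are not taken from the procedure's own example.

```genstat
" Artificial data: 20000 correlated points (illustrative values only) "
CALCULATE [SEED=137] x = GRNORMAL(20000; 0; 1)
CALCULATE y = 0.5*x + GRNORMAL(20000; 0; 0.25)
" Scalar: the same number of sections (30) on both axes "
DXYDENSITY [PLOT=shadeplot; NGROUPS=30] Y=y; X=x
" Variate: 40 sections on the y-axis (first value), 80 on the x-axis (second) "
DXYDENSITY [PLOT=shadeplot; NGROUPS=!(40,80)] Y=y; X=x
```

The second call uses 3200 cells rather than 900, so it keeps more detail but takes longer to create and plot.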
The x- or y-axes can be transformed before forming the sections and calculating the density, by using the XTRANSFORM or YTRANSFORM options. The settings are the same as those of the TRANSFORM option of the XAXIS and YAXIS directives.
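A minimal sketch of the axis transformations, assuming strictly positive, right-skewed data for which a log10 scale is natural. The data are artificial and the names x and y are hypothetical.

```genstat
" Artificial positive, right-skewed data (illustrative values only) "
CALCULATE [SEED=921] x = EXP(GRNORMAL(20000; 2; 1))
CALCULATE y = EXP(0.8*LOG(x) + GRNORMAL(20000; 0; 0.5))
" Untransformed axes: most points are crowded into the cells near the origin "
DXYDENSITY [PLOT=pointplot] Y=y; X=x
" log10 scales on both axes: the sections are equally spaced on the log scale "
DXYDENSITY [PLOT=pointplot; XTRANSFORM=log10; YTRANSFORM=log10] Y=y; X=x
```

Because the transformation is applied before the sections are formed, the second call spreads the points over many more cells than the first.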
The PLOT option controls how the density is plotted, with settings:
pointplot | point plot, using the symbol size to indicate the number of points in each cell;
---|---
shadeplot | shade plot, using intensity of colour to indicate the number of points in each cell;
contourplot | contour plot, with contours showing the density;
surface | surface plot, with density as height;
histogram | 3-dimensional histogram of the density.
By default PLOT=pointplot.
The density can be smoothed by using the METHOD option, with settings:
thinplate | a 2-dimensional thin plate spline is fitted to the counts using the THINPLATE procedure;
---|---
radialspline | a 2-dimensional radial spline is fitted to the counts using the RADIALSPLINE procedure;
tensorspline | a 2-dimensional tensor spline is fitted to the counts using the TENSORSPLINE procedure;
kernel | a 2-dimensional kernel smoother is fitted to the counts.
By default no smoothing is done.
The DF option specifies the number of degrees of freedom for the splines (default 12); smaller values make the surface smoother, and larger values allow it to be rougher. The BANDWIDTH option specifies the bandwidth for kernel smoothing (default 0.2); larger values make the surface smoother, and smaller values allow it to be rougher.
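The opposite directions of DF and BANDWIDTH are easy to confuse, so the following sketch contrasts a smoother and a rougher setting for each. It assumes artificial data in hypothetical variates x and y; the particular values of DF, BANDWIDTH and NCONTOURS are arbitrary choices within the documented ranges.

```genstat
" Artificial data with a curved relationship (illustrative values only) "
CALCULATE [SEED=618] x = GRNORMAL(20000; 0; 1)
CALCULATE y = x**2 + GRNORMAL(20000; 0; 0.5)
" Thin plate spline: few degrees of freedom give a smooth surface... "
DXYDENSITY [PLOT=contourplot; METHOD=thinplate; DF=6; NCONTOURS=12] Y=y; X=x
" ...and more degrees of freedom allow a rougher one "
DXYDENSITY [PLOT=contourplot; METHOD=thinplate; DF=30; NCONTOURS=12] Y=y; X=x
" Kernel smoother: a large bandwidth gives a smooth surface... "
DXYDENSITY [PLOT=contourplot; METHOD=kernel; BANDWIDTH=0.4; NCONTOURS=12] Y=y; X=x
" ...and a small bandwidth allows a rougher one "
DXYDENSITY [PLOT=contourplot; METHOD=kernel; BANDWIDTH=0.05; NCONTOURS=12] Y=y; X=x
```

Each call clears the screen by default, so the four plots are drawn one after another.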
The shape of each point in a point plot is specified by the SYMBOL option, as either a circle (default) or square. The COLOURS option specifies the colours that are used, in a scalar or a text or variate with up to three values. For a point plot, the first value specifies the colour for the points, and the second and third values define the colours for any lines fitted by the MEANFIT option. For a histogram, the first value of COLOURS defines the colour of the bars. For shade, contour and surface plots, if COLOURS has two or more values, the first is used for high densities, the second is used for low densities, and intermediate densities are plotted in the corresponding intermediate colour; if COLOURS has only one value, the low densities are plotted in white. If COLOURS has three values, the third is used for the contours of contour and surface plots.
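The role of each COLOURS value depends on the type of plot, which the following sketch tries to make concrete. The data and the variate names x and y are assumptions, and the colour names are limited to those that already appear elsewhere on this page.

```genstat
" Artificial data (illustrative values only) "
CALCULATE [SEED=245] x = GRNORMAL(20000; 0; 1)
CALCULATE y = 0.5*x + GRNORMAL(20000; 0; 0.25)
" Point plot: black square symbols, red line for the fit of y on x "
DXYDENSITY [PLOT=pointplot; SYMBOL=square; MEANFIT=yx;\
           COLOURS=!t(black,red)] Y=y; X=x
" Shade plot with a single colour: low densities are plotted in white "
DXYDENSITY [PLOT=shadeplot; COLOURS='blue'] Y=y; X=x
" Contour plot: red for high densities, lightblue for low, black contours "
DXYDENSITY [PLOT=contourplot; COLOURS=!t(red,lightblue,black)] Y=y; X=x
```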
The scaling of densities is controlled by the ZTRANSFORM option with settings:
identity | no scaling (default),
---|---
root | takes the square root of the densities, giving more emphasis to low counts,
percentile | takes a rank transform and plots these, so that percentiles are equally spaced.
The MEANFIT option allows you to add a smoothing spline regression of y on x or of x on y to a point plot. The available settings are
yx | for a regression of y on x, and
---|---
xy | for a regression of x on y.
The DF option again specifies the number of degrees of freedom for the spline (default 12). By default neither regression is fitted.
The Y and X parameters specify the y- and x-coordinates of the data values, in either variates or factors. Their identifiers are used for the titles of the axes at the lower and left-hand edges of the graphics frame (i.e. page). You can also use the TITLE parameter to supply an overall title for the plot.
The WINDOW option specifies the number of the window to use for the plot, and the SCREEN option controls whether the screen is cleared first, as usual (see e.g. DGRAPH).
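One possible use of WINDOW and SCREEN is to place two density plots side by side, sketched below. The window positions set with FRAME are assumptions chosen for this illustration, as are the data, the variate names and the titles. The sketch has not been checked against how the procedure lays out any keys, so the positions may need adjusting.

```genstat
" Artificial data (illustrative values only) "
CALCULATE [SEED=773] x = GRNORMAL(20000; 0; 1)
CALCULATE y = 0.5*x + GRNORMAL(20000; 0; 0.25)
" Assumed layout: window 1 on the left half, window 2 on the right half "
FRAME WINDOW=1,2; YLOWER=0.25; YUPPER=0.75; XLOWER=0,0.5; XUPPER=0.5,1
" First plot clears the screen; the second is added to the same screen "
DXYDENSITY [PLOT=pointplot; WINDOW=1; SCREEN=clear] Y=y; X=x;\
           TITLE='Unsmoothed counts'
DXYDENSITY [PLOT=pointplot; METHOD=kernel; WINDOW=2; SCREEN=keep] Y=y; X=x;\
           TITLE='Kernel-smoothed counts'
```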
Options: PLOT, NGROUPS, XTRANSFORM, YTRANSFORM, ZTRANSFORM, METHOD, MEANFIT, DF, BANDWIDTH, NCONTOURS, COLOURS, SYMBOL, WINDOW, SCREEN.
Parameters: Y, X, TITLE.
Action with RESTRICT
If any of the variates or factors are restricted, only the units not excluded by the restriction will be plotted.
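A short sketch of this behaviour, assuming artificial data in hypothetical variates x and y: the restriction limits the plot to the units with positive x, and is removed afterwards so that later commands see all the units again.

```genstat
" Artificial data (illustrative values only) "
CALCULATE [SEED=402] x = GRNORMAL(20000; 0; 1)
CALCULATE y = x**2 + GRNORMAL(20000; 0; 0.5)
" Restrict both variates to the units with positive x "
RESTRICT y,x; CONDITION=x > 0
" Only the units retained by the restriction contribute to the density "
DXYDENSITY [PLOT=pointplot] Y=y; X=x
" Remove the restriction again "
RESTRICT y,x
```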
See also
Directives: DCONTOUR, DGRAPH, DSHADE, D3GRAPH.
Commands for: Graphics.
Example
    CAPTION 'DXYDENSITY example','Density plots of microarray data';\
            STYLE=meta,plain
    ENQUIRE CHANNEL=-1; EXIST=check; NAME=\
            '%GENDIR%/Data/Microarrays/Data13-6-9.gwb'
    IF check
      SPLOAD [PRINT=*] '%GENDIR%/Data/Microarrays/Data13-6-9.gwb'; ISAVE=data
      RESTRICT data[]; Intensity > 1
    ELSE
      CAPTION 'Microarray data not installed, using artificial data'
      SCALAR N; VALUE=60000
      CALC [SEED=531] Intensity = (X = 10*!(1...N)/N) + GRNORMAL(N;2;0.1)
      CALC logRatio = BOUND(LOG(1+X-(X/8)**2) + GRNORMAL(N;-1.6;0.2/X**1.5);-3;3)
    ENDIF
    DXYDENSITY [PLOT=POINT; MEANFIT=xy,yx] Y=logRatio; X=Intensity
    DXYDENSITY [PLOT=SHADE; NGROUPS=60; ZTRANSFORM=percentile;\
               COLOUR=!t(red,blue)] Y=logRatio; X=Intensity
    DXYDENSITY [PLOT=histogram; ZTRANSFORM=root; NGROUPS=30; COLOUR='lightblue']\
               Y=logRatio; X=Intensity
    DXYDENSITY [PLOT=surface; METHOD=kernel; BANDWIDTH=0.1]\
               Y=logRatio; X=Intensity