DPARALLEL procedure

Displays multivariate data using parallel coordinates (Z. Karaman).

Options

TITLE = text Title for the plot
GROUPS = factor Defines grouping of the units (if any); by default, different pens are used for the observations in different groups
PERMUTATIONSALL = string token Whether to display all necessary permutations so that any two variates will be adjacent in at least one plot, or just display once in the order given by the DATA pointer (yes, no); default no
SCALING = string token Whether to do scaling overall (scale all variates on the same scale), or to scale each variate separately (overall, separate); default separate
PEN = variate Pens to be used for different groups (if any); default * uses pens from 1 up to the number of groups (number of levels of the GROUPS factor)

Parameter

DATA = variates Data variables to be plotted

Description

The scatter plot is probably the most powerful and most frequently used statistical tool for analysing the relationship between two variables. It is a very intuitive way to look at the data, since it corresponds to our perception of the world. Its major drawback is that it does not generalize naturally to higher dimensions. Using interactive graphics devices such as high-resolution screens, one can rotate a point cloud in three dimensions (commonly called spinning), and further dimensions can be partially encoded by using different colours, symbols, or symbol sizes; however, this technique can be used only on interactive graphics devices, and it is difficult to see the relationships between all the variables at once. Another possibility is the matrix of scatter plots (provided by procedure DSCATTER), but this has the drawback that it is difficult to follow one data point across several plots.

An alternative is to display multivariate data using parallel coordinates. The dimensions are not represented by orthogonal lines, as is customarily done when plotting scatter diagrams (which limits the dimensionality to two, or at most three if spinning is used). Rather, they are represented by a series of parallel lines (either horizontal or vertical), and a point in a multidimensional space is represented by a broken line connecting its coordinates in each dimension. The only limit on the number of dimensions that can be displayed simultaneously by such a plot is its readability, which is a function of the underlying graphics display (hardware). The parallel coordinates geometry was developed by Inselberg (1985) in the context of computational geometry; it was applied to statistical multidimensional analysis by Wegman (1990). Inselberg also gives some interesting duality properties between the classical Euclidean plane and parallel coordinates geometry.
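The mapping from a data point to a broken line can be sketched in a few lines of Python. This is only an illustration of the geometry, not part of DPARALLEL; the function name `polylines` and the convention of placing axis i at position i are assumptions made for the example.

```python
def polylines(columns):
    """Turn a list of variates (columns) into one broken line per observation.

    Axis i is placed at position i (an assumption for this sketch), so each
    observation becomes the vertex list [(0, v0), (1, v1), ...].
    """
    return [list(enumerate(row)) for row in zip(*columns)]


# Two observations measured on four variates (first two rows of the iris data).
columns = [[5.1, 4.9], [3.5, 3.0], [1.4, 1.4], [0.2, 0.2]]
for line in polylines(columns):
    print(line)
# [(0, 5.1), (1, 3.5), (2, 1.4), (3, 0.2)]
# [(0, 4.9), (1, 3.0), (2, 1.4), (3, 0.2)]
```

Joining consecutive vertices with straight segments gives the broken line for each observation.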

The relationship between two variables can be visually assessed by inspecting a parallel coordinates plot. When the correlation between two variables is close to -1, the lines cross over one another, so that in the limit we would have a pencil of lines. (A pencil of lines is a set of lines that all pass through a single point.) On the other hand, when the correlation approaches +1, there are fewer and fewer crossovers, so that in the limit we would have a set of parallel lines. The pairwise comparisons are easy for variables represented by adjacent axes; however, they are much more difficult for axes that are far apart on the graph. For n variables, there are n! possible permutations of the axes, but many of these duplicate adjacencies. Wegman (1990) has shown that a relatively small number of permutations of the axes (approximately n/2) is enough to ensure that every variable is adjacent to every other variable in at least one permutation. Multivariate outliers can be identified easily on this plot, since it is very intuitive to follow a line across the axes by eye. If the PERMUTATIONSALL option is set to yes, several plots will be produced so that every pair of variables is adjacent in at least one plot.
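The link between correlation and crossovers described above can be checked numerically. The Python sketch below is an illustration only (the function name `count_crossings` is hypothetical): between two adjacent axes, the segments of observations i and j cross exactly when the two variables order them in opposite ways.

```python
from itertools import combinations


def count_crossings(x, y):
    """Count pairs of observations whose segments cross between two axes.

    Observation i is drawn from height x[i] on the first axis to height y[i]
    on the second; two segments cross when x and y rank the pair oppositely.
    """
    return sum(
        1
        for (xi, yi), (xj, yj) in combinations(zip(x, y), 2)
        if (xi - xj) * (yi - yj) < 0
    )


x = [1.0, 2.0, 3.0, 4.0]
increasing = [10.0, 20.0, 30.0, 40.0]  # correlation +1 with x
decreasing = [40.0, 30.0, 20.0, 10.0]  # correlation -1 with x

print(count_crossings(x, increasing))  # 0
print(count_crossings(x, decreasing))  # 6
```

With four observations there are six pairs, so a count of 6 means that every pair of lines crosses, and a count of 0 means that none do.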

In our implementation we have chosen to arrange the axes vertically, since this maximizes readability on most output devices (either terminal screens, or printers when printing in landscape mode). The variables can be scaled independently on a 0 to 1 scale (SCALING=separate), or plotted on a common scale in their original units (SCALING=overall) if the values are of the same order of magnitude. In the first case it is easier to obtain a visual estimate of the correlation between two adjacent variables; on the other hand, leaving the data in their original units gives a good idea of the location and spread of the marginal distributions.
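The difference between the two kinds of scaling can be illustrated with a short Python sketch. It is not the procedure's own code: the function names are hypothetical, missing values are not handled, and the common scale is assumed here to run from the smallest to the largest value over all variates.

```python
def scale_separate(columns):
    """Rescale each variate to the range 0 to 1 using its own min and max."""
    scaled = []
    for col in columns:
        lo, hi = min(col), max(col)
        span = (hi - lo) or 1.0  # avoid dividing by zero for a constant variate
        scaled.append([(v - lo) / span for v in col])
    return scaled


def scale_overall(columns):
    """Place all variates on one common scale (assumed: pooled min and max)."""
    lo = min(min(col) for col in columns)
    hi = max(max(col) for col in columns)
    span = (hi - lo) or 1.0
    return [[(v - lo) / span for v in col] for col in columns]


columns = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]
print(scale_separate(columns))  # [[0.0, 0.5, 1.0], [0.0, 0.5, 1.0]]
print(scale_overall(columns))
```

After separate scaling the two variates coincide, which makes their perfect correlation obvious. On the common scale the first variate is squeezed into the bottom of its axis, which shows its location and spread relative to the second.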

The data are specified, in a list of variates, using the DATA parameter. The GROUPS option can be used to specify a grouping factor. The lines for the observations in each group are then plotted using a different pen, thus giving immediate insight into any patterns in the data. By default, pens 1 upwards are used for the different groups, but the PEN option can be used to specify other pens, in a variate with as many values as there are groups. If the GROUPS option is not set, the PEN option can be set to a scalar, to select the pen to be used for all the points. The TITLE option can be used to supply a title for the plots.

Options: TITLE, GROUPS, PERMUTATIONSALL, SCALING, PEN.

Parameter: DATA.

Method

DPARALLEL uses the standard Genstat directives for data manipulation and graphics. The underlying methodology is described by Inselberg (1985) and Wegman (1990). It calls subsidiary procedure _DPARWEGMAN to generate the permutations matrix; each column of the output matrix gives one of the permutations described by Wegman (1990).
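As an illustration of how such a set of permutations can be built, the Python sketch below uses a zig-zag construction of the kind described by Wegman (1990): a base ordering 0, 1, n-1, 2, n-2, ... followed by cyclic shifts of it. It is not the code of _DPARWEGMAN; the function names, the zero-based axis numbering, and the choice of (n+1)/2 permutations (rounded down) are assumptions of this sketch.

```python
from itertools import combinations


def zigzag(n):
    """Base ordering 0, 1, n-1, 2, n-2, ... of n axes (zero-based)."""
    order = [0]
    for k in range(1, n):
        step = k if k % 2 == 1 else -k  # alternate +1, -2, +3, -4, ...
        order.append((order[-1] + step) % n)
    return order


def wegman_permutations(n):
    """Cyclic shifts of the zig-zag ordering; (n + 1) // 2 of them are used."""
    base = zigzag(n)
    return [[(axis + shift) % n for axis in base] for shift in range((n + 1) // 2)]


def adjacent_pairs(perms):
    """Set of unordered pairs of axes that are neighbours in some permutation."""
    return {frozenset(pair) for p in perms for pair in zip(p, p[1:])}


perms = wegman_permutations(4)
print(perms)  # [[0, 1, 3, 2], [1, 2, 0, 3]]

# Every pair of the four axes is adjacent in at least one ordering.
all_pairs = {frozenset(c) for c in combinations(range(4), 2)}
print(adjacent_pairs(perms) == all_pairs)  # True
```

For four variates, two orderings are enough to make all six pairs adjacent, in line with the figure of approximately n/2 permutations.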

Action with RESTRICT

Restrictions are not allowed. Missing values are allowed within the input variates in DATA; observations with missing data are not excluded from the plot, but the parts of their broken lines adjacent to each missing value are omitted.
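The treatment of missing values can be pictured with the following Python sketch. It is an illustration under stated assumptions, not the procedure's code: the function name `visible_segments` is hypothetical, and a missing value is represented by `None` or NaN.

```python
import math


def visible_segments(values):
    """Return the segments of one observation's broken line that can be drawn.

    A segment joins axis i to axis i + 1 and is kept only when the values on
    both axes are present.
    """
    def present(v):
        return v is not None and not math.isnan(v)

    return [
        ((i, a), (i + 1, b))
        for i, (a, b) in enumerate(zip(values, values[1:]))
        if present(a) and present(b)
    ]


# The second variate is missing, so both segments touching axis 1 are dropped.
print(visible_segments([5.1, None, 1.4, 0.2]))
# [((2, 1.4), (3, 0.2))]
```

The observation still appears in the plot, but only through the segments whose two end points are both available.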

References

Inselberg, A. (1985). The plane with parallel coordinates. The Visual Computer, 1, 69-91.

Wegman, E. (1990). Hyperdimensional data analysis using parallel coordinates. Journal of the American Statistical Association, 85, 664-675.

See also

Procedure: DSCATTER.

Commands for: Multivariate and cluster analysis, Graphics.

Example

CAPTION   'DPARALLEL example',\
          'Iris data set from Fisher (1936) and Anderson (1935).';\
          STYLE=meta,plain
VARIATE   [NVALUES=150] Sepal_L,Sepal_W,Petal_L,Petal_W
POINTER   [VALUES=Sepal_L,Sepal_W,Petal_L,Petal_W] Measures
READ      Measures[]
 5.1  3.5  1.4  0.2
 4.9  3.0  1.4  0.2
 4.7  3.2  1.3  0.2
 4.6  3.1  1.5  0.2
 5.0  3.6  1.4  0.2
 5.4  3.9  1.7  0.4
 4.6  3.4  1.4  0.3
 5.0  3.4  1.5  0.2
 4.4  2.9  1.4  0.2
 4.9  3.1  1.5  0.1
 5.4  3.7  1.5  0.2
 4.8  3.4  1.6  0.2
 4.8  3.0  1.4  0.1
 4.3  3.0  1.1  0.1
 5.8  4.0  1.2  0.2
 5.7  4.4  1.5  0.4
 5.4  3.9  1.3  0.4
 5.1  3.5  1.4  0.3
 5.7  3.8  1.7  0.3
 5.1  3.8  1.5  0.3
 5.4  3.4  1.7  0.2
 5.1  3.7  1.5  0.4
 4.6  3.6  1.0  0.2
 5.1  3.3  1.7  0.5
 4.8  3.4  1.9  0.2
 5.0  3.0  1.6  0.2
 5.0  3.4  1.6  0.4
 5.2  3.5  1.5  0.2
 5.2  3.4  1.4  0.2
 4.7  3.2  1.6  0.2
 4.8  3.1  1.6  0.2
 5.4  3.4  1.5  0.4
 5.2  4.1  1.5  0.1
 5.5  4.2  1.4  0.2
 4.9  3.1  1.5  0.2
 5.0  3.2  1.2  0.2
 5.5  3.5  1.3  0.2
 4.9  3.6  1.4  0.1
 4.4  3.0  1.3  0.2
 5.1  3.4  1.5  0.2
 5.0  3.5  1.3  0.3
 4.5  2.3  1.3  0.3
 4.4  3.2  1.3  0.2
 5.0  3.5  1.6  0.6
 5.1  3.8  1.9  0.4
 4.8  3.0  1.4  0.3
 5.1  3.8  1.6  0.2
 4.6  3.2  1.4  0.2
 5.3  3.7  1.5  0.2
 5.0  3.3  1.4  0.2
 7.0  3.2  4.7  1.4
 6.4  3.2  4.5  1.5
 6.9  3.1  4.9  1.5
 5.5  2.3  4.0  1.3
 6.5  2.8  4.6  1.5
 5.7  2.8  4.5  1.3
 6.3  3.3  4.7  1.6
 4.9  2.4  3.3  1.0
 6.6  2.9  4.6  1.3
 5.2  2.7  3.9  1.4
 5.0  2.0  3.5  1.0
 5.9  3.0  4.2  1.5
 6.0  2.2  4.0  1.0
 6.1  2.9  4.7  1.4
 5.6  2.9  3.6  1.3
 6.7  3.1  4.4  1.4
 5.6  3.0  4.5  1.5
 5.8  2.7  4.1  1.0
 6.2  2.2  4.5  1.5
 5.6  2.5  3.9  1.1
 5.9  3.2  4.8  1.8
 6.1  2.8  4.0  1.3
 6.3  2.5  4.9  1.5
 6.1  2.8  4.7  1.2
 6.4  2.9  4.3  1.3
 6.6  3.0  4.4  1.4
 6.8  2.8  4.8  1.4
 6.7  3.0  5.0  1.7
 6.0  2.9  4.5  1.5
 5.7  2.6  3.5  1.0
 5.5  2.4  3.8  1.1
 5.5  2.4  3.7  1.0
 5.8  2.7  3.9  1.2
 6.0  2.7  5.1  1.6
 5.4  3.0  4.5  1.5
 6.0  3.4  4.5  1.6
 6.7  3.1  4.7  1.5
 6.3  2.3  4.4  1.3
 5.6  3.0  4.1  1.3
 5.5  2.5  4.0  1.3
 5.5  2.6  4.4  1.2
 6.1  3.0  4.6  1.4
 5.8  2.6  4.0  1.2
 5.0  2.3  3.3  1.0
 5.6  2.7  4.2  1.3
 5.7  3.0  4.2  1.2
 5.7  2.9  4.2  1.3
 6.2  2.9  4.3  1.3
 5.1  2.5  3.0  1.1
 5.7  2.8  4.1  1.3
 6.3  3.3  6.0  2.5
 5.8  2.7  5.1  1.9
 7.1  3.0  5.9  2.1
 6.3  2.9  5.6  1.8
 6.5  3.0  5.8  2.2
 7.6  3.0  6.6  2.1
 4.9  2.5  4.5  1.7
 7.3  2.9  6.3  1.8
 6.7  2.5  5.8  1.8
 7.2  3.6  6.1  2.5
 6.5  3.2  5.1  2.0
 6.4  2.7  5.3  1.9
 6.8  3.0  5.5  2.1
 5.7  2.5  5.0  2.0
 5.8  2.8  5.1  2.4
 6.4  3.2  5.3  2.3
 6.5  3.0  5.5  1.8
 7.7  3.8  6.7  2.2
 7.7  2.6  6.9  2.3
 6.0  2.2  5.0  1.5
 6.9  3.2  5.7  2.3
 5.6  2.8  4.9  2.0
 7.7  2.8  6.7  2.0
 6.3  2.7  4.9  1.8
 6.7  3.3  5.7  2.1
 7.2  3.2  6.0  1.8
 6.2  2.8  4.8  1.8
 6.1  3.0  4.9  1.8
 6.4  2.8  5.6  2.1
 7.2  3.0  5.8  1.6
 7.4  2.8  6.1  1.9
 7.9  3.8  6.4  2.0
 6.4  2.8  5.6  2.2
 6.3  2.8  5.1  1.5
 6.1  2.6  5.6  1.4
 7.7  3.0  6.1  2.3
 6.3  3.4  5.6  2.4
 6.4  3.1  5.5  1.8
 6.0  3.0  4.8  1.8
 6.9  3.1  5.4  2.1
 6.7  3.1  5.6  2.4
 6.9  3.1  5.1  2.3
 5.8  2.7  5.1  1.9
 6.8  3.2  5.9  2.3
 6.7  3.3  5.7  2.5
 6.7  3.0  5.2  2.3
 6.3  2.5  5.0  1.9
 6.5  3.0  5.2  2.0
 6.2  3.4  5.4  2.3
 5.9  3.0  5.1  1.8  :
FACTOR    [NVALUES=150; LABELS=!t(Setosa,Versicolor,Virginica);\ 
          VALUES=50(1,2,3)] Species
DPARALLEL [TITLE=!t('Fisher''s Iris Data'); GROUPS=Species] Measures[]
DPARALLEL [TITLE=!t('Fisher''s Iris Data'); SCALING=overall; GROUPS=Species]\ 
          Measures[]
DPARALLEL [TITLE=!t('Fisher''s Iris Data'); PERMUTATIONS=yes; GROUPS=Species]\
          Measures[]
Updated on March 8, 2019
