
Principal Components Clustering

Principal components clustering uses the PCPCLUSTER procedure to perform cluster analysis on a large data set. The procedure finds orthogonal linear combinations of a set of variates that maximize the variation contained within them, and then clusters the individuals on the first few of these dimensions. This allows very large numbers of individuals to be clustered, because an n x n similarity matrix is not required. The first step reduces the number of attributes of the units by taking the first 2-6 scores from a principal components (PCP) analysis.

The second step divides the multi-dimensional space defined by the scores into cells, and forms a density table by tabulating the number of units in each cell. The clusters are formed by finding contiguous collections of cells in which the density (the number of units) meets the thresholds specified by the variate or numerical list in the Minimum number of units in cells field. The units in these clusters of cells will be connected to each other in a similar way to the units in a hierarchical cluster analysis.
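The two steps above can be sketched in Python with NumPy. This is only an illustration of the idea, not the PCPCLUSTER implementation: the function names `pc_scores` and `grid_density_clusters` are hypothetical, the scores are based on the variance-covariance matrix, cells are treated as contiguous when they share a face, and a cell counts as dense when it holds at least `min_units` units. All of these are assumptions.

```python
import numpy as np
from collections import deque


def pc_scores(X, n_dims=2):
    """Scores on the first n_dims principal components (centred data)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_dims].T


def grid_density_clusters(scores, n_partitions=10, min_units=2):
    """Return one label per unit; 0 means the unit is in no cluster."""
    scores = np.asarray(scores, dtype=float)
    d = scores.shape[1]
    lo, hi = scores.min(axis=0), scores.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    # Cell index of each unit along each dimension (0 .. n_partitions - 1).
    cells = ((scores - lo) / span * n_partitions).astype(int)
    cells = np.minimum(cells, n_partitions - 1)
    # Density table: number of units falling in each cell.
    density = np.zeros((n_partitions,) * d, dtype=int)
    np.add.at(density, tuple(cells.T), 1)
    dense = density >= min_units
    # Breadth-first search over face-adjacent dense cells.
    cell_label = np.zeros(density.shape, dtype=int)
    current = 0
    for start in zip(*np.nonzero(dense)):
        if cell_label[start]:
            continue
        current += 1
        cell_label[start] = current
        queue = deque([start])
        while queue:
            cell = queue.popleft()
            for axis in range(d):
                for step in (-1, 1):
                    nb = list(cell)
                    nb[axis] += step
                    nb = tuple(nb)
                    if not 0 <= nb[axis] < n_partitions:
                        continue
                    if dense[nb] and not cell_label[nb]:
                        cell_label[nb] = current
                        queue.append(nb)
    return cell_label[tuple(cells.T)]
```

Because only the density table is stored, memory use depends on the number of cells rather than on the square of the number of units.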

  1. After you have imported your data, from the menu select 
    Stats | Multivariate Analysis | Cluster Analysis | Principal Components.
  2. Fill in the fields as required then click Run.

Data to be analyzed

Used to enter the names of the variates to be used to cluster the individuals. You can transfer multiple selections from Available data by holding the Ctrl key on your keyboard while selecting items, then clicking the arrow button to move them all across in one action.

Analysis based on

Selects whether the principal components analysis is based on the sums of squares and products, correlation, or variance-covariance matrix.
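As a rough illustration of how the three choices differ, the sketch below builds each matrix with NumPy. The function name `pca_matrices` is hypothetical, and it assumes the sums of squares and products are taken about the variate means. Under that assumption the sums of squares and products matrix is the variance-covariance matrix multiplied by (number of units - 1), so the two give the same component directions, while the correlation matrix first rescales every variate to unit variance.

```python
import numpy as np


def pca_matrices(X):
    """Return (ssp, cov, corr) for a units-by-variates data matrix."""
    X = np.asarray(X, dtype=float)
    n_units = X.shape[0]
    Xc = X - X.mean(axis=0)          # centre each variate (assumed convention)
    ssp = Xc.T @ Xc                  # sums of squares and products
    cov = ssp / (n_units - 1)        # variance-covariance matrix
    sd = np.sqrt(np.diag(cov))
    corr = cov / np.outer(sd, sd)    # correlation matrix
    return ssp, cov, corr
```

The correlation option is the usual choice when the variates are measured on very different scales, since otherwise the variate with the largest variance dominates the first component.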

Number of dimensions to use

This specifies the number of principal components (n) to extract and use for clustering. This must be at least 2 and at most 6.

Number of partitions

This specifies the number of partitions (p) in each dimension to use in forming the cells to be clustered. This must be at least 2 and at most 500. The total number of cells is p^n, where n is the number of dimensions.
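The number of cells grows very quickly with both settings, which is worth checking before choosing large values. The helper below is a hypothetical convenience function, not part of Genstat; it applies the limits stated in this dialog and returns p raised to the power n.

```python
def total_cells(p, n):
    """Number of cells for p partitions in each of n dimensions."""
    if not 2 <= n <= 6:
        raise ValueError("number of dimensions must be between 2 and 6")
    if not 2 <= p <= 500:
        raise ValueError("number of partitions must be between 2 and 500")
    return p ** n


print(total_cells(10, 3))    # 1000
print(total_cells(500, 6))   # 15625000000000000
```

With many more cells than units, most cells are empty and few reach the minimum density, so a smaller number of partitions or dimensions may give more useful clusters.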

Minimum number of units in cells

This specifies a variate, a scalar, or a comma- or space-separated list of numbers, each giving a minimum number of units in a cell. A cluster solution is produced for each number, so that you can select the one that works best for your data. If this is left blank, a default range of thresholds is used, being the maximum density multiplied by 0.8, 0.75 … 0.2.
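The default range can be sketched as follows. The step of 0.05 between multipliers is an assumption inferred from the first two values listed (0.8 and 0.75); the function name `default_thresholds` is hypothetical, and any rounding that the procedure applies to the resulting thresholds is not shown.

```python
def default_thresholds(max_density, start=0.80, stop=0.20, step=0.05):
    """Thresholds as max_density times start, start - step, ..., stop.

    The 0.05 step is an assumption, not confirmed by the documentation.
    """
    n_steps = round((start - stop) / step)
    multipliers = [round(start - i * step, 10) for i in range(n_steps + 1)]
    return [max_density * m for m in multipliers]
```

For example, `default_thresholds(40)` would give 13 thresholds under this assumption, running from about 32 units down to about 8 units per cell.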

Action Icons

Pin: Controls whether to keep the dialog open when you click Run. When the pin is down, the dialog will remain open; when the pin is up, the dialog will close.
Restore: Restore names into edit fields and default settings.
Clear: Clear all fields and list boxes.
Help: Open the Help topic for this dialog.

Updated on May 23, 2023