Principal components clustering uses the PCPCLUSTER procedure to perform cluster analysis on a large data set. The procedure finds orthogonal linear combinations of a set of variates that maximize the variation contained within them, and then clusters the individuals on the first few of these dimensions. This allows very large numbers of individuals to be clustered, because an n x n similarity matrix is not required. The first step reduces the number of attributes of the units by taking the first 2-6 scores from a PCP analysis.
The second step divides the multi-dimensional space defined by the scores into cells, and forms a density table by tabulating the number of units in each cell. The clusters are formed by finding contiguous collections of cells in which the density (the number of units) exceeds the thresholds specified by the variate or numerical list in the Minimum number of units in cells field. The units in these clusters of cells are connected to each other in a similar way to the units in a hierarchical cluster analysis.
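The second step can be illustrated with a minimal Python sketch. It is not the PCPCLUSTER implementation: the function names `cell_index` and `density_clusters` are hypothetical, the cells are assumed to be equal-width partitions of each score's range, "contiguous" is assumed to mean cells that share a face, and a cell is assumed to qualify when its count is at least the threshold.

```python
from collections import Counter, deque


def cell_index(value, low, high, p):
    """Map a score to one of p equal-width partitions on [low, high] (assumed layout)."""
    if high == low:
        return 0
    k = int((value - low) / (high - low) * p)
    return min(k, p - 1)  # the maximum score falls in the last partition


def density_clusters(scores, p, threshold):
    """Label each unit with a cluster number, or 0 if its cell is too sparse.

    scores: list of tuples, one tuple of principal component scores per unit.
    """
    n_dims = len(scores[0])
    lows = [min(s[d] for s in scores) for d in range(n_dims)]
    highs = [max(s[d] for s in scores) for d in range(n_dims)]

    # Assign every unit to a cell and tabulate the density table.
    cells = [
        tuple(cell_index(s[d], lows[d], highs[d], p) for d in range(n_dims))
        for s in scores
    ]
    density = Counter(cells)

    # Keep only the cells that meet the threshold (assumed to be inclusive).
    dense = {cell for cell, count in density.items() if count >= threshold}

    # Breadth-first search joins dense cells that share a face (assumption).
    label = {}
    next_label = 0
    for start in sorted(dense):
        if start in label:
            continue
        next_label += 1
        label[start] = next_label
        queue = deque([start])
        while queue:
            cell = queue.popleft()
            for d in range(n_dims):
                for step in (-1, 1):
                    neighbour = cell[:d] + (cell[d] + step,) + cell[d + 1:]
                    if neighbour in dense and neighbour not in label:
                        label[neighbour] = next_label
                        queue.append(neighbour)

    return [label.get(cell, 0) for cell in cells]


scores = [(0.0, 0.0), (0.1, 0.1), (0.2, 0.0),
          (9.8, 9.9), (9.9, 9.8), (10.0, 10.0),
          (5.0, 5.0)]
print(density_clusters(scores, p=4, threshold=2))
```

Only occupied cells are stored, so the work scales with the number of units rather than with the total number of cells, and no n x n similarity matrix is ever built.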
- After you have imported your data, from the menu select Stats | Multivariate Analysis | Cluster Analysis | Principal Components.
- Fill in the fields as required then click Run.
Data to be analyzed
Used to enter the names of the variates to be used to cluster the individuals. You can transfer multiple selections from Available data by holding the Ctrl key on your keyboard while selecting items, then click to move them all across in one action.
Analysis based on
Selects whether the principal components analysis is based on the sums of squares and products, correlation, or variance-covariance matrix.
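As a rough illustration of how the three matrices differ, the Python sketch below computes each one from the same data. The function name `basis_matrices` is hypothetical, and the sums of squares and products are assumed here to be taken about the variate means; this is a sketch of the standard definitions, not of the calculation the dialog performs.

```python
import math


def basis_matrices(data):
    """Return (ssp, covariance, correlation) for rows of units by columns of variates."""
    n = len(data)
    m = len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(m)]
    centred = [[row[j] - means[j] for j in range(m)] for row in data]

    # Sums of squares and products about the means (assumption: corrected SSP).
    ssp = [[sum(r[i] * r[j] for r in centred) for j in range(m)] for i in range(m)]

    # Variance-covariance matrix: SSP divided by the degrees of freedom.
    cov = [[ssp[i][j] / (n - 1) for j in range(m)] for i in range(m)]

    # Correlation matrix: covariances rescaled so every variate has unit variance.
    corr = [
        [cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(m)]
        for i in range(m)
    ]
    return ssp, cov, corr


ssp, cov, corr = basis_matrices([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
print(ssp, cov, corr)
```

The correlation matrix gives every variate equal weight, which is usually the safer basis when the variates are measured on different scales.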
Number of dimensions to use
This specifies the number of principal components (n) to extract and use for clustering. This must be at least 2 and at most 6.
Number of partitions
This specifies the number of partitions (p) in each dimension to use in forming the cells to be clustered. This must be at least 2 and at most 500. The total number of cells is p^n.
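The total number of cells grows quickly with both settings. The short Python sketch below shows the arithmetic; the function name `total_cells` is hypothetical, and the range checks simply mirror the limits stated for the two fields.

```python
def total_cells(n_dims, n_partitions):
    """Number of cells when each of n_dims dimensions is cut into n_partitions parts."""
    if not 2 <= n_dims <= 6:
        raise ValueError("number of dimensions must be between 2 and 6")
    if not 2 <= n_partitions <= 500:
        raise ValueError("number of partitions must be between 2 and 500")
    return n_partitions ** n_dims


print(total_cells(3, 10))   # 10 partitions in 3 dimensions
print(total_cells(6, 500))  # the largest setting the dialog allows
```

With many partitions and dimensions most cells will be empty, so a finer grid tends to lower the density in each occupied cell.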
Minimum number of units in cells
This specifies a variate, a scalar, or a comma- or space-separated list of numbers, each giving a minimum number of units in a cell. A cluster solution is produced for each number so that you can select the one that works best for you. If this is left blank, a default range of thresholds is used: the maximum density multiplied by 0.8, 0.75 … 0.2.
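The default thresholds can be approximated with the Python sketch below. The function name `default_thresholds` is hypothetical, and the constant step of 0.05 between multipliers is an assumption inferred from the first two values listed; whether and how the procedure rounds the results is not stated, so no rounding is applied.

```python
def default_thresholds(max_density):
    """Thresholds as fractions of the largest cell count.

    Assumption: the multipliers run 0.80, 0.75, ... down to 0.20 in steps of 0.05.
    """
    # Work in whole percentages to avoid accumulating floating-point error.
    return [max_density * percent / 100 for percent in range(80, 15, -5)]


print(default_thresholds(40))
```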
Action Icons
Pin | Controls whether to keep the dialog open when you click Run. When the pin is down the dialog remains open; when the pin is up the dialog closes.
Restore | Restore names into edit fields and default settings.
Clear | Clear all fields and list boxes.
Help | Open the Help topic for this dialog.
See also
- Options for choosing which results to display
- Saving Results for further analysis
- PCPCLUSTER procedure