FCA directive

Performs factor analysis.

Options

PRINT = string tokens Printed output required (communalities, loadings, coefficients, scores, residuals, cresiduals, vresiduals, tests); default * i.e. no printing
NDIMENSIONS = scalar Number of factors to fit; no default, must be specified
METHOD = string token Whether to use correlations or variances and covariances (correlation, vcovariance, variancecovariance); default vcov
MAXCYCLE = scalar Maximum number of iterations; default 50
TOLERANCE = scalar Minimum value to assume for the unique component ψ_i^2 of each observed variable; default 10^-6

Parameters

DATA = pointers or matrices or symmetric matrices or SSPMs Pointer of variates forming the data matrix, or matrix storing the variate values by columns, or symmetric matrix storing their variances and covariances, or SSPM giving their sums of squares and products
NUNITS = scalars When DATA is set to a symmetric matrix of variances and covariances, NUNITS must specify the number of units from which they were calculated if tests are required
LRV = LRVs Saves the loadings, latent roots and trace from each analysis
SSPM = SSPMs Saves the SSPM formed from a DATA matrix or pointer
COMMUNALITIES = variates Saves the communalities
COEFFICIENTS = matrices Saves the factor score coefficients
SCORES = matrices or pointers Saves the factor analysis scores
RESIDUALS = matrices or pointers Saves residuals from the dimensions fitted in the analysis
CRESIDUALS = symmetric matrices Saves the residual correlation or covariance matrix
VRESIDUALS = variates Saves the residual variances

Description

Factor analysis aims to find a set of “latent” (or unobservable) variables {z_1, …, z_k} that account for the variances and covariances S between a set of p observed variables {x_1, …, x_p}. In the terminology of factor analysis, the latent variables {z_i} are known as factors. However, they are continuous variables, and thus are represented in Genstat by variate rather than by factor data structures. So, to avoid confusion, when we refer to the latent variables below, factor will be printed in italic font.

The data for a factor analysis consist of observed measurements on the variables {x_i} made on a set of subjects. The assumption is that, for each subject, the values of the observed variables are related to the factors by a linear model

x = μ + Γ z + ε

where x is the vector of observed variables,

z    is the vector of factors,

μ    is a vector of means for the observed variables,

Γ    is a matrix of loadings defining the relationship between observed and latent variables, and

ε    is a vector of residuals.

The elements of the residual vector ε are assumed to have mean zero and to be uncorrelated, i.e. the dispersion matrix of ε is assumed to be diagonal

cov(ε) = Ψ = diag(ψ_1^2, …, ψ_p^2).

(They thus differ from the residuals formed in a principal components analysis, which will be correlated; see, for example, Krzanowski 1988, Section 16.2, for more details.) The factors themselves are assumed to have variance one and to be uncorrelated, i.e.

cov(z) = I.

So the correlations between the observed variables {x_i} arise only through their relations with the factors, and not because of any correlation between the residuals or between the factors.
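The covariance structure implied by these assumptions can be written out explicitly. The short derivation below is a sketch that additionally assumes the factors z and the residuals ε are uncorrelated with each other, which is the usual factor-analysis assumption but is not stated explicitly above. The symbols Σ (the model covariance matrix of x) and γ_ij (the elements of Γ) are notation introduced here for illustration.

```latex
\Sigma = \operatorname{cov}(x)
       = \Gamma \operatorname{cov}(z)\, \Gamma^{\mathsf{T}} + \operatorname{cov}(\varepsilon)
       = \Gamma \Gamma^{\mathsf{T}} + \Psi ,
\qquad
\operatorname{cov}(x_i, x_l) = \sum_{j=1}^{k} \gamma_{ij}\,\gamma_{lj} \quad (i \neq l).
```

Because Ψ is diagonal, it contributes only to the variances, so every off-diagonal element of Σ is determined by the loadings alone.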

The DATA parameter specifies the data for the factor analysis. You can supply either a pointer containing a set of variates, one for each observed variable {x_i}, or a matrix storing the observed variables by columns, or a symmetric matrix containing variances and covariances between the variables, or an SSPM structure (formed using FSSPM from the variates of observed measurements). If you want FCA to print tests when DATA specifies a symmetric matrix of variances and covariances, you must also set the NUNITS parameter to specify the number of units from which they were calculated.

The METHOD option has settings vcovariance (with synonym variancecovariance) and correlation, to control whether FCA forms a matrix of variances and covariances or a matrix of correlations for the analysis. The same factors will be obtained if you use a correlation matrix, but the loadings will be scaled to be between zero and one. The number of factors, q, to fit must be specified by the NDIMENSIONS option. Arising from the numbers of parameters in the model (see Krzanowski 1988 Section 16.2.2) this is subject to the constraint

(p − q)^2 ≥ p + q.
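As a worked check, take the constraint in the form (p − q)^2 ≥ p + q, which is equivalent to requiring that the degrees of freedom of the goodness-of-fit test, ½[(p − q)^2 − (p + q)], be non-negative. This reading of the constraint, and the symbol d for the degrees of freedom, are assumptions made for illustration. With p = 6 observed variables:

```latex
\begin{aligned}
d(q) &= \tfrac{1}{2}\bigl[(p - q)^2 - (p + q)\bigr], \qquad p = 6 \\
d(1) &= \tfrac{1}{2}(25 - 7) = 9 \\
d(2) &= \tfrac{1}{2}(16 - 8) = 4 \\
d(3) &= \tfrac{1}{2}(9 - 9) = 0 \\
d(4) &= \tfrac{1}{2}(4 - 10) = -3
\end{aligned}
```

On this reading, at most three factors can be fitted to six variables, and q = 3 leaves no degrees of freedom for the test.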

The PRINT option controls printed output, with settings:

    communalities the proportion of variation explained by the factors for each observed variable, (var(x_i) − ψ_i^2) / var(x_i);
    loadings the matrix of factor loadings Γ;
    coefficients the factor score coefficients;
    scores the factor scores calculated from the model for each subject;
    residuals the vectors of residuals ε;
    cresiduals the residual correlation or covariance matrix i.e. a symmetric matrix showing the amount of unexplained correlation or covariance between each pair of variables;
    vresiduals the residual variances; and
    tests a chi-square goodness of fit test for the model.

By default nothing is printed. Note, however, that scores and residuals cannot be produced when DATA is set to a symmetric matrix of variances and covariances.
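To illustrate the communalities setting, the sketch below works through a single variable with made-up numbers: a variance of 1, loadings of 0.6 and 0.5 on two factors, and hence a unique component of 0.39. It assumes that the model variance decomposes into the sum of squared loadings plus the unique component, where γ_ij denotes an element of Γ (notation introduced here). The values are hypothetical and are not FCA output.

```latex
\operatorname{var}(x_i) = \sum_{j} \gamma_{ij}^2 + \psi_i^2
  = 0.6^2 + 0.5^2 + 0.39 = 1.00,
\qquad
\frac{\operatorname{var}(x_i) - \psi_i^2}{\operatorname{var}(x_i)}
  = \frac{1.00 - 0.39}{1.00} = 0.61 .
```

So, in this hypothetical case, the two factors would account for 61% of the variation in the variable.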

The communalities, factor coefficients, scores, residuals, residual correlations or covariances and residual variances can also be saved using the COMMUNALITIES, COEFFICIENTS, SCORES, RESIDUALS, CRESIDUALS and VRESIDUALS parameters, respectively. The LRV parameter allows an LRV structure to be saved, with the loadings in the ['vectors'] component, and the eigenvalues of the matrix Ψ S Ψ in the ['roots'] component; the loadings are scaled eigenvectors of Ψ S Ψ. (Remember, S is the matrix of variances and covariances of the observed variables {x_i}.) The SSPM parameter can save the SSPM structure constructed from a DATA matrix or pointer for the analysis. It is also convenient when you have supplied an SSPM structure as input but, for example, have set METHOD=correlation: the SSPM that is saved will then contain correlations instead of sums of squares and products.
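The following sketch shows one way the saving parameters might be used, reusing the correlation matrix from the Example section. The identifiers Load, Comm, Cres and Vres are hypothetical names chosen here, and the code assumes that FCA declares these output structures implicitly and that the LRV components can be referenced by the labels given above.

```genstat
" Sketch: save results instead of printing them (identifiers are illustrative) "
TEXT [VALUES=Gaelic,English,History,Arithmetic,Algebra,Geometry] Subjects
SYMMETRICMATRIX [ROWS=Subjects; VALUES=\
1.000,\
0.439, 1.000,\
0.410, 0.351, 1.000,\
0.288, 0.354, 0.164, 1.000,\
0.329, 0.320, 0.190, 0.595, 1.000,\
0.248, 0.329, 0.181, 0.470, 0.464, 1.000] Correlation
" PRINT=* suppresses printed output; results go to the named structures "
FCA [PRINT=*; NDIMENSIONS=2] Correlation; NUNITS=220; LRV=Load;\
    COMMUNALITIES=Comm; CRESIDUALS=Cres; VRESIDUALS=Vres
" Loadings are in the 'vectors' component, latent roots in 'roots' "
PRINT Load['vectors']
PRINT Load['roots']
PRINT Comm,Vres
PRINT Cres
```

SCORES and RESIDUALS are left out of this sketch because they cannot be produced when DATA is a symmetric matrix.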

Options: PRINT, NDIMENSIONS, METHOD, MAXCYCLE, TOLERANCE.

Parameters: DATA, NUNITS, LRV, SSPM, COMMUNALITIES, COEFFICIENTS, SCORES, RESIDUALS, CRESIDUALS, VRESIDUALS.

Method

FCA estimates the parameters of the model by maximum likelihood, assuming multivariate Normality, using subroutines G03CAF and G03CCF from the NAG Library. The MAXCYCLE option sets a limit on the number of iterations (default 50). The TOLERANCE option specifies the minimum value to assume for the unique component ψ_i^2 of each observed variable so that the communality is always less than one; the default is 10^-6.

Action with RESTRICT

If any of the variates in a DATA pointer is restricted, only the defined subset of the units will be used in the analysis.

References

Krzanowski, W.J. (1988). Principles of Multivariate Analysis: a User’s Perspective. Oxford University Press, Oxford.

See also

Directives: CVA, MDS, PCO, PCP, ROTATE, SSPM.

Procedures: LRVSCREE, DMST, PLS, RIDGE.

Commands for: Multivariate and cluster analysis.

Example

" Example 2:6.11 "
TEXT [VALUES=Gaelic,English,History,Arithmetic,Algebra,Geometry] Subjects
SYMMETRICMATRIX [ROWS=Subjects; VALUES=\
1.000,\
0.439, 1.000,\
0.410, 0.351, 1.000,\
0.288, 0.354, 0.164, 1.000,\
0.329, 0.320, 0.190, 0.595, 1.000,\
0.248, 0.329, 0.181, 0.470, 0.464, 1.000] Correlation
FCA [PRINT=communalities,loadings,cresiduals,tests; NDIMENSION=2]\
    Correlation; NUNITS=220
Updated on March 8, 2019
