FCA directive

Performs factor analysis.

Options

`PRINT` = string tokens	Printed output required (`communalities`, `loadings`, `coefficients`, `scores`, `residuals`, `cresiduals`, `vresiduals`, `tests`); default `*`, i.e. no printing
`NDIMENSIONS` = scalar	Number of factors to fit; no default, must be specified
`METHOD` = string token	Whether to use correlations or variances and covariances (`correlation`, `vcovariance`, `variancecovariance`); default `vcov`
`MAXCYCLE` = scalar	Maximum number of iterations; default 50
`TOLERANCE` = scalar	Minimum value to assume for the unique component ψ_i² of each observed variable; default 10^-6

Parameters

`DATA` = pointers or matrices or symmetric matrices or SSPMs	Pointer of variates forming the data matrix, or matrix storing the variate values by columns, or symmetric matrix storing their variances and covariances, or SSPM giving their sums of squares and products
`NUNITS` = scalars	When `DATA` is set to a symmetric matrix of variances and covariances, `NUNITS` must specify the number of units from which they were calculated if tests are required
`LRV` = LRVs	Saves the loadings, latent roots and trace from each analysis
`SSPM` = SSPMs	Saves the SSPM formed from a `DATA` matrix or pointer
`COMMUNALITIES` = variates	Saves the communalities
`COEFFICIENTS` = matrices	Saves the factor score coefficients
`SCORES` = matrices or pointers	Saves the factor analysis scores
`RESIDUALS` = matrices or pointers	Saves residuals from the dimensions fitted in the analysis
`CRESIDUALS` = symmetric matrices	Saves the residual correlation or covariance matrix
`VRESIDUALS` = variates	Saves the residual variances

Description

Factor analysis aims to find a set of “latent” (or unobservable) variables {z₁…z_q} that account for the variances and covariances S between a set of p observed variables {x₁…x_p}. In the terminology of factor analysis, the latent variables {z_j} are known as factors. However, they are continuous variables, and so are represented in Genstat by variates rather than by factor data structures. To avoid confusion, bear in mind that below the word factor refers to one of these latent variables, not to a Genstat factor structure.

The data for a factor analysis consist of observed measurements on the variables {x_i} made on a set of subjects. The assumption is that, for each subject, the values of the observed variables are related to the factors by a linear model

x = μ + Γ z + ε

where x is the vector of observed variables,

z is the vector of factors,

μ is a vector of means for the observed variables,

Γ is a matrix of loadings defining the relationship between observed and latent variables, and

ε is a vector of residuals.

The elements of the residual vector ε are assumed to have mean zero and to be uncorrelated, i.e. the dispersion matrix of ε is assumed to be diagonal

cov(ε) = Ψ = diag(ψ₁², …, ψ_p²)

(These residuals thus differ from those formed in a principal components analysis, which are correlated; see, for example, Krzanowski 1988, Section 16.2, for more details.) The factors themselves are assumed to have variance one and to be uncorrelated, i.e.

cov(z) = I.

So the correlations between the observed variables {x_i} arise only through their relations with the factors, and not because of any correlation between the residuals or between the factors.
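Together, these assumptions imply that the covariance matrix of x is Σ = ΓΓᵀ + Ψ (this also uses the standard assumption, not stated explicitly above, that the factors are uncorrelated with the residuals). The short Python sketch below illustrates the algebra only; it is not Genstat code, and the loadings and unique variances are hypothetical. The off-diagonal entries come entirely from the loadings, while Ψ contributes only to the diagonal.

```python
# Illustration of the algebra only (not Genstat code): covariance matrix
# implied by x = mu + Gamma z + eps with cov(z) = I, cov(eps) = Psi diagonal,
# and z assumed uncorrelated with eps.


def implied_covariance(gamma, psi2):
    """Return Sigma = Gamma Gamma^T + Psi as a list of lists.

    gamma: p rows, each holding the q loadings of one observed variable.
    psi2:  the p unique variances psi_i^2 (the diagonal of Psi).
    """
    p = len(gamma)
    sigma = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            # Shared variation: sum over factors of gamma_ik * gamma_jk.
            sigma[i][j] = sum(a * b for a, b in zip(gamma[i], gamma[j]))
        # Psi is diagonal, so unique variances only add to the diagonal.
        sigma[i][i] += psi2[i]
    return sigma


# Hypothetical one-factor loadings and unique variances for p = 3 variables.
gamma = [[0.8], [0.6], [0.5]]
psi2 = [0.36, 0.64, 0.75]
sigma = implied_covariance(gamma, psi2)
for row in sigma:
    print([round(v, 2) for v in row])
# [1.0, 0.48, 0.4]
# [0.48, 1.0, 0.3]
# [0.4, 0.3, 1.0]
```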

The DATA parameter specifies the data for the factor analysis. You can supply either a pointer containing a set of variates, one for each observed variable {x_i}, or a matrix storing the observed variables by columns, or a symmetric matrix containing variances and covariances between the variables, or an SSPM structure (formed using FSSPM from the variates of observed measurements). When DATA specifies a symmetric matrix of variances and covariances, you must also set the NUNITS parameter to specify the number of units from which they were calculated if you want FCA to print tests.
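When DATA is a pointer or a matrix, FCA has to form S itself. The Python sketch below (not Genstat syntax) shows that calculation for data held by columns, together with the rescaling to correlations implied by METHOD=correlation. The function names and the divisor n − 1 are our own assumptions, not confirmed details of FCA.

```python
import math


def covariance_matrix(columns):
    """Variances and covariances of data stored by columns.

    Assumption: divisor n - 1 (the usual unbiased estimate).
    """
    n = len(columns[0])
    p = len(columns)
    means = [sum(col) / n for col in columns]
    s = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            cross = sum((a - means[i]) * (b - means[j])
                        for a, b in zip(columns[i], columns[j]))
            s[i][j] = cross / (n - 1)
    return s


def to_correlation(s):
    """Rescale a covariance matrix to correlations: r_ij = s_ij / (sd_i sd_j)."""
    sd = [math.sqrt(s[i][i]) for i in range(len(s))]
    return [[s[i][j] / (sd[i] * sd[j]) for j in range(len(s))]
            for i in range(len(s))]


# Hypothetical data: 4 units measured on 3 observed variables.
columns = [[1, 2, 3, 4], [2, 4, 6, 8], [1, 0, 1, 0]]
s = covariance_matrix(columns)
r = to_correlation(s)
print(round(s[0][0], 4), round(r[0][1], 4), round(r[0][2], 4))  # 1.6667 1.0 -0.4472
```

Once only S is kept, the number of units n can no longer be recovered from it, which is why NUNITS has to accompany a symmetric matrix when tests are wanted.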

The METHOD option has settings vcovariance (with synonym variancecovariance) and correlation, to control whether FCA forms a matrix of variances and covariances or a matrix of correlations for the analysis. Maximum likelihood factor analysis is scale invariant, so the same factors will be obtained if you use a correlation matrix, but the loadings will be rescaled to lie between −1 and 1. The number of factors, q, to fit must be specified by the NDIMENSIONS option. The number of free parameters in the model (after allowing for the rotational indeterminacy of the loadings) must not exceed the p(p + 1)/2 distinct variances and covariances (see Krzanowski 1988, Section 16.2.2), so q is subject to the constraint

(p – q)² ≥ p + q.
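The constraint follows from counting: the model has pq loadings and p unique variances, less q(q − 1)/2 for the rotational freedom of the loadings, while S has p(p + 1)/2 distinct elements. The difference simplifies to ½[(p − q)² − (p + q)], which must be non-negative. A small Python sketch (the helper names are hypothetical):

```python
def factor_df(p, q):
    """Distinct elements of S minus free parameters of a q-factor model.

    p(p + 1)/2 variances and covariances, against p*q loadings plus
    p unique variances, less q(q - 1)/2 for rotational indeterminacy.
    Simplifies to ((p - q)**2 - (p + q)) / 2; the numerator is always even.
    """
    return ((p - q) ** 2 - (p + q)) // 2


def max_factors(p):
    """Largest q satisfying (p - q)**2 >= p + q."""
    return max(q for q in range(p + 1) if factor_df(p, q) >= 0)


print(factor_df(6, 2), max_factors(6))  # 4 3
```

For the six school subjects in the Example below, p = 6, so fitting q = 2 factors leaves 4 degrees of freedom, and at most 3 factors could be fitted.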

The PRINT option controls printed output, with settings:

`communalities`	the proportion of variation explained by the factors for each observed variable, (var(x_i) – ψ_i²) / var(x_i);
`loadings`	the matrix of factor loadings Γ;
`coefficients`	the factor score coefficients;
`scores`	the factor scores calculated from the model for each subject;
`residuals`	the vectors of residuals ε;
`cresiduals`	the residual correlation or covariance matrix, i.e. a symmetric matrix showing the amount of unexplained correlation or covariance between each pair of variables;
`vresiduals`	the residual variances; and
`tests`	a chi-square goodness of fit test for the model.
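For a fitted model, the communalities and the residual correlation or covariance matrix can be computed from S, Γ and Ψ. The Python sketch below uses hypothetical values. It assumes that the residual matrix is S − (ΓΓᵀ + Ψ), which matches the description of `cresiduals` above but has not been checked against Genstat's implementation.

```python
def communalities(s, psi2):
    """(var(x_i) - psi_i^2) / var(x_i), taking var(x_i) from the diagonal of S."""
    return [(s[i][i] - psi2[i]) / s[i][i] for i in range(len(s))]


def residual_matrix(s, gamma, psi2):
    """Assumed definition: S minus the fitted matrix Gamma Gamma^T + Psi."""
    p = len(s)
    resid = []
    for i in range(p):
        row = []
        for j in range(p):
            fitted = sum(a * b for a, b in zip(gamma[i], gamma[j]))
            if i == j:
                fitted += psi2[i]  # unique variance sits on the diagonal only
            row.append(s[i][j] - fitted)
        resid.append(row)
    return resid


# Hypothetical observed correlation matrix and one-factor fit.
s_obs = [[1.00, 0.50, 0.40],
         [0.50, 1.00, 0.30],
         [0.40, 0.30, 1.00]]
gamma = [[0.8], [0.6], [0.5]]
psi2 = [0.36, 0.64, 0.75]
print([round(c, 2) for c in communalities(s_obs, psi2)])   # [0.64, 0.36, 0.25]
print(round(residual_matrix(s_obs, gamma, psi2)[0][1], 2))  # 0.02
```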

By default nothing is printed. Note, however, that scores and residuals cannot be produced when DATA is set to a symmetric matrix of variances and covariances.

The communalities, factor coefficients, scores, residuals, residual correlations or covariances and residual variances can also be saved using the COMMUNALITIES, COEFFICIENTS, SCORES, RESIDUALS, CRESIDUALS and VRESIDUALS parameters, respectively. The LRV parameter allows an LRV structure to be saved, with the loadings in the ['vectors'] component, and the eigenvalues of the matrix Ψ^-½ S Ψ^-½ in the ['roots'] component; the loadings are scaled eigenvectors of Ψ^-½ S Ψ^-½. (Remember, S is the matrix of variances and covariances of the observed variables {x_i}.) The SSPM parameter can save the SSPM structure constructed from a DATA pointer for the analysis. A particularly convenient instance is when you have supplied an SSPM structure as input but, for example, have set METHOD=correlation: the SSPM that is saved will then contain correlations instead of sums of squares and products.
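The link between the loadings and the eigen-decomposition can be checked numerically. In the standard maximum likelihood solution (which, we believe, is the form used by NAG routines such as G03CAF), if θ_j and v_j are the roots and unit eigenvectors of Ψ^-½ S Ψ^-½, the loadings for factor j are Ψ^½ v_j √(θ_j − 1). The Python sketch below assumes a hypothetical one-factor model that fits S exactly, and uses simple power iteration in place of a proper eigensolver; it is an illustration, not FCA's algorithm.

```python
import math


def scaled_matrix(s, psi2):
    """Psi^(-1/2) S Psi^(-1/2); elementwise because Psi is diagonal."""
    p = len(s)
    return [[s[i][j] / math.sqrt(psi2[i] * psi2[j]) for j in range(p)]
            for i in range(p)]


def leading_eigen(a, iterations=500):
    """Largest root and unit eigenvector of a symmetric matrix (power iteration)."""
    p = len(a)
    v = [1.0] * p
    for _ in range(iterations):
        w = [sum(a[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T a v gives the root for the converged unit vector.
    theta = sum(v[i] * a[i][j] * v[j] for i in range(p) for j in range(p))
    return theta, v


def loading_column(psi2, theta, v):
    """Loadings for one factor: Psi^(1/2) v sqrt(theta - 1)."""
    return [math.sqrt(psi2[i]) * v[i] * math.sqrt(theta - 1.0)
            for i in range(len(v))]


# Hypothetical S that a one-factor model with these values fits exactly.
gamma_true = [0.8, 0.6, 0.5]
psi2 = [0.36, 0.64, 0.75]
s = [[1.00, 0.48, 0.40],
     [0.48, 1.00, 0.30],
     [0.40, 0.30, 1.00]]
theta, v = leading_eigen(scaled_matrix(s, psi2))
print([round(g, 4) for g in loading_column(psi2, theta, v)])  # [0.8, 0.6, 0.5]
```

Here Ψ^-½ S Ψ^-½ equals the identity plus a rank-one term, so its leading root exceeds one and the remaining roots equal one, which is why power iteration converges quickly.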

Options: PRINT, NDIMENSIONS, METHOD, MAXCYCLE, TOLERANCE.

Parameters: DATA, NUNITS, LRV, SSPM, COMMUNALITIES, COEFFICIENTS, SCORES, RESIDUALS, CRESIDUALS, VRESIDUALS.

Method

FCA estimates the parameters of the model by maximum likelihood, assuming multivariate Normality, using subroutines G03CAF and G03CCF from the NAG Library. The MAXCYCLE option sets a limit on the number of iterations (default 50). The TOLERANCE option specifies the minimum value to assume for the unique component ψ_i² of each observed variable so that the communality is always less than one; the default is 10^-6.
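Under multivariate Normality, maximising the likelihood is equivalent to minimising the discrepancy F = log|Σ| + tr(SΣ⁻¹) − log|S| − p, where Σ = ΓΓᵀ + Ψ. The Python sketch below evaluates F for given Γ and Ψ, flooring each ψ_i² at a tolerance in the spirit of the TOLERANCE option. It illustrates the criterion only, not the NAG algorithm; the helper names and example values are hypothetical.

```python
import math


def det_and_inverse(a):
    """Determinant and inverse by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    det = 1.0
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        if piv != c:
            m[c], m[piv] = m[piv], m[c]
            det = -det  # a row swap flips the sign of the determinant
        pivot = m[c][c]
        det *= pivot
        m[c] = [x / pivot for x in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return det, [row[n:] for row in m]


def ml_discrepancy(s, gamma, psi2, tolerance=1e-6):
    """F = log|Sigma| + tr(S Sigma^-1) - log|S| - p, with Sigma = Gamma Gamma^T + Psi.

    Each psi_i^2 is floored at `tolerance` (hypothetical stand-in for TOLERANCE).
    """
    p = len(s)
    psi2 = [max(v, tolerance) for v in psi2]
    sigma = [[sum(a * b for a, b in zip(gamma[i], gamma[j]))
              + (psi2[i] if i == j else 0.0) for j in range(p)]
             for i in range(p)]
    det_sigma, inv_sigma = det_and_inverse(sigma)
    det_s, _ = det_and_inverse(s)
    trace = sum(s[i][k] * inv_sigma[k][i] for i in range(p) for k in range(p))
    return math.log(det_sigma) + trace - math.log(det_s) - p


gamma = [[0.8], [0.6], [0.5]]
psi2 = [0.36, 0.64, 0.75]
s_exact = [[1.00, 0.48, 0.40], [0.48, 1.00, 0.30], [0.40, 0.30, 1.00]]
s_obs = [[1.00, 0.50, 0.40], [0.50, 1.00, 0.30], [0.40, 0.30, 1.00]]
print(abs(ml_discrepancy(s_exact, gamma, psi2)) < 1e-10)  # True
print(ml_discrepancy(s_obs, gamma, psi2) > 0)             # True
```

F is zero only when the fitted Σ reproduces S exactly, and positive otherwise; goodness-of-fit statistics for this model are typically a multiple of the minimised F, referred to a chi-square distribution on ½[(p − q)² − (p + q)] degrees of freedom.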

Action with `RESTRICT`

If any of the variates in a DATA pointer is restricted, only the defined subset of the units will be used in the analysis.

References

Krzanowski, W.J. (1988). Principles of Multivariate Analysis: a User’s Perspective. Oxford University Press, Oxford.

Example

" Example 2:6.11 "
TEXT [VALUES=Gaelic,English,History,Arithmetic,Algebra,Geometry] Subjects
SYMMETRICMATRIX [ROWS=Subjects; VALUES=\
1.000,\
0.439, 1.000,\
0.410, 0.351, 1.000,\
0.288, 0.354, 0.164, 1.000,\
0.329, 0.320, 0.190, 0.595, 1.000,\
0.248, 0.329, 0.181, 0.470, 0.464, 1.000] Correlation
FCA [PRINT=communalities,loadings,cresiduals,tests; NDIMENSION=2]\
    Correlation; NUNITS=220

Updated on March 8, 2019
