Performs factor analysis.
Options
PRINT = string tokens | Printed output required (communalities, loadings, coefficients, scores, residuals, cresiduals, vresiduals, tests); default * i.e. no printing |
---|---|
NDIMENSIONS = scalar | Number of factors to fit; no default, must be specified |
METHOD = string token | Whether to use correlations or variances and covariances (correlation, vcovariance, variancecovariance); default vcov |
MAXCYCLE = scalar | Maximum number of iterations; default 50 |
TOLERANCE = scalar | Minimum value to assume for the unique component ψᵢ² of each observed variable; default 10⁻⁶ |
Parameters
DATA = pointers or matrices or symmetric matrices or SSPMs | Pointer of variates forming the data matrix, or matrix storing the variate values by columns, or symmetric matrix storing their variances and covariances, or SSPM giving their sums of squares and products |
---|---|
NUNITS = scalars | When DATA is set to a symmetric matrix of variances and covariances, NUNITS must specify the number of units from which they were calculated if tests are required |
LRV = LRVs | Saves the loadings, latent roots and trace from each analysis |
SSPM = SSPMs | Saves the SSPM formed from a DATA matrix or pointer |
COMMUNALITIES = variates | Saves the communalities |
COEFFICIENTS = matrices | Saves the factor score coefficients |
SCORES = matrices or pointers | Saves the factor analysis scores |
RESIDUALS = matrices or pointers | Saves residuals from the dimensions fitted in the analysis |
CRESIDUALS = symmetric matrices | Saves the residual correlation or covariance matrix |
VRESIDUALS = variates | Saves the residual variances |
Description
Factor analysis aims to find a set of “latent” (or unobservable) variables {z₁ … zₖ} that account for the variances and covariances S between a set of p observed variables {x₁ … xₚ}. In the terminology of factor analysis, the latent variables {zᵢ} are known as factors. However, they are continuous variables, and thus are represented in Genstat by variate rather than by factor data structures. So to avoid confusion, when we refer to the latent variables below, factor will be printed in italic font.
The data for a factor analysis consists of observed measurements on the variables {xᵢ} made on a set of subjects. The assumption is that, for each subject, the values of the observed variables are related to the factors by a linear model
x = μ + Γ z + ε
where x is the vector of observed variables,
z is the vector of factors,
μ is a vector of means for the observed variables,
Γ is a matrix of loadings defining the relationship between observed and latent variables, and
ε is a vector of residuals.
The elements of the residual vector ε are assumed to have mean zero and to be uncorrelated, i.e. the dispersion matrix of ε is assumed to be diagonal
cov(ε) = Ψ = diag(ψ₁², …, ψₚ²)
(They thus differ from the residuals formed in a principal components analysis, which will be correlated; see e.g. Krzanowski 1988 Section 16.2 for more details.) The factors themselves are assumed to have variance one and to be uncorrelated, i.e.
cov(z) = I.
So the correlations between the observed variables {xᵢ} arise only through their relations with the factors, and not because of any correlation between the residuals or between the factors.
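Taken together, these assumptions imply that the covariance matrix of the observed variables is Γ Γᵀ + Ψ, since cov(x) = Γ cov(z) Γᵀ + cov(ε). The following is a minimal Python sketch of that structure, not Genstat code; the function name `implied_covariance` and all the numbers are invented purely for illustration.

```python
def implied_covariance(loadings, unique_variances):
    """Return Gamma Gamma^T + Psi.

    loadings: p x q matrix Gamma, given as a list of p rows.
    unique_variances: the p diagonal elements of Psi.
    """
    p = len(loadings)
    # Gamma Gamma^T: entry (i, j) is the dot product of rows i and j.
    cov = [[sum(a * b for a, b in zip(loadings[i], loadings[j]))
            for j in range(p)]
           for i in range(p)]
    # Psi is diagonal, so it contributes only to the variances.
    for i in range(p):
        cov[i][i] += unique_variances[i]
    return cov


# Hypothetical example: p = 3 observed variables, q = 1 factor.
gamma = [[0.8], [0.6], [0.5]]
psi2 = [0.36, 0.64, 0.75]
S = implied_covariance(gamma, psi2)
```

In this example every off-diagonal element of `S` is a product of two loadings, which is the sense in which the covariances between observed variables arise only through the factors.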
The DATA parameter specifies the data for the factor analysis. You can supply either a pointer containing a set of variates, one for each observed variable {xᵢ}, or a matrix storing the observed variables by columns, or a symmetric matrix containing variances and covariances between the variables, or an SSPM structure (formed using FSSPM from the variates of observed measurements). When DATA specifies a symmetric matrix of variances and covariances, you must also set the NUNITS parameter to specify the number of units from which they were calculated if you want FCA to print tests.
The METHOD option has settings vcovariance (with synonym variancecovariance) and correlation, to control whether FCA forms a matrix of variances and covariances or a matrix of correlations for the analysis. The same factors will be obtained if you use a correlation matrix, but the loadings will be scaled to lie between zero and one in absolute value. The number of factors, q, to fit must be specified by the NDIMENSIONS option. Arising from the numbers of parameters in the model (see Krzanowski 1988 Section 16.2.2) this is subject to the constraint
(p – q)² ≥ p + q.
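As a quick way to check this constraint before running an analysis, the sketch below (plain Python, not Genstat code; both function names are hypothetical) finds the largest admissible number of factors for a given p. It also evaluates ½[(p – q)² – (p + q)], which is the usual degrees-of-freedom count for the goodness-of-fit test in maximum likelihood factor analysis; the constraint is equivalent to this count being non-negative.

```python
def fit_degrees_of_freedom(p, q):
    """Return ((p - q)**2 - (p + q)) / 2 as an integer.

    The numerator is always even, so integer division is exact.
    """
    return ((p - q) ** 2 - (p + q)) // 2


def max_factors(p):
    """Largest q (with q < p) satisfying (p - q)**2 >= p + q."""
    q = 0
    # The left side shrinks and the right side grows as q increases,
    # so we can stop at the first q that fails.
    while q + 1 < p and (p - (q + 1)) ** 2 >= p + (q + 1):
        q += 1
    return q


# Six observed variables, as in the example at the end of this page.
largest_q = max_factors(6)
df_two_factors = fit_degrees_of_freedom(6, 2)
```

With p = 6 the constraint allows at most three factors, and a two-factor fit leaves four degrees of freedom.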
The PRINT option controls printed output, with settings:
communalities | the proportion of variation explained by the factors for each observed variable, (var(xᵢ) – ψᵢ²) / var(xᵢ); |
---|---|
loadings | the matrix of factor loadings Γ; |
coefficients | the factor score coefficients; |
scores | the factor scores calculated from the model for each subject; |
residuals | the vectors of residuals ε; |
cresiduals | the residual correlation or covariance matrix i.e. a symmetric matrix showing the amount of unexplained correlation or covariance between each pair of variables; |
vresiduals | the residual variances; and |
tests | a chi-square goodness of fit test for the model. |
By default nothing is printed. Note, however, that scores and residuals cannot be produced when DATA is set to a symmetric matrix of variances and covariances.
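The communality formula given for the communalities setting can be illustrated with a few lines of Python. This is only a sketch of the arithmetic, not of how FCA computes it internally; the function name `communality` and the input values are assumptions made for the illustration.

```python
def communality(variance, unique_variance):
    """Proportion of var(x_i) explained by the factors:
    (var(x_i) - psi_i^2) / var(x_i)."""
    if variance <= 0:
        raise ValueError("variance must be positive")
    return (variance - unique_variance) / variance


# Hypothetical variable with variance 4 and unique component 1.
explained = communality(4.0, 1.0)

# With a unique component equal to the default TOLERANCE (10^-6),
# the communality is close to, but still below, one.
near_one = communality(1.0, 1e-6)
```

Here `explained` is 0.75, i.e. three quarters of the variance of that variable is attributed to the factors.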
The communalities, factor coefficients, scores, residuals, residual correlations or covariances and residual variances can also be saved using the COMMUNALITIES, COEFFICIENTS, SCORES, RESIDUALS, CRESIDUALS and VRESIDUALS parameters, respectively. The LRV parameter allows an LRV structure to be saved, with the loadings in the ['vectors'] component, and the eigenvalues of the matrix Ψ^(-1/2) S Ψ^(-1/2) in the ['roots'] component; the loadings are scaled eigenvectors of Ψ^(-1/2) S Ψ^(-1/2). (Remember, S is the matrix of variances and covariances of the observed variables {xᵢ}.) The SSPM parameter can save the SSPM structure constructed from a DATA pointer for the analysis. A particularly convenient instance is when you have supplied an SSPM structure as input but, for example, have set METHOD=correlation: the SSPM that is saved will then contain correlations instead of sums of squares and products.
Options: PRINT, NDIMENSIONS, METHOD, MAXCYCLE, TOLERANCE.
Parameters: DATA, NUNITS, LRV, SSPM, COMMUNALITIES, COEFFICIENTS, SCORES, RESIDUALS, CRESIDUALS, VRESIDUALS.
Method
FCA estimates the parameters of the model by maximum likelihood, assuming multivariate Normality, using subroutines G03CAF and G03CCF from the NAG Library. The MAXCYCLE option sets a limit on the number of iterations (default 50). The TOLERANCE option specifies the minimum value to assume for the unique component ψᵢ² of each observed variable so that the communality is always less than one; the default is 10⁻⁶.
Action with RESTRICT
If any of the variates in a DATA pointer is restricted, only the defined subset of the units will be used in the analysis.
References
Krzanowski, W.J. (1988). Principles of Multivariate Analysis: a User’s Perspective. Oxford University Press, Oxford.
See also
Directives: CVA, MDS, PCO, PCP, ROTATE, SSPM.
Procedures: LRVSCREE, DMST, PLS, RIDGE.
Commands for: Multivariate and cluster analysis.
Example
" Example 2:6.11 "
TEXT [VALUES=Gaelic,English,History,Arithmetic,Algebra,Geometry] Subjects
SYMMETRICMATRIX [ROWS=Subjects; VALUES=\
  1.000,\
  0.439, 1.000,\
  0.410, 0.351, 1.000,\
  0.288, 0.354, 0.164, 1.000,\
  0.329, 0.320, 0.190, 0.595, 1.000,\
  0.248, 0.329, 0.181, 0.470, 0.464, 1.000] Correlation
FCA [PRINT=communalities,loadings,cresiduals,tests; NDIMENSION=2]\
  Correlation; NUNITS=220