Multiple correspondence analysis
Multiple correspondence analysis (MCA) can be presented as an extension of correspondence analysis (CA) to more than two variables. It allows for the graphical representation of frequency tables involving more than two qualitative variables. A classic example of such a table is one recording individuals’ responses to a questionnaire containing \(Q\) multiple-choice questions. MCA is therefore very useful for visualizing the results of a questionnaire-based study.
MCA can also be seen as a version of PCA for mixed data, i.e., data comprising both quantitative and qualitative variables. The joint processing of these two types of variables relies on first transforming all of them by complete disjunctive coding.
Notation
Let \(n\) be the number of individuals (or observations) and \(Q\) the number of variables (or questions in the case of a questionnaire). Variable \(q\) has \(J_q\) modalities, and the total number of modalities is \(J = \sum_{q=1}^{Q} J_q\).
Thus, a variable is not treated as such but through its modalities: each individual is coded \(1\) for the modality they possess and \(0\) for the others (the modalities of a variable are mutually exclusive). This coding is immediate for qualitative variables. For a quantitative variable, however, we first divide its range into classes, so that each individual belongs to exactly one class, and these classes then serve as modalities. This transformation is called complete disjunctive coding. It is a coding, because the initial information is transformed; disjunctive, because each individual has at most one modality per variable; and complete, because each individual has at least one modality per variable. The result is the \(n \times J\) logical matrix \(Z\), each row of which contains exactly \(Q\) ones.
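The coding described above can be sketched in a few lines of Python with NumPy. This is a minimal illustration rather than a reference implementation: the data, the class bounds used for the quantitative variable, and the helper `disjunctive` are all hypothetical.

```python
import numpy as np


def disjunctive(codes, n_modalities):
    """Complete disjunctive coding of one variable (hypothetical helper).

    codes[i] is the modality index (0 .. n_modalities - 1) of individual i.
    Returns an (n, n_modalities) 0/1 matrix with exactly one 1 per row.
    """
    codes = np.asarray(codes)
    Z_q = np.zeros((codes.size, n_modalities), dtype=int)
    Z_q[np.arange(codes.size), codes] = 1
    return Z_q


# Hypothetical data: n = 5 individuals, Q = 2 variables.
colour = np.array([0, 2, 1, 0, 2])              # qualitative, 3 modalities
age = np.array([23.0, 41.0, 35.0, 67.0, 52.0])  # quantitative

# The quantitative variable is first divided into classes
# (hypothetical bounds): age < 30, 30 <= age < 50, age >= 50.
age_class = np.digitize(age, [30.0, 50.0])      # class index 0, 1 or 2

# Z has one block of indicator columns per variable: J = 3 + 3 = 6.
Z = np.hstack([disjunctive(colour, 3), disjunctive(age_class, 3)])
print(Z)
print("row sums:", Z.sum(axis=1))   # equal to Q for every individual
print("n_j:", Z.sum(axis=0))        # frequency of each modality
```

Each row of `Z` sums to \(Q = 2\) (one modality per variable), and each column sum is the frequency of the corresponding modality.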
Burt’s Table
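From a mathematical point of view, MCA is a correspondence analysis (CA) performed either on the logical matrix \(Z\) or on Burt’s table \(B = Z^{\top} Z\), the \(J \times J\) table that crosses every pair of variables (its diagonal blocks are diagonal matrices containing the modality frequencies \(n_j\)). It can be shown that the same factors are obtained regardless of which matrix is analyzed.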
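As a sanity check of this definition, the sketch below builds \(Z\) for a small hypothetical questionnaire (the array `answers` and the modality counts `J_q` are made up for illustration) and forms \(B = Z^{\top} Z\) with NumPy.

```python
import numpy as np

# Hypothetical questionnaire: n = 6 individuals, Q = 3 questions,
# answers already coded 0 .. J_q - 1 for each question.
answers = np.array([
    [0, 1, 0],
    [1, 0, 2],
    [0, 1, 1],
    [1, 1, 0],
    [0, 0, 2],
    [1, 0, 1],
])
J_q = [2, 2, 3]
n, Q = answers.shape

# Complete disjunctive table Z (n x J): rows of an identity matrix
# selected by the codes give the indicator columns of each question.
Z = np.hstack([np.eye(J_q[q], dtype=int)[answers[:, q]] for q in range(Q)])
n_j = Z.sum(axis=0)

# Burt's table: J x J, crossing every pair of questions.
B = Z.T @ Z

print(B)
print("symmetric:", (B == B.T).all())
print("diagonal = n_j:", (np.diag(B) == n_j).all())
print("row sums = Q n_j:", (B.sum(axis=1) == Q * n_j).all())
print("total = n Q^2:", B.sum() == n * Q**2)
```

The off-diagonal block `B[0:2, 2:4]` is the contingency table of the first two questions, while each diagonal block is diagonal because the modalities of one question are exclusive.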
Eigenelements of the table \(Z\)
The eigenelements of the table \(Z\) can be calculated using the same method as for PCA. By analogy with PCA, we therefore seek the eigenvectors of the matrix \[S = \frac{1}{Q} Z^{\top} Z D_J^{-1},\] where \(D_J\) is the \(J \times J\) diagonal matrix whose diagonal entries are the frequencies \(n_j\), \(j = 1, \dots, J\), i.e., the number of individuals possessing modality \(j\). We can calculate the coordinates of the row profiles on the factorial axes in the same way: \[\Phi_k = n Z D_J^{-1} u_k,\] where \(u_k\) is the \(k\)th eigenvector associated with the eigenvalue \(\lambda_k\) of the matrix \(S\).
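A minimal NumPy sketch of these formulas, on hypothetical data. Because \(S\) is not symmetric, the sketch diagonalizes the symmetric matrix \(D_J^{-1/2} Z^{\top} Z D_J^{-1/2} / Q\), which has the same eigenvalues, and recovers \(u_k = D_J^{1/2} w_k\); this is a numerical convenience chosen here, not something prescribed by the text. With unit-norm \(w_k\) from `numpy.linalg.eigh`, the \(u_k\) satisfy \(u_k^{\top} D_J^{-1} u_k = 1\) and are defined up to sign.

```python
import numpy as np

# Hypothetical questionnaire: n = 6, Q = 3, J = 2 + 2 + 3 = 7.
answers = np.array([[0, 1, 0], [1, 0, 2], [0, 1, 1],
                    [1, 1, 0], [0, 0, 2], [1, 0, 1]])
J_q = [2, 2, 3]
n, Q = answers.shape
Z = np.hstack([np.eye(J_q[q])[answers[:, q]] for q in range(Q)])
n_j = Z.sum(axis=0)
D_inv = np.diag(1.0 / n_j)

S = Z.T @ Z @ D_inv / Q            # S = (1/Q) Z^T Z D_J^{-1}, not symmetric

# S is similar to the symmetric M = D_J^{-1/2} Z^T Z D_J^{-1/2} / Q:
# diagonalize M with eigh, then map back with u_k = D_J^{1/2} w_k.
sqrt_nj = np.sqrt(n_j)
Zs = Z / sqrt_nj                   # Z D_J^{-1/2}: column j divided by sqrt(n_j)
lam, W = np.linalg.eigh(Zs.T @ Zs / Q)
order = np.argsort(lam)[::-1]      # eigh returns ascending order; reverse it
lam, W = lam[order], W[:, order]
U = sqrt_nj[:, None] * W           # column k is u_k, an eigenvector of S

# Coordinates of the row profiles on all axes: Phi_k = n Z D_J^{-1} u_k.
Phi = n * Z @ D_inv @ U

print(np.round(lam, 4))
```

The largest eigenvalue equals \(1\), with an eigenvector proportional to \((n_1, \dots, n_J)\) (indeed \(S\,(n_1, \dots, n_J)^{\top} = (n_1, \dots, n_J)^{\top}\)); as in any correspondence analysis, this trivial axis is usually discarded.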
We can also look at the dual analysis of the table \(Z\). Again by analogy with PCA, we look for the eigenvectors of the matrix \[T = \frac{1}{Q} Z D_J^{-1} Z^{\top}.\] Similarly, we can calculate the coordinates of the column profiles on the factorial axes: \[\Psi_k = n D_J^{-1} Z^{\top} v_k,\] where \(v_k\) is the \(k\)th eigenvector of \(T\).
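The link between the direct and dual analyses can be checked numerically. Writing \(S = (Z^{\top}/Q)(Z D_J^{-1})\) and \(T = (Z D_J^{-1})(Z^{\top}/Q)\), the two matrices are products of the same factors in reversed order, so they share their nonzero eigenvalues. The data below are hypothetical.

```python
import numpy as np

# Hypothetical questionnaire: n = 6, Q = 3, J = 7.
answers = np.array([[0, 1, 0], [1, 0, 2], [0, 1, 1],
                    [1, 1, 0], [0, 0, 2], [1, 0, 1]])
J_q = [2, 2, 3]
n, Q = answers.shape
Z = np.hstack([np.eye(J_q[q])[answers[:, q]] for q in range(Q)])
n_j = Z.sum(axis=0)
D_inv = np.diag(1.0 / n_j)

S = Z.T @ Z @ D_inv / Q            # J x J, direct analysis
T = Z @ D_inv @ Z.T / Q            # n x n, dual analysis (symmetric)

# Same factors in reversed order, hence the same nonzero eigenvalues.
# S has real eigenvalues, so only numerical noise is dropped by .real.
eig_S = np.sort(np.linalg.eigvals(S).real)[::-1]
eig_T = np.sort(np.linalg.eigvalsh(T))[::-1]

k = min(n, Z.shape[1])             # compare the first min(n, J) eigenvalues
print(np.round(eig_S[:k], 4))
print(np.round(eig_T[:k], 4))

# Column-profile coordinates Psi_k = n D_J^{-1} Z^T v_k, with the
# eigenvectors v_k of T reordered by decreasing eigenvalue.
_, V = np.linalg.eigh(T)
Psi = n * D_inv @ Z.T @ V[:, ::-1]
```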
Eigenelements of Burt’s table \(B\)
Since Burt’s table is symmetric, the direct analysis and the dual analysis coincide. It can also be analyzed by analogy with PCA. The elements of row \(j\) (or column \(j\)) of \(B\) sum to \(Q n_j\), and all the elements of \(B\) sum to \(n Q^2\). We look for the eigenvectors of the matrix \[S^\prime = \frac{1}{Q^2} B^{\top} D_J^{-1} B D_J^{-1}.\]
We then notice that this matrix \(S^\prime\) has the same eigenvectors as the matrix \(S\). Indeed, since \(B = Z^{\top} Z\), \[S^\prime = \frac{1}{Q^2} B^{\top} D_J^{-1} B D_J^{-1} = \frac{1}{Q^2} Z^{\top} Z D_J^{-1} Z^{\top} Z D_J^{-1} = S^2.\] Now let \(u\) and \(\lambda\) satisfy \(S u = \lambda u\), that is, \(Z^{\top} Z D_J^{-1} u = Q \lambda u\). Then \[S^\prime u = \frac{1}{Q^2} Z^{\top} Z D_J^{-1} \left( Z^{\top} Z D_J^{-1} u \right) = \frac{\lambda}{Q} Z^{\top} Z D_J^{-1} u = \lambda^2 u.\]
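Finally, the analyses of \(Z\) and \(B\) provide the same eigenvectors, and for all \(k = 1, \dots, J\), the \(k\)th eigenvalue of \(S^\prime\) (analysis of \(B\)) is the square of the \(k\)th eigenvalue \(\lambda_k\) of \(S\) (analysis of \(Z\)). The ranking of the axes is preserved because the eigenvalues of \(S\) lie between \(0\) and \(1\).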
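A numerical check of this result on hypothetical data: the sketch forms \(S\) and \(S^\prime\) directly from their definitions and compares both the matrices and their eigenvalues.

```python
import numpy as np

# Hypothetical questionnaire: n = 6, Q = 3, J = 7.
answers = np.array([[0, 1, 0], [1, 0, 2], [0, 1, 1],
                    [1, 1, 0], [0, 0, 2], [1, 0, 1]])
J_q = [2, 2, 3]
n, Q = answers.shape
Z = np.hstack([np.eye(J_q[q])[answers[:, q]] for q in range(Q)])
n_j = Z.sum(axis=0)
D_inv = np.diag(1.0 / n_j)
B = Z.T @ Z                                     # Burt's table

S = Z.T @ Z @ D_inv / Q                         # analysis of Z
S_prime = B.T @ D_inv @ B @ D_inv / Q**2        # analysis of B

# Eigenvalues in decreasing order; those of S lie in [0, 1], so squaring
# keeps the ranking. Imaginary parts are numerical noise only.
lam_Z = np.sort(np.linalg.eigvals(S).real)[::-1]
lam_B = np.sort(np.linalg.eigvals(S_prime).real)[::-1]

print("S' equals S @ S:", np.allclose(S_prime, S @ S))
print(np.round(lam_Z, 4))
print(np.round(lam_B, 4))
```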
Variable encoding
Variable encoding, and in particular the choice of class boundaries, is essential in MCA. For continuous variables, the boundaries should be relevant to the problem being studied. For example, we would not define a class \(> 1000\) in the previous example. To obtain relevant boundaries, we can look at the distributions of the variables, e.g., with a histogram. In some specific cases, it is possible to divide the variable into equal-sized categories. However, this approach can lead to categories that are not meaningful for the problem.
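The two strategies can be compared on simulated data. In this sketch the variable is a hypothetical right-skewed sample, the domain bounds are made-up values, and "equal-sized" is read as equal-frequency classes built from the quartiles with `numpy.quantile` (an assumption; equal-width classes are the other possible reading). `numpy.histogram` provides the distribution to inspect before choosing bounds.

```python
import numpy as np


def class_counts(x, bounds):
    """Number of individuals in each class defined by increasing inner bounds."""
    labels = np.digitize(x, bounds)          # class index 0 .. len(bounds)
    return np.bincount(labels, minlength=len(bounds) + 1)


# Hypothetical right-skewed quantitative variable, n = 1000.
rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=1.0, size=1000)

# Inspect the distribution before choosing bounds (text histogram).
counts, edges = np.histogram(x, bins=10)
for c, left, right in zip(counts, edges[:-1], edges[1:]):
    print(f"{left:6.2f} - {right:6.2f} {'#' * (int(c) // 10)}")

# Option 1: bounds chosen for the problem at hand (hypothetical values).
domain_bounds = [0.5, 1.0, 2.0]
# Option 2: four equal-frequency classes built from the quartiles.
quartile_bounds = np.quantile(x, [0.25, 0.5, 0.75])

print("domain classes:  ", class_counts(x, domain_bounds))
print("quartile classes:", class_counts(x, quartile_bounds),
      np.round(quartile_bounds, 2))
```

The quartile classes contain 250 individuals each by construction, but their bounds are whatever the sample dictates, which is why they may not be meaningful for the problem being studied.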
In the case of qualitative variables, the choice of classes does not arise, as it is given by the variable. However, “natural” modalities can lead to (very) unbalanced frequencies. In this case, modalities generally need to be grouped, and here again a good knowledge of the field being studied is necessary. In any case, it is preferable to group modalities rather than randomly distribute the individuals of low-frequency modalities among the other modalities (an option sometimes proposed in software).
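A sketch of the grouping step in plain Python. The answers, the 10 % frequency threshold used to flag rare modalities, and the grouping map are all hypothetical; the threshold only flags candidates, while the grouping itself is written by hand, since it should reflect knowledge of the field.

```python
from collections import Counter

# Hypothetical answers to one question: "main means of transport".
answers = ["car", "bus", "car", "bike", "car", "bus",
           "tram", "car", "bus", "metro", "car", "bike"]

counts = Counter(answers)
# Hypothetical rule of thumb: flag modalities chosen by fewer than 10 %
# of the individuals.
threshold = 0.10 * len(answers)
rare = sorted(m for m, c in counts.items() if c < threshold)
print("rare modalities:", rare)

# The grouping is decided by hand from knowledge of the field:
# here tram and metro are merged with bus into "public transport".
grouping = {"bus": "public transport",
            "tram": "public transport",
            "metro": "public transport"}
grouped = [grouping.get(a, a) for a in answers]
print(Counter(grouped))
```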
