In this section, we present some linear algebra results that are useful in the context of this course. For more information, you can refer to the MAT-1200 course, Deisenroth, Faisal, and Ong (2020) (in English), and Grifone (2024) (in French).
Some matrix properties
Let \(M_{n, m}(\mathbb{R})\) be the set of matrices with \(n\) rows and \(m\) columns whose entries belong to \(\mathbb{R}\). Let \(M_{n}(\mathbb{R})\) be the set of square matrices of size \(n\), i.e., with \(n\) rows and \(n\) columns whose entries belong to \(\mathbb{R}\). Let \(M\), \(N\), and \(P\) be matrices in \(M_{n, m}(\mathbb{R})\). Let \(A\) and \(B\) be matrices in \(M_{n}(\mathbb{R})\). Let \(I_n\) be the identity matrix of size \(n\), i.e., containing \(1\)s on the diagonal and \(0\)s on the elements outside the diagonal. Let \(u\) and \(v\) in \(\mathbb{R}^n\), i.e., column vectors of size \(n\).
Suppose that the matrices \(A\) and \(B\) are invertible. Then the matrix product \(AB\) is invertible and is given by:
\[(AB)^{-1} = B^{-1} A^{-1}.\]
Let \(C = AB\) and \(D = B^{-1} A^{-1}\). Then
\[\begin{align*}
CD &= A B B^{-1} A^{-1} \\
&= A A^{-1} \\
&= I_n
\end{align*}\]
Similarly, we find that \(DC = I_n\). Thus, \(AB\) is invertible and its inverse is given by \(B^{-1} A^{-1}\).
Considering the matrices defined at the beginning of the section, we have:
\(\text{det}(A^\top) = \text{det}(A)\),
\(\text{det}(AB) = \text{det}(A)\text{det}(B)\),
\(\text{det}(A^{-1}) = 1 / \text{det}(A)\).
The proofs of properties \(1\) and \(2\) are technical and are omitted, but can be found, for example, here. As for the third property, by definition, we have \(A A^{-1} = I_n\). The determinant of \(I_n\) is equal to \(1\) (product of the elements on the diagonal). Therefore, \(\text{det}(A A^{-1}) = 1\). However, according to the second property, \(\text{det}(A A^{-1}) = \text{det}(A)\text{det}(A^{-1})\). We therefore have \(\text{det}(A^{-1}) = 1 / \text{det}(A)\).
Considering the matrices defined at the beginning of the section, we have:
\(\text{tr}(A) = \text{tr}(A^{\top})\),
\(\text{tr}(A + B) = \text{tr}(A) + \text{tr}(B)\),
\(\text{tr}(MN^{\top}) = \text{tr}(N^{\top}M)\).
For a square matrix \(A\), let \(a_{ij}\) be the element of matrix \(A\) in row \(i\) and column \(j\). The trace of \(A\) is given by the sum of the diagonal elements, i.e. \(\text{tr}(A) = \sum_{i = 1}^{n} a_{ii}\).
Since transposition does not change the diagonal elements, the result is straightforward.
Let \(C = A + B\). Since \(A\) and \(B\) are square matrices, \(C\) is a square matrix. We have \(c_{ij} = a_{ij} + b_{ij}\) for all \(i, j = 1, \dots, n\). Therefore \[\text{tr}(A + B) = \text{tr}(C) = \sum_{i = 1}^{n} c_{ii} = \sum_{i = 1}^{n} a_{ii} + b_{ii} = \sum_{i = 1}^{n} a_{ii} + \sum_{i = 1}^{n} b_{ii} = \text{tr}(A) + \text{tr}(B).\]
The matrices \(M N^{\top}\) and \(N^{\top} M\) are square, with dimensions \(n \times n\) and \(m \times m\) respectively, so we can calculate their traces. Let \(C = M N^{\top}\) and \(D = N^{\top} M\). \[\text{tr}(M N^{\top}) = \text{tr}(C) = \sum_{i = 1}^{n} c_{ii} = \sum_{i = 1}^{n} \sum_{j = 1}^{m} m_{ij} n_{ji} = \sum_{j = 1}^{m} \sum_{i = 1}^{n} n_{ji} m_{ij} = \sum_{j = 1}^{m} d_{jj} = \text{tr}(D) = \text{tr}(N^{\top} M).\]
Let \(A\) be a symmetric matrix in \(M_n(\mathbb{R})\). \(A\) is positive definite if \(u^\top A u > 0\) for all \(u \in \mathbb{R}^n\) such that \(u \neq 0\).
Let \(A \in M_n(\mathbb{R})\). \(A\) is orthogonal if \(A^\top A = A A^\top = I_n\).
Eigenvalues and eigenvectors
Let \(A \in M_n(\mathbb{R})\). We say that \(\lambda \in \mathbb{R}\) is an eigenvalue of \(A\) if there exists a nonzero vector \(u \in \mathbb{R}^n\) such that \[Au = \lambda u. \tag{1}\] The vector \(u\) is called an eigenvector of \(A\) corresponding to the eigenvalue \(\lambda\).
The set of real numbers \(\lambda\) satisfying Equation 1 is called the spectrum of the matrix \(A\) and is denoted by \(\text{sp}(A)\).
If \(u\) is an eigenvector of \(A\) corresponding to an eigenvalue \(\lambda\), then the vector \(cu\), \(c \in \mathbb{R}^\star\) is also an eigenvector of \(A\) corresponding to \(\lambda\).
If \(A\) is symmetric and \(u_{1}\) and \(u_{2}\) are eigenvectors corresponding to different eigenvalues of \(A\), then \(u_{1}\) and \(u_{2}\) are orthogonal, i.e. \(u_{1}^\top u_{2} = 0\).
Let \(c \in \mathbb{R}^\star\) and \(u\) be an eigenvector of \(A\) associated with the eigenvalue \(\lambda\). We have: \[A(cu) = cAu = c \lambda u = \lambda (cu).\] Therefore, the vector \(cu\) is also an eigenvector of \(A\) associated with the eigenvalue \(\lambda\).
Let \(\lambda_{1}\) and \(\lambda_{2}\) be the eigenvalues associated with \(u_{1}\) and \(u_{2}\), such that \(\lambda_{1} \neq \lambda_{2}\). We have \(A u_{1} = \lambda_{1} u_{1}\) and \(A u_{2} = \lambda_{2} u_{2}\). Then \[\lambda_{1} u_{1}^{\top} u_{2} = u_{1}^\top A u_{2} = \lambda_{2} u_{1}^\top u_{2}.\] This implies that \((\lambda_{1} - \lambda_{2})u_{1}^\top u_{2} = 0\). However, \(\lambda_{1} \neq \lambda_{2}\). Therefore, necessarily, \(u_{1}^\top u_{2} = 0\).
This second property will be useful when we look at dimension reduction and, in particular, principal component analysis.
If \(A\) is symmetric, then all its eigenvalues are real.
If \(A\) is positive definite, then all its eigenvalues are strictly positive.
Consider the more general case where \(A\) is a Hermitian matrix. The matrix \(A\) is equal to the transpose of its conjugate, denoted \(A^*\). Let \(\lambda\) be an eigenvalue associated with an eigenvector \(u\), which may be complex. We have: \[\begin{align}
\overline{u}^{\top} A u &= \overline{u}^\top \lambda u = \lambda \overline{u}^{\top} u, \\
\overline{u}^\top A u &= \overline{u}^\top A^* u = \overline{Au}^\top u = \overline{\lambda} \overline{u}^\top u.
\end{align}\] This implies that \((\lambda - \overline{\lambda}) \overline{u}^{\top} u = 0\). Since \(u \neq 0\), we have \(\lambda = \overline{\lambda}\). Therefore, \(\lambda \in \mathbb{R}\).
Consider \(u\), an eigenvector of \(A\) associated with the eigenvalue \(\lambda\). We have that \(u^{\top} A u = \lambda u^{\top} u\). However, since \(u \neq 0\), \(u^{\top}u \neq 0\). Therefore, \[\lambda = \frac{u^{\top} A u}{u^{\top} u}.\] Since \(A\) is positive definite, \(u^{\top} A u > 0\) for all nonzero vectors \(u\). We can therefore deduce that \(\lambda > 0\).
Diagonalization of matrices
Let \(A \in M_n(\mathbb{R})\). We say that \(A\) is diagonalizable if there exists a non-singular matrix \(P \in M_n(\mathbb{R})\) and a diagonal matrix \(D \in M_n(\mathbb{R})\) such that \[P^{-1} A P = D \Longleftrightarrow A = P D P^{-1}.\]
Let \(A\) be a symmetric matrix in \(M_n(\mathbb{R})\) and \(\lambda_{1}, \dots, \lambda_n\) its \(n\) eigenvalues. Then there exists an orthogonal matrix \(P\) in \(M_n(\mathbb{R})\) such that \[A = P \Lambda P^\top, \quad\text{where}\quad \Lambda = \text{diag}(\lambda_1, \dots, \lambda_n).\]
If \(A\) has \(n\) distinct positive eigenvalues, then we can take \(P\) to be the matrix whose \(k\)th column is the normalized eigenvector corresponding to the \(k\)th eigenvalue \(\lambda_k\) of \(A\).
Given two symmetric matrices, \(A\) and \(B\), how can we determine the vector \(u\) such that \(u^{\top} A u\) is maximal, knowing that \(u^{\top} B u = 1\)? We simply take \(u\) as the eigenvector of \(B^{-1}A\) associated with \(\lambda\), the maximal eigenvalue of \(B^{-1}A\). We thus obtain \[u^{\top} A u = u^{\top}\lambda M u = \lambda U^{\top} M u = \lambda.\]
If \(A\) has eigenvalues (real, but not necessarily distinct) \(\lambda_{1}, \dots, \lambda_{n}\), then
\(\text{det}(A) = \prod_{i = 1}^{n} \lambda_i\)
\(\text{tr}(A) = \sum_{i = 1}^{n} \lambda_i.\)
Using the spectral decomposition theorem, there exists an invertible matrix \(P\) such that \(A = P \Lambda P^{-1}\), where \(\Lambda\) is a diagonal matrix containing the eigenvalues. We therefore have, for the determinant,
\[\text{det}(A) = \text{det}(P \Lambda P^{-1}) = \text{det}(P)\text{det}(\Lambda)\text{det}(P^{-1}) = \text{det}(P)\text{det}(\Lambda)\text{det}(P)^{-1} = \text{det}(\lambda) = \prod_{i = 1}^{n} \lambda_i, \]
and, for the trace,
\[\text{tr}(A) = \text{tr}(P \Lambda P^{-1}) = \text{tr}(P^{-1} P \Lambda) = \text{tr}(\Lambda) = \sum_{i = 1}^{n} \lambda_i.\]
References
Deisenroth, Marc Peter, A. Aldo Faisal, and Cheng Soon Ong. 2020.
Mathematics for Machine Learning. 1st ed. Cambridge University Press.
https://doi.org/10.1017/9781108679930.
Grifone, Joseph. 2024. Algèbre Linéaire. 7e edition. Toulouse: CEPADUES.