Linear Algebra

In this section, we present some linear algebra results that are useful in the context of this course. For more information, you can refer to the MAT-1200 course, Deisenroth, Faisal, and Ong (2020) (in English), and Grifone (2024) (in French).

Some matrix properties

Let \(M_{n, m}(\mathbb{R})\) be the set of matrices with \(n\) rows and \(m\) columns whose entries belong to \(\mathbb{R}\). Let \(M_{n}(\mathbb{R})\) be the set of square matrices of size \(n\), i.e., with \(n\) rows and \(n\) columns whose entries belong to \(\mathbb{R}\). Let \(M\), \(N\), and \(P\) be matrices in \(M_{n, m}(\mathbb{R})\). Let \(A\) and \(B\) be matrices in \(M_{n}(\mathbb{R})\). Let \(I_n\) be the identity matrix of size \(n\), i.e., containing \(1\)s on the diagonal and \(0\)s on the elements outside the diagonal. Let \(u\) and \(v\) in \(\mathbb{R}^n\), i.e., column vectors of size \(n\).

Properties of the inverse of matrices

Suppose that the matrices \(A\) and \(B\) are invertible. Then the matrix product \(AB\) is invertible and is given by:

\[(AB)^{-1} = B^{-1} A^{-1}.\]

Proof

Let \(C = AB\) and \(D = B^{-1} A^{-1}\). Then

\[\begin{align*} CD &= A B B^{-1} A^{-1} \\ &= A A^{-1} \\ &= I_n \end{align*}\]

Similarly, we find that \(DC = I_n\). Thus, \(AB\) is invertible and its inverse is given by \(B^{-1} A^{-1}\).

Properties of the determinant of matrices

Considering the matrices defined at the beginning of the section, we have:

\(\text{det}(A^\top) = \text{det}(A)\),
\(\text{det}(AB) = \text{det}(A)\text{det}(B)\),
\(\text{det}(A^{-1}) = 1 / \text{det}(A)\).

Proof

The proofs of properties \(1\) and \(2\) are technical and are omitted, but can be found, for example, here. As for the third property, by definition, we have \(A A^{-1} = I_n\). The determinant of \(I_n\) is equal to \(1\) (product of the elements on the diagonal). Therefore, \(\text{det}(A A^{-1}) = 1\). However, according to the second property, \(\text{det}(A A^{-1}) = \text{det}(A)\text{det}(A^{-1})\). We therefore have \(\text{det}(A^{-1}) = 1 / \text{det}(A)\).

Properties of the trace of matrices

Considering the matrices defined at the beginning of the section, we have:

\(\text{tr}(A) = \text{tr}(A^{\top})\),
\(\text{tr}(A + B) = \text{tr}(A) + \text{tr}(B)\),
\(\text{tr}(MN^{\top}) = \text{tr}(N^{\top}M)\).

Proof

For a square matrix \(A\), let \(a_{ij}\) be the element of matrix \(A\) in row \(i\) and column \(j\). The trace of \(A\) is given by the sum of the diagonal elements, i.e. \(\text{tr}(A) = \sum_{i = 1}^{n} a_{ii}\).

Since transposition does not change the diagonal elements, the result is straightforward.
Let \(C = A + B\). Since \(A\) and \(B\) are square matrices, \(C\) is a square matrix. We have \(c_{ij} = a_{ij} + b_{ij}\) for all \(i, j = 1, \dots, n\). Therefore \[\text{tr}(A + B) = \text{tr}(C) = \sum_{i = 1}^{n} c_{ii} = \sum_{i = 1}^{n} a_{ii} + b_{ii} = \sum_{i = 1}^{n} a_{ii} + \sum_{i = 1}^{n} b_{ii} = \text{tr}(A) + \text{tr}(B).\]
The matrices \(M N^{\top}\) and \(N^{\top} M\) are square, with dimensions \(n \times n\) and \(m \times m\) respectively, so we can calculate their traces. Let \(C = M N^{\top}\) and \(D = N^{\top} M\). \[\text{tr}(M N^{\top}) = \text{tr}(C) = \sum_{i = 1}^{n} c_{ii} = \sum_{i = 1}^{n} \sum_{j = 1}^{m} m_{ij} n_{ji} = \sum_{j = 1}^{m} \sum_{i = 1}^{n} n_{ji} m_{ij} = \sum_{j = 1}^{m} d_{jj} = \text{tr}(D) = \text{tr}(N^{\top} M).\]

Definition

Let \(A\) be a symmetric matrix in \(M_n(\mathbb{R})\). \(A\) is positive definite if \(u^\top A u > 0\) for all \(u \in \mathbb{R}^n\) such that \(u \neq 0\).
Let \(A \in M_n(\mathbb{R})\). \(A\) is orthogonal if \(A^\top A = A A^\top = I_n\).

Eigenvalues and eigenvectors

Definition

Let \(A \in M_n(\mathbb{R})\). We say that \(\lambda \in \mathbb{R}\) is an eigenvalue of \(A\) if there exists a nonzero vector \(u \in \mathbb{R}^n\) such that \[Au = \lambda u. \tag{1}\] The vector \(u\) is called an eigenvector of \(A\) corresponding to the eigenvalue \(\lambda\).

The set of real numbers \(\lambda\) satisfying Equation 1 is called the spectrum of the matrix \(A\) and is denoted by \(\text{sp}(A)\).

Property of eigenvectors

If \(u\) is an eigenvector of \(A\) corresponding to an eigenvalue \(\lambda\), then the vector \(cu\), \(c \in \mathbb{R}^\star\) is also an eigenvector of \(A\) corresponding to \(\lambda\).
If \(A\) is symmetric and \(u_{1}\) and \(u_{2}\) are eigenvectors corresponding to different eigenvalues of \(A\), then \(u_{1}\) and \(u_{2}\) are orthogonal, i.e. \(u_{1}^\top u_{2} = 0\).

Proof

Let \(c \in \mathbb{R}^\star\) and \(u\) be an eigenvector of \(A\) associated with the eigenvalue \(\lambda\). We have: \[A(cu) = cAu = c \lambda u = \lambda (cu).\] Therefore, the vector \(cu\) is also an eigenvector of \(A\) associated with the eigenvalue \(\lambda\).
Let \(\lambda_{1}\) and \(\lambda_{2}\) be the eigenvalues associated with \(u_{1}\) and \(u_{2}\), such that \(\lambda_{1} \neq \lambda_{2}\). We have \(A u_{1} = \lambda_{1} u_{1}\) and \(A u_{2} = \lambda_{2} u_{2}\). Then \[\lambda_{1} u_{1}^{\top} u_{2} = u_{1}^\top A u_{2} = \lambda_{2} u_{1}^\top u_{2}.\] This implies that \((\lambda_{1} - \lambda_{2})u_{1}^\top u_{2} = 0\). However, \(\lambda_{1} \neq \lambda_{2}\). Therefore, necessarily, \(u_{1}^\top u_{2} = 0\).

This second property will be useful when we look at dimension reduction and, in particular, principal component analysis.

Characterization of matrices with their eigenvalues

If \(A\) is symmetric, then all its eigenvalues are real.
If \(A\) is positive definite, then all its eigenvalues are strictly positive.

Proof

Consider the more general case where \(A\) is a Hermitian matrix. The matrix \(A\) is equal to the transpose of its conjugate, denoted \(A^*\). Let \(\lambda\) be an eigenvalue associated with an eigenvector \(u\), which may be complex. We have: \[\begin{align} \overline{u}^{\top} A u &= \overline{u}^\top \lambda u = \lambda \overline{u}^{\top} u, \\ \overline{u}^\top A u &= \overline{u}^\top A^* u = \overline{Au}^\top u = \overline{\lambda} \overline{u}^\top u. \end{align}\] This implies that \((\lambda - \overline{\lambda}) \overline{u}^{\top} u = 0\). Since \(u \neq 0\), we have \(\lambda = \overline{\lambda}\). Therefore, \(\lambda \in \mathbb{R}\).
Consider \(u\), an eigenvector of \(A\) associated with the eigenvalue \(\lambda\). We have that \(u^{\top} A u = \lambda u^{\top} u\). However, since \(u \neq 0\), \(u^{\top}u \neq 0\). Therefore, \[\lambda = \frac{u^{\top} A u}{u^{\top} u}.\] Since \(A\) is positive definite, \(u^{\top} A u > 0\) for all nonzero vectors \(u\). We can therefore deduce that \(\lambda > 0\).

Diagonalization of matrices

Definition

Let \(A \in M_n(\mathbb{R})\). We say that \(A\) is diagonalizable if there exists a non-singular matrix \(P \in M_n(\mathbb{R})\) and a diagonal matrix \(D \in M_n(\mathbb{R})\) such that \[P^{-1} A P = D \Longleftrightarrow A = P D P^{-1}.\]

Spectral decomposition theorem

Let \(A\) be a symmetric matrix in \(M_n(\mathbb{R})\) and \(\lambda_{1}, \dots, \lambda_n\) its \(n\) eigenvalues. Then there exists an orthogonal matrix \(P\) in \(M_n(\mathbb{R})\) such that \[A = P \Lambda P^\top, \quad\text{where}\quad \Lambda = \text{diag}(\lambda_1, \dots, \lambda_n).\]

If \(A\) has \(n\) distinct positive eigenvalues, then we can take \(P\) to be the matrix whose \(k\)th column is the normalized eigenvector corresponding to the \(k\)th eigenvalue \(\lambda_k\) of \(A\).

Given two symmetric matrices, \(A\) and \(B\), how can we determine the vector \(u\) such that \(u^{\top} A u\) is maximal, knowing that \(u^{\top} B u = 1\)? We simply take \(u\) as the eigenvector of \(B^{-1}A\) associated with \(\lambda\), the maximal eigenvalue of \(B^{-1}A\). We thus obtain \[u^{\top} A u = u^{\top}\lambda M u = \lambda U^{\top} M u = \lambda.\]

Characterization of the determinant and trace of matrices with their eigenvalues

If \(A\) has eigenvalues (real, but not necessarily distinct) \(\lambda_{1}, \dots, \lambda_{n}\), then

\(\text{det}(A) = \prod_{i = 1}^{n} \lambda_i\)
\(\text{tr}(A) = \sum_{i = 1}^{n} \lambda_i.\)

Proof

Using the spectral decomposition theorem, there exists an invertible matrix \(P\) such that \(A = P \Lambda P^{-1}\), where \(\Lambda\) is a diagonal matrix containing the eigenvalues. We therefore have, for the determinant,

\[\text{det}(A) = \text{det}(P \Lambda P^{-1}) = \text{det}(P)\text{det}(\Lambda)\text{det}(P^{-1}) = \text{det}(P)\text{det}(\Lambda)\text{det}(P)^{-1} = \text{det}(\lambda) = \prod_{i = 1}^{n} \lambda_i, \]

and, for the trace,

\[\text{tr}(A) = \text{tr}(P \Lambda P^{-1}) = \text{tr}(P^{-1} P \Lambda) = \text{tr}(\Lambda) = \sum_{i = 1}^{n} \lambda_i.\]

References

Deisenroth, Marc Peter, A. Aldo Faisal, and Cheng Soon Ong. 2020. Mathematics for Machine Learning. 1st ed. Cambridge University Press. https://doi.org/10.1017/9781108679930.

Grifone, Joseph. 2024. Algèbre Linéaire. 7e edition. Toulouse: CEPADUES.