Relationship between AAᵀ and AᵀA
We look into the relationship between the eigenvalues and eigenvectors of \( AA^T \) and \( A^TA \) for a real-valued matrix \( A \) of shape \( r \times c \). Without loss of generality, we can assume that the matrix is wide, i.e. \( r < c \).
0. Singular value decompositions of matrices
We begin by writing down the full SVD of \( A \):\[
A = \underbrace{U}_{\rose{r \times r}} \ \underbrace{\Sigma}_{\rose{r \times c}} \ \underbrace{V^T}_{\rose{c \times c}}
\]Here \( U \) and \( V \) are orthogonal and \( \Sigma \) is a rectangular diagonal matrix holding the singular values \( \sigma_i \).
Note that \( \Sigma = [\sigma ; 0] \), i.e. \( \Sigma \) can be written as the concatenation of a diagonal matrix \( \sigma \) and a block of zeros. Now we can write down the SVDs of the two matrix products in terms of the SVD of \( A \):\[
AA^T = U \ \underbrace{\amber{\Sigma\Sigma^T}}_{\rose{r\times r}} \ U^T
\]
Similarly we have:\[
A^TA = V \ \underbrace{\amber{\Sigma^T\Sigma}}_{\rose{c\times c}} \ V^T
\]It is important to understand what the matrices \( \Sigma\Sigma^T \) and \( \Sigma^T\Sigma \) look like:\[
\Sigma\Sigma^T = \underbrace{\sigma^2}_{\rose{r \times r}}, \qquad
\Sigma^T\Sigma = \underbrace{\begin{bmatrix} \sigma^2 & 0 \\ 0 & 0 \end{bmatrix}}_{\rose{c \times c}}
\]Both are diagonal: \( \Sigma\Sigma^T \) is exactly \( \sigma^2 \), while \( \Sigma^T\Sigma \) is \( \sigma^2 \) padded with \( c - r \) extra rows and columns of zeros.
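As a quick numerical sanity check, here is a minimal NumPy sketch (the matrix sizes and the random seed are illustrative choices, not from the original):

```python
import numpy as np

r, c = 3, 5                                # a wide matrix: r < c
A = np.random.default_rng(0).normal(size=(r, c))

# Full SVD: U is (r x r), V is (c x c), Sigma is (r x c).
U, s, Vt = np.linalg.svd(A, full_matrices=True)
Sigma = np.zeros((r, c))
Sigma[:, :r] = np.diag(s)                  # Sigma = [sigma ; 0]

np.testing.assert_allclose(A, U @ Sigma @ Vt, atol=1e-12)
print((Sigma @ Sigma.T).shape)             # (3, 3) -> diagonal, entries sigma_i^2
print((Sigma.T @ Sigma).shape)             # (5, 5) -> sigma^2 padded with zeros
```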
Now we are ready to draw some insights.
1. Relationship between eigenvalues
From the singular value decompositions above, the following insight becomes clear:
The sets of non-zero eigenvalues of \( AA^T \) and \( A^TA \) are the same.
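This is easy to check numerically; a minimal sketch (the random matrix is an illustrative choice):

```python
import numpy as np

A = np.random.default_rng(0).normal(size=(3, 5))   # r = 3, c = 5

# eigvalsh returns eigenvalues in ascending order; flip to descending.
eig_small = np.linalg.eigvalsh(A @ A.T)[::-1]      # 3 eigenvalues
eig_large = np.linalg.eigvalsh(A.T @ A)[::-1]      # 5 eigenvalues

# The non-zero eigenvalues coincide; A^T A has c - r = 2 extra zeros.
np.testing.assert_allclose(eig_small, eig_large[:3], atol=1e-10)
print(eig_large[3:])                               # ~ [0, 0]
```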
All these non-zero eigenvalues come from the diagonal matrix \( \sigma^2 \). Remember that \( \Sigma = [\ \underbrace{\sigma}_{\amber{r \times r}} \ ; \ \underbrace{0}_{\amber{r \times (c-r)}} \ ] \), so \( A^TA \) merely has \( c - r \) additional zero eigenvalues.
2. Relationship between eigenvectors
This part is not immediately obvious. We start by asking: what does the matrix \( A \) map the vector \( v_i \) to? Here \( v_i \) is the i-th column of \( V \), i.e. the i-th eigenvector of \( A^TA \).\[
\begin{align}
Av_i &= U \ \Sigma \ \underbrace{\amber{V^T v_i}}_{\amber{e_i}} \\
&= U \ \underbrace{\Sigma \ \amber{e_i}}_{\amber{\text{i-th column of } \Sigma}} \\
&= \amber{\sigma_i u_i} \ \text{if } i \le r \ \text{else} \ \amber{0}
\end{align}
\]
The matrix \( A \) maps the first \( r \) columns of \( V \) to the columns of \( U \), scaled by the corresponding singular values, and the remaining \( c - r \) columns to the zero vector.\[
v_i \rightarrow A \rightarrow
\begin{cases}
\sigma_i u_i, & 1 \le i \le r \\
0, & r \lt i \le c
\end{cases}
\]
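A quick check of this mapping with the factors from a full SVD (same illustrative setup as before):

```python
import numpy as np

r, c = 3, 5
A = np.random.default_rng(0).normal(size=(r, c))
U, s, Vt = np.linalg.svd(A, full_matrices=True)
V = Vt.T

for i in range(c):
    if i < r:
        # A maps v_i to sigma_i * u_i ...
        np.testing.assert_allclose(A @ V[:, i], s[i] * U[:, i], atol=1e-10)
    else:
        # ... and the remaining columns of V to the zero vector.
        np.testing.assert_allclose(A @ V[:, i], np.zeros(r), atol=1e-10)
```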
Now we can ask the same question for \( A^T \): what does \( A^T \) map the columns of \( U \) to?\[
\begin{align}
A^T u_i &= V \ \Sigma^T \ \underbrace{\amber{U^T u_i}}_{\amber{e_i}} \\
&= V \ \underbrace{\Sigma^T \ \amber{e_i}}_{\amber{\text{i-th column of } \Sigma^T}} \\
&= \amber{\sigma_i v_i}
\end{align}
\]
The matrix \( A^T \) maps the columns of \( U \) to the corresponding columns of \( V \), scaled by the singular values:\[
u_i \rightarrow A^T \rightarrow \sigma_i v_i, \quad 1 \le i \le r
\]
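And the check in the opposite direction:

```python
import numpy as np

r, c = 3, 5
A = np.random.default_rng(0).normal(size=(r, c))
U, s, Vt = np.linalg.svd(A, full_matrices=True)
V = Vt.T

# A^T maps each column of U to the matching column of V, scaled by sigma_i.
for i in range(r):
    np.testing.assert_allclose(A.T @ U[:, i], s[i] * V[:, i], atol=1e-10)
```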
3. Applications
A matrix \( A \in R^{d \times n} \) can be seen as a table of d-dimensional features of n items. If n gets really large, the matrix \( A^TA \in R^{n \times n} \) becomes too large for operations like eigendecomposition, which are quite common in machine learning. In that case, the eigenvalues and eigenvectors of the large matrix can be computed from the much smaller matrix \( AA^T \in R^{d \times d} \) using the relationships we just derived: eigendecompose \( AA^T \) to get \( \lambda_i = \sigma_i^2 \) and \( u_i \), then recover \( v_i = A^T u_i / \sigma_i \). For instance, this is used in creating a dual form of sampling for determinantal point processes.
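A sketch of this trick (the sizes and seed are illustrative; it assumes \( d \ll n \) and that \( A \) has full row rank, so no \( \sigma_i \) is zero):

```python
import numpy as np

d, n = 50, 10_000                          # few features, many items
A = np.random.default_rng(0).normal(size=(d, n))

# Eigendecompose the small (d x d) Gram matrix instead of the (n x n) one.
lam, U = np.linalg.eigh(A @ A.T)           # ascending eigenvalues
lam, U = lam[::-1], U[:, ::-1]             # reorder to descending

sigma = np.sqrt(lam)                       # singular values of A
V = (A.T @ U) / sigma                      # columns v_i = A^T u_i / sigma_i

# Spot-check the top eigenpair of A^T A without ever forming the n x n matrix.
np.testing.assert_allclose(A.T @ (A @ V[:, 0]), lam[0] * V[:, 0],
                           rtol=1e-8, atol=1e-8)
```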