Relationship between AAᵀ and AᵀA
We look into the relationship between the eigenvalues and eigenvectors of \( AA^T \) and \( A^TA \) for a real-valued matrix \( A \) of shape \( r \times c \). Without loss of generality, we can assume that the matrix is wide, i.e. \( r < c \).
0. Singular value decompositions of matrices
We begin by writing down the full SVD of \( A \):\[
A = \underbrace{U}_{\rose{r \times r}} \ \underbrace{\Sigma}_{\rose{r \times c}} \ \underbrace{V^T}_{\rose{c \times c}}
\]Here \( U \) and \( V \) are orthogonal and \( \Sigma \) is a rectangular diagonal matrix holding the singular values \( \sigma_i \).
Note that \( \Sigma = [\sigma ; 0] \), i.e. \( \Sigma \) can be written as the concatenation of a diagonal matrix \( \sigma \) and a block of zeros. Now we can write down the SVDs of the two matrix products in terms of the SVD of \( A \):\[
AA^T = U \ \underbrace{\amber{\Sigma\Sigma^T}}_{\rose{r\times r}} \ U^T
\]
Similarly we have:\[
A^TA = V \ \underbrace{\amber{\Sigma^T\Sigma}}_{\rose{c\times c}} \ V^T
\]It is important to understand what the matrices \( \Sigma\Sigma^T \) and \( \Sigma^T\Sigma \) look like:\[
\Sigma\Sigma^T = \underbrace{\sigma^2}_{\rose{r \times r}}, \qquad
\Sigma^T\Sigma = \underbrace{\begin{bmatrix} \sigma^2 & 0 \\ 0 & 0 \end{bmatrix}}_{\rose{c \times c}}
\]Both are diagonal: \( \Sigma\Sigma^T \) is exactly \( \sigma^2 \), while \( \Sigma^T\Sigma \) is \( \sigma^2 \) padded with \( c - r \) extra rows and columns of zeros.
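As a quick numerical sanity check, here is a minimal NumPy sketch (the matrix sizes and the random seed are illustrative choices, not from the original):

```python
import numpy as np

r, c = 3, 5                                # a wide matrix: r < c
A = np.random.default_rng(0).normal(size=(r, c))

# Full SVD: U is (r x r), V is (c x c), Sigma is (r x c).
U, s, Vt = np.linalg.svd(A, full_matrices=True)
Sigma = np.zeros((r, c))
Sigma[:, :r] = np.diag(s)                  # Sigma = [sigma ; 0]

np.testing.assert_allclose(A, U @ Sigma @ Vt, atol=1e-12)
print((Sigma @ Sigma.T).shape)             # (3, 3) -> diagonal, entries sigma_i^2
print((Sigma.T @ Sigma).shape)             # (5, 5) -> sigma^2 padded with zeros
```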
Now we are ready to draw some insights.
1. Relationship between eigenvalues
From the singular value decompositions above, the following insight becomes clear:
The sets of non-zero eigenvalues of \( AA^T \) and \( A^TA \) are the same.
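This is easy to check numerically; a minimal sketch (the random matrix is an illustrative choice):

```python
import numpy as np

A = np.random.default_rng(0).normal(size=(3, 5))   # r = 3, c = 5

# eigvalsh returns eigenvalues in ascending order; flip to descending.
eig_small = np.linalg.eigvalsh(A @ A.T)[::-1]      # 3 eigenvalues
eig_large = np.linalg.eigvalsh(A.T @ A)[::-1]      # 5 eigenvalues

# The non-zero eigenvalues coincide; A^T A has c - r = 2 extra zeros.
np.testing.assert_allclose(eig_small, eig_large[:3], atol=1e-10)
print(eig_large[3:])                               # ~ [0, 0]
```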
All these non-zero eigenvalues come from the diagonal matrix \( \sigma^2 \). Remember that \( \Sigma = [\ \underbrace{\sigma}_{\amber{r \times r}} \ ; \ \underbrace{0}_{\amber{r \times (c-r)}} \ ] \), so \( A^TA \) merely has \( c - r \) additional zero eigenvalues.
2. Relationship between eigenvectors
This part is not immediately obvious. We start by asking: what does the matrix \( A \) map the vector \( v_i \) to? Here \( v_i \) is the i-th column of \( V \), i.e. the i-th eigenvector of \( A^TA \).\[
\begin{align}
Av_i &= U \ \Sigma \ \underbrace{\amber{V^T v_i}}_{\amber{e_i}} \\
&= U \ \underbrace{\Sigma \ \amber{e_i}}_{\amber{\text{i-th column of } \Sigma}} \\
&= \amber{\sigma_i u_i} \ \text{if } i \le r \ \text{else} \ \amber{0}
\end{align}
\]
The matrix \( A \) maps the first \( r \) columns of \( V \) to the columns of \( U \), scaled by the corresponding singular values, and the remaining \( c - r \) columns to the zero vector.\[
v_i \rightarrow A \rightarrow
\begin{cases}
\sigma_i u_i, & 1 \le i \le r \\
0, & r \lt i \le c
\end{cases}
\]
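A quick check of this mapping with the factors from a full SVD (same illustrative setup as before):

```python
import numpy as np

r, c = 3, 5
A = np.random.default_rng(0).normal(size=(r, c))
U, s, Vt = np.linalg.svd(A, full_matrices=True)
V = Vt.T

for i in range(c):
    if i < r:
        # A maps v_i to sigma_i * u_i ...
        np.testing.assert_allclose(A @ V[:, i], s[i] * U[:, i], atol=1e-10)
    else:
        # ... and the remaining columns of V to the zero vector.
        np.testing.assert_allclose(A @ V[:, i], np.zeros(r), atol=1e-10)
```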
Now we can ask the same question for \( A^T \): what does \( A^T \) map the columns of \( U \) to?\[
\begin{align}
A^T u_i &= V \ \Sigma^T \ \underbrace{\amber{U^T u_i}}_{\amber{e_i}} \\
&= V \ \underbrace{\Sigma^T \ \amber{e_i}}_{\amber{\text{i-th column of } \Sigma^T}} \\
&= \amber{\sigma_i v_i}
\end{align}
\]
The matrix \( A^T \) maps the columns of \( U \) to the corresponding columns of \( V \), scaled by the singular values:\[
u_i \rightarrow A^T \rightarrow \sigma_i v_i, \quad 1 \le i \le r
\]
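And the check in the opposite direction:

```python
import numpy as np

r, c = 3, 5
A = np.random.default_rng(0).normal(size=(r, c))
U, s, Vt = np.linalg.svd(A, full_matrices=True)
V = Vt.T

# A^T maps each column of U to the matching column of V, scaled by sigma_i.
for i in range(r):
    np.testing.assert_allclose(A.T @ U[:, i], s[i] * V[:, i], atol=1e-10)
```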
3. Applications
A matrix \( A \in R^{d \times n} \) can be seen as a table of d-dimensional features of n items. If n gets really large, the matrix \( A^TA \in R^{n \times n} \) becomes too large for operations like eigendecomposition, which are quite common in machine learning. In that case, the eigenvalues and eigenvectors of the large matrix can be computed from the much smaller matrix \( AA^T \in R^{d \times d} \) using the relationships we just derived: eigendecompose \( AA^T \) to get \( \lambda_i = \sigma_i^2 \) and \( u_i \), then recover \( v_i = A^T u_i / \sigma_i \). For instance, this is used in creating a dual form of sampling for determinantal point processes.
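A sketch of this trick (the sizes and seed are illustrative; it assumes \( d \ll n \) and that \( A \) has full row rank, so no \( \sigma_i \) is zero):

```python
import numpy as np

d, n = 50, 10_000                          # few features, many items
A = np.random.default_rng(0).normal(size=(d, n))

# Eigendecompose the small (d x d) Gram matrix instead of the (n x n) one.
lam, U = np.linalg.eigh(A @ A.T)           # ascending eigenvalues
lam, U = lam[::-1], U[:, ::-1]             # reorder to descending

sigma = np.sqrt(lam)                       # singular values of A
V = (A.T @ U) / sigma                      # columns v_i = A^T u_i / sigma_i

# Spot-check the top eigenpair of A^T A without ever forming the n x n matrix.
np.testing.assert_allclose(A.T @ (A @ V[:, 0]), lam[0] * V[:, 0],
                           rtol=1e-8, atol=1e-8)
```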