Birkhoff-von Neumann theorem
The Birkhoff-von Neumann theorem states that every doubly stochastic matrix can be written as a convex sum of permutation matrices. Here we look at the algorithm to find such a convex sum. The idea is focused on the number of zeros in the matrix.
0. Zeros in a doubly stochastic matrix
We start with a few observations:
1. Each permutation matrix has exactly one non-zero entry in each row and each column.
2. Number of zeros in a doubly stochastic matrix is maximum when it is a permutation matrix. In other words, a n-by-n doubly stochastic matrix cannot have fewer than n non-zero entries, which happen when the matrix is a permutation matrix.
Number of zeros = 6
Interpolations between pairs of permutation matrices (P₁, P₃), (P₃, P₅) and (P₅, P₁).
This forms the basis for the Birkhoff-von Neumann theorem, which states that every doubly stochastic matrix can be written as a convex sum of permutation matrices. The idea is to increase the number of zeros in the matrix gradually. We use induction on the number of zeros in the matrix to prove it while also generating a convex sum.
0.1. Outline of the proof
We begin with the following observation: in any doubly stochastic matrix of size n x n, one can always pick n non-zero entries such that each row and each column contains exactly one of these entries. Another way to say it is:
For any doubly stochastic matrix D, there exists a permutation matrix P such that \( P_{ij} \ne 0 \Rightarrow D_{ij} \ne 0 \).
An example is shown below (taken from Wikipedia ):\(
\underbrace{\begin{bmatrix} \text{7/12} & 0 & \amber{\boxed{\text{5/12}}} \\ \amber{\boxed{\text{2/12}}} & \text{6/12} & \text{4/12} \\ \text{3/12} & \amber{\boxed{\text{6/12}}} & \text{3/12} \end{bmatrix}}_{
\begin{matrix}
\zinc{ \text{3 non-zero values}} \\
\zinc{\text{corresponding to the permutation matrix}} \\
\Large
\begin{bmatrix} 0 & 0 & \amber{\boxed{1}} \\ \amber{\boxed{1}} & 0 & 0 \\ 0 & \amber{\boxed{1}} & 0 \end{bmatrix}
\end{matrix}
}
\)
The proof of it comes from Hall's Marriage Theorem . One can create a bipartite graph where the rows form one set of vertices and the columns form the other set of vertices. An edge exists between a row and a column if the corresponding entry in the matrix is non-zero. It can be shown that there exists a perfect matching in this graph for a doubly stochastic matrix.
0.2. Example
The algorithm finds a matching that includes the smallest element of the matrix and its corresponding permutation matrix. Subtracting the permutation matrix scaled by the smallest element results in a new matrix with at least one more zero than the original matrix. This new matrix also has the same row and column sums, and is therefore a scaled doubly stochastic matrix. An example for this procedure for the above matrix is shown below:
\(
\underbrace{\begin{bmatrix}
\text{7/12} & 0 & \amber{\boxed{\text{5/12}}} \\
\underbrace{\amber{\boxed{\text{2/12}}}}_{\zinc{\text{smallest element}}} & \text{6/12} & \text{4/12} \\
\text{3/12} & \amber{\boxed{\text{6/12}}} & \text{3/12}
\end{bmatrix}}_{\rose{\text{number of zeros: 1}}}
- \underbrace{\amber{\frac{2}{12}}}_{\zinc{\text{smallest element}}}
\begin{bmatrix}
0 & 0 & \amber{\boxed{1}} \\
\amber{\boxed{1}} & 0 & 0 \\
0 & 0 & \amber{\boxed{1}}
\end{bmatrix}
= \frac{10}{12}
\underbrace{\begin{bmatrix}
\text{7/10} & 0 & \text{3/10} \\
0 & \text{6/10} & \text{4/10} \\
\text{3/10} & \text{4/10} & \text{3/10}
\end{bmatrix}}_{\rose{\text{number of zeros: 2}}}
\)Thus, we were able to express a doubly stochastic matrix as a sum of another (scaled) doubly stochastic matrix and a (scaled) permutation matrix. Using induction, we can show that continuing this process will eventually yield a convex sum of permutation matrices (how can you use induction to show that the sum of coefficients is 1?).
Now we look into how to convert (almost) any non-negative square matrix into a doubly stochastic matrix. This is where the Sinkhorn-Knopp algorithm comes into play.