Doubly stochastic matrices

[Figure: a permutation matrix \( P \) and a doubly stochastic matrix \( D \) in the \( (n \times n) \)-dimensional space of matrices, and the corresponding vectors \( Px \) and \( Dx \) in \( n \)-dimensional space.]
Consider all possible \( n! \) permutations of an \( n \)-dimensional vector \( x \). This gives us \( n! \) permutation vectors in \( \mathbb{R}^n \) and \( n! \) permutation matrices in \( \mathbb{R}^{n \times n} \). Each of these vectors equals \( Px \) for some permutation matrix \( P \). For \( n = 3 \), we have \( 3! = 6 \) permutation matrices and 6 permutation vectors:
import torch

# Permutation matrices (vertices of Birkhoff polytope)
P1 = torch.eye(3)
P2 = torch.Tensor([[0,1,0],[1,0,0],[0,0,1]])
P3 = torch.Tensor([[0,0,1],[0,1,0],[1,0,0]])
P4 = torch.Tensor([[1,0,0],[0,0,1],[0,1,0]])
P5 = torch.Tensor([[0,0,1],[1,0,0],[0,1,0]])
P6 = torch.Tensor([[0,1,0],[0,0,1],[1,0,0]])

P_full = torch.stack([P1, P2, P3, P4, P5, P6], dim=-1)  # (3,3,6)
to_permute = torch.Tensor([1,2,3])

vertices = torch.stack([P_full[...,i] @ to_permute for i in range(6)])
print(vertices)

### Output ###:
tensor([[1., 2., 3.],
        [2., 1., 3.],
        [3., 2., 1.],
        [1., 3., 2.],
        [3., 1., 2.],
        [2., 3., 1.]])
We saw that any point inside the Birkhoff polytope is a doubly stochastic matrix \( D \), which can be seen as a convex sum of permutation matrices. The corresponding convex sum of permutations of \( x \) is given by \( Dx \):
x = torch.randn(6).softmax(dim=-1)                      # (6,)

# point is a doubly stochastic matrix
point = (P_full * x).sum(dim=-1)                        # (3,3,6) => (3,3)

# (6,) @ (6,3) -> (3,)
print(x @ vertices)         # convex sum of permutations of "to_permute"

# (3,3) @ (3,) -> (3,)
print(point @ to_permute)   # same as (x @ vertices)

### Output ###:
tensor([2.2665, 1.8233, 1.9102])
tensor([2.2665, 1.8233, 1.9102])
This leads to the following property:
Each point \( y \) in the convex hull of all permutations of a vector \( x \) is the product \( Dx \) for some doubly stochastic matrix \( D \): \( y = Dx \).
To appreciate its significance, we need to understand a few properties of doubly stochastic matrices.
0. Properties of doubly stochastic matrices
0.1. Product preserves vector sum
For a doubly stochastic matrix \( D \), if \( y = Dx \), we have \( 1^T y = 1^T x \). It is easy to see that
\( \underbrace{1^T\rose{y}}_{\amber{\text{sum of y}}} = 1^T(\rose{Dx}) = (\underbrace{\amber{1^TD}}_{\amber{\text{sum of cols}}})x = \underbrace{1^Tx}_{\amber{\text{sum of x}}} \).
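This property is easy to check numerically. The sketch below rebuilds the stack of 3×3 permutation matrices from the earlier code so it runs standalone, samples a random doubly stochastic matrix, and compares the sums of input and output:

```python
import torch

torch.manual_seed(0)

# Rebuild the six 3x3 permutation matrices (P_full from the code above).
P_full = torch.stack([
    torch.eye(3),
    torch.Tensor([[0,1,0],[1,0,0],[0,0,1]]),
    torch.Tensor([[0,0,1],[0,1,0],[1,0,0]]),
    torch.Tensor([[1,0,0],[0,0,1],[0,1,0]]),
    torch.Tensor([[0,0,1],[1,0,0],[0,1,0]]),
    torch.Tensor([[0,1,0],[0,0,1],[1,0,0]]),
], dim=-1)                                    # (3,3,6)

weights = torch.randn(6).softmax(dim=-1)      # convex coefficients
D = (P_full * weights).sum(dim=-1)            # doubly stochastic (3,3)

x = torch.randn(3)
y = D @ x
print(torch.allclose(y.sum(), x.sum(), atol=1e-6))   # True: 1^T y == 1^T x
```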
0.2. Input majorizes output
Given two arrays \( a \) and \( b \) of equal length \( n \), we say that \( a \) majorizes \( b \) if:\[ \rose{sum}(\text{i largest values of a}) \begin{cases} \ge \rose{sum}(\text{i largest values of b}), & i \lt n \\ = \rose{sum}(\text{i largest values of b}), & i = n \end{cases} \]In other words, if \( a_1 \ge a_2 \ge \dots \ge a_n \) and \( b_1 \ge b_2 \ge \dots \ge b_n \), then the graph of cumulative sums of \( a \) lies on or above the graph of cumulative sums of \( b \), and the two graphs meet at the \( n \)-th point.
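The definition translates directly into code. The helper below (`majorizes` is a hypothetical name, not from the post's code) sorts both vectors in descending order and compares their cumulative sums:

```python
import torch

def majorizes(a: torch.Tensor, b: torch.Tensor) -> bool:
    """True if a majorizes b: the sorted prefix sums of a dominate
    those of b, and the total sums are equal."""
    ca = a.sort(descending=True).values.cumsum(dim=-1)
    cb = b.sort(descending=True).values.cumsum(dim=-1)
    return bool(torch.all(ca[:-1] >= cb[:-1] - 1e-6)
                and torch.isclose(ca[-1], cb[-1]))

print(majorizes(torch.Tensor([4,0,0]), torch.Tensor([2,1,1])))  # True
print(majorizes(torch.Tensor([2,1,1]), torch.Tensor([4,0,0])))  # False
```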
Given a doubly stochastic matrix D and a vector x, the input vector x majorizes the output vector y = Dx.
An example is shown below for permutation matrices and doubly stochastic matrices for n=3:
# refer to the code above for P_full
doubly_stochastic_matrix = (P_full * x).sum(dim=-1)
# pick a random vector which sums to 1
vector_in = torch.randn(3).softmax(dim=-1)
vector_out = doubly_stochastic_matrix @ vector_in
vector_in_sorted_cumsum = vector_in.sort(descending=True).values.cumsum(dim=-1)
vector_out_sorted_cumsum = vector_out.sort(descending=True).values.cumsum(dim=-1)
# condition for majorization
assert torch.all(vector_in_sorted_cumsum >= vector_out_sorted_cumsum)
print(vector_in_sorted_cumsum)
print(vector_out_sorted_cumsum)

### Output ###:
tensor([0.7674, 0.9329, 1.0000])
tensor([0.4229, 0.7315, 1.0000])
This property can be interpreted as "spreading out" the values of the input vector while keeping its sum the same. The limit of this "spreading out" is the vector with all equal values. In the example below, we repeatedly apply freshly sampled doubly stochastic matrices to show this property:
# start with a vector with all weight concentrated in one dimension
y = torch.Tensor([1,0,0])
for _ in range(10):
    # create a new doubly stochastic matrix
    x = torch.randn(6).softmax(dim=-1)
    doubly_stochastic_matrix = (P_full * x).sum(dim=-1)
    y = doubly_stochastic_matrix @ y
    print(y)

### Output ###:
tensor([0.2121, 0.3339, 0.4540])
tensor([0.3614, 0.3435, 0.2951])
tensor([0.3218, 0.3403, 0.3379])
tensor([0.3346, 0.3360, 0.3294])
tensor([0.3325, 0.3334, 0.3342])
tensor([0.3333, 0.3330, 0.3337])
tensor([0.3333, 0.3333, 0.3334])
tensor([0.3333, 0.3333, 0.3333])
tensor([0.3333, 0.3333, 0.3333])
tensor([0.3333, 0.3333, 0.3333])
1. Muirhead's inequality
Muirhead's inequality concerns two vectors of the same length. It involves the mean of terms of the form \( \amber{x}^\zinc{a}\amber{y}^\zinc{b}\amber{z}^\zinc{c} \) taken over all permutations of the variables; for three variables:
\( \frac{1}{3!} \)(\( \amber{x}^\zinc{a}\amber{y}^\zinc{b}\amber{z}^\zinc{c} \)+\( \amber{x}^\zinc{a}\amber{z}^\zinc{b}\amber{y}^\zinc{c} \)+\( \amber{y}^\zinc{a}\amber{x}^\zinc{b}\amber{z}^\zinc{c} \)+\( \amber{z}^\zinc{a}\amber{x}^\zinc{b}\amber{y}^\zinc{c} \)+\( \amber{y}^\zinc{a}\amber{z}^\zinc{b}\amber{x}^\zinc{c} \)+\( \amber{z}^\zinc{a}\amber{y}^\zinc{b}\amber{x}^\zinc{c} \))
It can be seen as a function of two vectors \( a \) and \( x \) of equal length \( n \):\[ \mu(a, x) = \frac{1}{n!} \sum_{\sigma \in S_n} x_{\sigma(1)}^{\zinc{a_1}} \cdots x_{\sigma(n)}^{\zinc{a_n}} \]where \( S_n \) is the set of all permutations of \( \{1, \dots, n\} \).
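For small \( n \), \( \mu(a, x) \) can be computed directly from the definition by enumerating all \( n! \) permutations. A minimal sketch (the function name `mu` is ours, not from the post):

```python
import itertools
import torch

def mu(a: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
    """Mean of x_{s(1)}^{a_1} * ... * x_{s(n)}^{a_n} over all permutations s."""
    terms = [torch.prod(x[list(s)] ** a)
             for s in itertools.permutations(range(len(a)))]
    return torch.stack(terms).mean()

# Sanity check: a = [1,0] picks one variable per term, so
# mu([1,0], [2,3]) = (2^1*3^0 + 3^1*2^0) / 2 = 2.5
print(mu(torch.Tensor([1., 0.]), torch.Tensor([2., 3.])))  # tensor(2.5000)
```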
1.1. Connection with majorization
Muirhead's inequality states that if \( b \) majorizes \( a \) (written \( \amber{[a]} \le \amber{[b]} \)), then \( \mu(a, x) \le \mu(b, x) \) for all \( x > 0 \). By the converse of the majorization property above (a result of Hardy, Littlewood, and Pólya), \( b \) majorizing \( a \) guarantees the existence of a doubly stochastic matrix \( D \) such that \( a = Db \).
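The inequality can be spot-checked numerically. The sketch below re-defines \( \mu \) by brute-force enumeration (as above, for self-containment) and verifies \( \mu([2,1,1], x) \le \mu([4,0,0], x) \) on random positive vectors:

```python
import itertools
import torch

torch.manual_seed(0)

def mu(a, x):
    # mean of x_{s(1)}^{a_1} * ... * x_{s(n)}^{a_n} over all permutations s
    terms = [torch.prod(x[list(s)] ** a)
             for s in itertools.permutations(range(len(a)))]
    return torch.stack(terms).mean()

a = torch.Tensor([2., 1., 1.])   # majorized by b
b = torch.Tensor([4., 0., 0.])   # majorizes a

for _ in range(100):
    x = torch.rand(3) + 0.1      # strictly positive vector
    assert mu(a, x) <= mu(b, x) + 1e-6
print("Muirhead's inequality held on all samples")
```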
The example below has been taken from here.
For example, [4,0,0] majorizes [2,1,1]. This means:
\( x^4 + y^4 + z^4 \ge x^2yz + xy^2z + xyz^2 \)
\( \begin{bmatrix} 2 \\ 1 \\ 1 \end{bmatrix} \) = \( \begin{bmatrix} 0.5 & 0.17 & 0.33 \\ 0.25 & 0.75 & 0 \\ 0.25 & 0.08 & 0.67 \end{bmatrix} \)\( \begin{bmatrix} 4 \\ 0 \\ 0 \end{bmatrix} \) where the matrix is doubly stochastic.
One can ask: how are these two statements linked? The link becomes clear once you write the doubly stochastic matrix as a convex sum of permutation matrices. The coefficients of the convex sum are the link to the first expression:\( \begin{bmatrix} 0.5 & 0.17 & 0.33 \\ 0.25 & 0.75 & 0 \\ 0.25 & 0.08 & 0.67 \end{bmatrix} \)=\( \amber{\frac{1}{12}} \)\( \begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} \)+\( \amber{\frac{2}{12}} \)\( \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} \)+\( \amber{\frac{3}{12}} \)\( \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix} \)+\( \amber{\frac{6}{12}} \)\( \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \)
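Both claims of the example are easy to verify in code. The sketch below writes the matrix with exact fractions (the decimals in the text are rounded: 0.17 is \( \frac{1}{6} \), 0.33 is \( \frac{1}{3} \), 0.08 is \( \frac{1}{12} \), 0.67 is \( \frac{2}{3} \)) and checks both the mapping \( [4,0,0] \mapsto [2,1,1] \) and the convex decomposition:

```python
import torch

# The doubly stochastic matrix from the example, in exact fractions.
D = torch.Tensor([[1/2, 1/6,  1/3],
                  [1/4, 3/4,  0. ],
                  [1/4, 1/12, 2/3]])

assert torch.allclose(D.sum(dim=0), torch.ones(3))   # columns sum to 1
assert torch.allclose(D.sum(dim=1), torch.ones(3))   # rows sum to 1
print(D @ torch.Tensor([4., 0., 0.]))                # tensor([2., 1., 1.])

# Reconstruct D from the stated convex sum of permutation matrices.
coeffs = [1/12, 2/12, 3/12, 6/12]
perms = [torch.Tensor([[0,0,1],[1,0,0],[0,1,0]]),
         torch.Tensor([[0,1,0],[1,0,0],[0,0,1]]),
         torch.Tensor([[0,0,1],[0,1,0],[1,0,0]]),
         torch.eye(3)]
recon = sum(c * P for c, P in zip(coeffs, perms))
assert torch.allclose(recon, D)
```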
Using weighted AM-GM inequality with these coefficients gives:
\( \amber{\frac{1}{12}} x^4 + \amber{\frac{2}{12}} x^4 + \amber{\frac{3}{12}} y^4 + \amber{\frac{6}{12}} z^4 \ge (x^{4/12})(x^{8/12})(y^{12/12})(z^{24/12}) = xyz^2 \)
\( \amber{\frac{1}{12}} z^4 + \amber{\frac{2}{12}} z^4 + \amber{\frac{3}{12}} x^4 + \amber{\frac{6}{12}} y^4 \ge (z^{4/12})(z^{8/12})(x^{12/12})(y^{24/12}) = xy^2z \)
\( \amber{\frac{1}{12}} y^4 + \amber{\frac{2}{12}} y^4 + \amber{\frac{3}{12}} z^4 + \amber{\frac{6}{12}} x^4 \ge (y^{4/12})(y^{8/12})(z^{12/12})(x^{24/12}) = x^2yz \)
Summing these up gives us the first expression. Given a doubly stochastic matrix, how can one express it as a convex sum of permutation matrices? This is what we explore next.