Simplex Distribution Maximum Values

I tried simulating from a simplex distribution (simple Stan code below), then picked one of the variables and found its maximum across the posterior draws. I was surprised to find these maxima relatively low, peaking around 0.4 to 0.5 for a 10-dimensional simplex, rather than reaching 0.9 or getting close to 1.

My sense is that under the simplex distribution the individual variables average around 1 / K, where K is the number of dimensions in the simplex. Also, the more dimensions the simplex has, the less likely it is that any individual value becomes extreme. Can someone explain why this is the case?

data {
  int<lower=0> N;
}
parameters {
  simplex[N] w;
}
model {
}

@betanalpha is the resident expert on simplices. The measure this program imposes on the N-dimensional simplex \mathcal{S}^N does not seem to be proper, though. I guess it’s equivalent to taking \boldsymbol Y \sim \text{normal}_N (\boldsymbol 0_N, \sigma \boldsymbol I_N), setting \boldsymbol W = \text{softmax}(\boldsymbol Y), and letting \sigma \to \infty.

The simplex alone, without any modifications to the target density, defines a distribution that’s uniform over the points in the simplex. It’s also equivalent to a Dirichlet density with all hyperparameters set to unity, \alpha_{n} = 1.

By looking at each parameter individually, however, one is not looking at the simplex but rather at its marginals. The marginal probability density functions are \text{Beta}(1, N - 1), which concentrate towards zero as the dimension, N, increases.
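To make that concentration concrete, here is a short worked version of the standard \text{Beta}(1, N - 1) formulas, using the same N and writing w_n for a single component and t for a threshold (t is notation introduced here for illustration):

```latex
\begin{align*}
p(w_n) &= (N - 1)\,(1 - w_n)^{N - 2}, \qquad 0 \le w_n \le 1, \\
\mathbb{E}[w_n] &= \frac{1}{N}, \\
\Pr(w_n > t) &= (1 - t)^{N - 1}.
\end{align*}
```

For N = 10 this gives \Pr(w_n > 0.5) = 0.5^9 \approx 0.002 and \Pr(w_n > 0.9) = 0.1^9 = 10^{-9}, so among a few thousand draws only a handful exceed 0.5 and values near 0.9 essentially never appear.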

This is a manifestation of concentration of measure. Even though the density is uniform over the simplex, there are more configurations near the boundary of the simplex than near its center as the dimension increases, so the corresponding probability distribution concentrates against that boundary. Those boundary configurations have one parameter near 1 and N - 1 parameters near 0. Since all the boundaries are equally likely, on average each marginal parameter is closer to 0 than to 1.
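The effect of dimension on a single component can also be checked by simulation outside of Stan. The following is a minimal sketch, not the Stan program above: it assumes the standard construction that normalizing K independent Exponential(1) draws yields a uniform draw on the simplex, and the function names, seed, and draw counts are illustrative choices.

```python
import random


def sample_uniform_simplex(k, rng):
    """Draw one point uniformly from the k-dimensional simplex.

    Normalizing k independent Exponential(1) draws gives a
    Dirichlet(1, ..., 1) sample, i.e. a uniform point on the simplex.
    """
    g = [rng.expovariate(1.0) for _ in range(k)]
    total = sum(g)
    return [x / total for x in g]


def summarize_first_component(k, n_draws, seed=0):
    """Return (mean, max, fraction above 0.5) of the first component."""
    rng = random.Random(seed)
    first = [sample_uniform_simplex(k, rng)[0] for _ in range(n_draws)]
    mean = sum(first) / n_draws
    frac_above_half = sum(x > 0.5 for x in first) / n_draws
    return mean, max(first), frac_above_half


if __name__ == "__main__":
    for k in (3, 10, 50):
        mean, largest, frac = summarize_first_component(k, 20000)
        print(f"K={k:2d}  mean={mean:.4f}  max={largest:.4f}  "
              f"frac>0.5={frac:.5f}  theory={(0.5) ** (k - 1):.5f}")
```

Running it for a few values of K should show the mean tracking 1 / K and the largest observed value of the component shrinking as K grows, with the fraction of draws above 0.5 close to the Beta(1, K - 1) tail probability.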


Thank you for that great explanation.