Parameterizing simplexes

To test some ideas I threw around in another thread, I’m trying to write a special-purpose sampler using some of those ideas and test it against what Stan gives me. But my problem of interest has a lot of simplexes. I looked at how Stan unconstrains simplexes, but I was thinking of doing something different.

As I understand it, if y_i ~ gamma(1,1), then s_i = y_i / sum(y) is dirichlet(1_vector), which is uniform on the simplex. So if p(s) is a density on the simplex, I get what I want by evaluating p(y/sum(y)).
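As a quick numeric sanity check of that claim (this sketch in NumPy is mine, not anything from Stan), normalized gamma(1,1) draws should have the same marginal moments as direct Dirichlet(1,…,1) draws:

```python
# Sanity check: normalized gamma(1,1) draws vs. direct Dirichlet(1,...,1) draws.
import numpy as np

rng = np.random.default_rng(0)
n, d = 100_000, 4

y = rng.gamma(shape=1.0, scale=1.0, size=(n, d))
s_gamma = y / y.sum(axis=1, keepdims=True)   # normalize each row onto the simplex
s_dir = rng.dirichlet(np.ones(d), size=n)    # reference uniform-on-simplex draws

print(s_gamma.mean(axis=0), s_dir.mean(axis=0))  # both ~ 1/d = 0.25
print(s_gamma.var(axis=0), s_dir.var(axis=0))    # both ~ (d-1)/(d^2 (d+1)) = 0.0375
```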

so if x is unconstrained on (-inf, inf) and I constrain to (0, inf) with y = exp(x), then

p(x) dx = gamma_pdf(exp(x), 1, 1) * (dy/dx) dx

gives a gamma distribution on y = exp(x)

dropping normalization constants, this is:

exp(x)^0 * exp(-exp(x)/1) * exp(x) = exp(-exp(x) + x)

so for x in (-inf, inf) I can use an lp of -exp(x) + x
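As a quick self-check (a sketch of mine using SciPy), the claimed lp should equal the gamma(1,1) log pdf at y = exp(x) plus the log-Jacobian log|dy/dx| = x; since gamma(1,1) is already normalized, they should match exactly:

```python
# Check: -exp(x) + x == gamma(1,1) logpdf at exp(x), plus log-Jacobian x.
import numpy as np
from scipy import stats

x = np.linspace(-5.0, 5.0, 11)
lp_claimed = -np.exp(x) + x
lp_direct = stats.gamma.logpdf(np.exp(x), a=1.0, scale=1.0) + x

print(np.allclose(lp_claimed, lp_direct))  # True: gamma(1,1) pdf is exp(-y)
```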

Does anyone see anything wrong with this calculation? Is there a reason Stan doesn’t use this reparameterization? Overflow as x goes positive?

Mike

Concern about the geometry when conditioning on data

No, because you could log_sum_exp the denominator and then antilog.
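Concretely, something like this (my sketch, not Stan’s code): subtract the log_sum_exp of the x’s before exponentiating, so the normalization never evaluates exp of a large positive number.

```python
# Overflow-safe map from unconstrained x to the simplex:
# s_i = exp(x_i) / sum_j exp(x_j), computed on the log scale.
import numpy as np
from scipy.special import logsumexp

def to_simplex(x):
    return np.exp(x - logsumexp(x))  # log_sum_exp the denominator, then antilog

x = np.array([1000.0, 1001.0, 999.0])  # naive exp(x) would overflow to inf
print(to_simplex(x))                   # finite, positive, sums to 1
```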

See Ragged array of simplexes

As I understand it, the reason Stan doesn’t use the gamma parameterization is that once you include data, the geometry in the gamma variables becomes weird and curved, requiring HMC to follow a strange surface in the unconstrained space.

That’s ok, because as it is, the geometry in my real problem makes things sample poorly under Stan’s default parameterization, and part of my idea is to stochastically unconstrain the HMC dynamics and fix things up on the back end anyway. ;-)

As long as the calculation above isn’t incorrect, I’ll give it a try. It’s all quick and dirty anyway, just to see if the basic ideas have any merit.

Incidentally, my real problem uses a mixture of seven Dirichlet distributions as the prior over the simplexes, so the geometry is already plenty weird ;-)

Though it occurs to me that maybe I should try out the gamma parameterization in Stan too…

Finally, it may make sense to cook up a test problem for my sampling scheme that doesn’t require such weird simplex transforms as an extra complication. Maybe just something like d-dimensional independent normals with Cauchy means and exponential standard deviations, as sketched below. If my sampler idea can’t reproduce something that simple, there’s no point in figuring out how to run it on the real problem.
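A throwaway generator for that kind of test problem might look like this (the dimensions, sample size, and names here are all my own choices):

```python
# Simple synthetic test problem: d independent normals with Cauchy-drawn
# means and exponential-drawn standard deviations.
import numpy as np

def make_test_problem(d=10, n=500, seed=0):
    rng = np.random.default_rng(seed)
    mu = rng.standard_cauchy(d)       # heavy-tailed means
    sigma = rng.exponential(1.0, d)   # positive standard deviations
    data = rng.normal(mu, sigma, size=(n, d))
    return mu, sigma, data

mu, sigma, data = make_test_problem()
```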

We haven’t tried lots of alternative parameterizations. I was mainly motivated by wanting the right number of degrees of freedom on the unconstrained scale, a simple Jacobian form, and a setup where an unconstrained zero vector corresponds to a symmetric simplex.

Oh, I should add that Michael Betancourt wrote an arXiv paper on simplex parameterizations suggesting an alternative one that we’ve also never really tried in Stan.

I was playing around with some spline-like positive-constrained parameterizations, but their divergence behavior was just terrible at the inflection points.

Are you referring to the “Cruising the Simplex” paper?

https://arxiv.org/abs/1010.3436

I started to implement that, but then I realized it was just another way of expressing what is already in Stan.

I’m always fairly nervous about the unbounded Jacobian (as z_i approaches either zero or one) in that transformation… @betanalpha doesn’t comment on it in the paper, but this was in his “brevity is the soul of wit” phase :p I’m sure he noticed the danger.

If you throw the z_{i} through logit functions then the problem goes away and, perhaps not uncoincidentally, you recover the stick-breaking result.

Cool!

I thought you once told me that smooth reparameterizations of our transformed variables wouldn’t make much difference because everything sort of had to work out this way? I’ve been meaning for years to spend some time understanding this.

I can verify that radically different parameterizations, like ones that are linear down to 1e-10 and then curved after that, don’t work well.

Not sure what you mean here. The only thing I can think of is when we were talking about different (-inf, inf) -> (0, 1) maps, but that’s not a factor here. In my paper I mapped the N-simplex to an (N-1)-dimensional unit hypercube, and the Jacobian had some weird behavior near the edges, as @anon75146577 noted. But if you map that hypercube to R^{N-1} with some logits, then the Jacobian smooths out and you recover the stick-breaking transform.
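For reference, here is a sketch of that composed map as I read it from the Stan reference manual (the Python implementation and its names are mine): logits take R^{K-1} into the unit hypercube, stick breaking takes the hypercube onto the K-simplex, and the log(K - k) offset is what makes an unconstrained zero vector land on the symmetric simplex.

```python
# Stick-breaking transform from unconstrained y in R^(K-1) to a K-simplex,
# following my reading of Stan's reference manual.
import numpy as np

def stick_breaking(y):
    K = len(y) + 1
    x = np.empty(K)
    stick = 1.0                    # fraction of the simplex not yet allocated
    for k in range(K - 1):
        # inv_logit with an offset so that y = 0 gives the symmetric simplex
        z = 1.0 / (1.0 + np.exp(-(y[k] - np.log(K - k - 1))))
        x[k] = stick * z           # break off a fraction z of the remaining stick
        stick -= x[k]
    x[-1] = stick                  # the leftover piece
    return x

print(stick_breaking(np.zeros(3)))  # [0.25 0.25 0.25 0.25]
```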

I should just re-ask the question.

Should we be looking at alternative transforms for things like positive- and interval-constrained variables, simplexes, etc.?

I don’t think so. Ben experimented with some other options a while ago, and we can revisit them, but I’m not sure they will make a significant difference.