To test some ideas I threw around in another thread, I’m trying to write a special-purpose sampler using some of those ideas and test it against what Stan gives me. But, my problem of interest has a lot of simplexes. I looked at how Stan unconstrains simplexes but was thinking to do something different.

As I understand it, if y_i ~ gamma(1,1) independently, then s_i = y_i/sum_j(y_j) is dirichlet(1_vector), which is uniform on the simplex, so if p(s) is a density on the simplex then I get what I want by evaluating p at y_i/sum_j(y_j).
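A quick numpy sanity check of that fact (my own sketch, not part of the derivation): draw gammas, normalize, and compare moments against the known Dirichlet(1,1,1) values, which for K = 3 are mean 1/3 and variance 1/18.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 3
y = rng.gamma(shape=1.0, scale=1.0, size=(100_000, K))  # y_i ~ gamma(1,1)
s = y / y.sum(axis=1, keepdims=True)                    # normalize onto the simplex

# Dirichlet(1,...,1) with K=3: E[s_i] = 1/3, Var[s_i] = (1/3)(2/3)/4 = 1/18
print(s.mean(axis=0))   # ≈ [0.333 0.333 0.333]
print(s.var(axis=0))    # ≈ 0.0556
```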

so if x is unconstrained on (-inf, inf) and I constrain to (0, inf) with y = exp(x), then

p(x) dx = gamma_pdf(exp(x),1,1)*dy/dx dx

gives a gamma distribution on y = exp(x)

dropping normalization constants, this is:

exp(x)^0 * exp(-exp(x)/1) * exp(x) = exp(-exp(x) + x)

so for x in [-inf,inf] I can use lp of -exp(x) + x
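The calculation is easy to check numerically (my own sketch, numpy only): since gamma(1,1) is a unit exponential and y = exp(x) carries its full mass onto the real line, exp(-exp(x) + x) should integrate to exactly 1 with no leftover normalization constant.

```python
import numpy as np

# lp(x) = -exp(x) + x is the gamma(1,1) log density pushed through y = exp(x),
# Jacobian included; exp(lp) should integrate to 1 over the real line.
x = np.linspace(-20.0, 10.0, 200_001)
lp = -np.exp(x) + x
Z = np.sum(np.exp(lp)) * (x[1] - x[0])   # simple Riemann sum; both tails are negligible
print(Z)   # ≈ 1.0
```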

Does anyone see anything wrong with this calculation? Is there a reason Stan doesn’t use this reparameterization? overflow as x goes positive?

Mike

Concern about the geometry when conditioning on data

No, because you could `log_sum_exp` the denominator and then antilog.
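Concretely (a sketch, names mine), the usual trick is to subtract the max inside the log-sum-exp, so the simplex coordinates come out as exp(x_i - log_sum_exp(x)) without ever exponentiating a large x directly:

```python
import numpy as np

def log_sum_exp(x):
    m = np.max(x)                             # shift by the max for stability
    return m + np.log(np.sum(np.exp(x - m)))

x = np.array([1000.0, 1001.0, 1002.0])        # np.exp(x) alone would overflow
s = np.exp(x - log_sum_exp(x))                # stable simplex coordinates
print(s, s.sum())   # ≈ [0.090 0.245 0.665] 1.0
```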

See Ragged array of simplexes

As I understand it, the reason Stan doesn’t use the gamma parameterization is that once you include data, the geometry in the gamma variables becomes weird and curved, requiring HMC to follow a strange surface in the unconstrained space.

That’s ok, because as it is, the geometry in my real problem is making things sample poorly using Stan’s default parameterization, and part of my idea is to stochastically unconstrain the HMC dynamics and fix things up on the back-end anyway. ;-)

As long as the calculation above isn’t incorrect, I’ll give it a try. It’s all quick and dirty to see if the basic ideas have any merit anyway.

Incidentally, my real problem uses a mixture of 7 dirichlet distributions as the prior over the simplexes, so the geometry is already plenty weird ;-)

Though it occurs to me that maybe I should try out the gamma parameterization in Stan too…

Finally, it may make sense to cook up a test problem for my sampling scheme that doesn’t require such weird simplex transforms as an extra complication. Maybe something like d-dimensional independent normals with Cauchy means and exponential standard deviations. If my sampler idea can’t reproduce something that simple, there’s no point figuring out how to run it on the real problem.
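For what it’s worth, that test model is easy to set up directly in numpy (a sketch under my own naming; the log posterior is fully unconstrained, with sigma on the log scale and the Jacobian term included):

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 5, 1000
mu = rng.standard_cauchy(d)                   # "true" Cauchy means
sigma = rng.exponential(scale=1.0, size=d)    # "true" exponential sds
data = rng.normal(mu, sigma, size=(n, d))     # n observations per dimension

def log_posterior(mu_, log_sigma_):
    """Unnormalized log posterior on the unconstrained scale:
    sigma = exp(log_sigma_), with the log Jacobian of that transform included."""
    sigma_ = np.exp(log_sigma_)
    lp = np.sum(-np.log1p(mu_ ** 2))           # Cauchy(0,1) log prior on means
    lp += np.sum(-sigma_ + log_sigma_)         # Exponential(1) log prior + Jacobian
    lp += -0.5 * np.sum(((data - mu_) / sigma_) ** 2) - n * np.sum(log_sigma_)
    return lp

print(log_posterior(mu, np.log(sigma)))
```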

We haven’t tried lots of alternative parameterizations. I was mainly motivated by wanting to get the right number of degrees of freedom on the unconstrained scale, get a simple Jacobian form, and set things up so that an unconstrained zero vector corresponded to a symmetric simplex.

Oh, I should add that Michael Betancourt wrote an arXiv paper on simplex parameterizations suggesting an alternative one that we’ve also never really tried in Stan.

I was playing around with some spline-like positive-constrained parameterizations but their divergence behavior was just terrible at the inflection points.

Are you referring to the “Cruising the simplex” paper?

https://arxiv.org/abs/1010.3436

I started to implement that, but then I realized it was just another way of expressing what is already in Stan.

I’m always fairly nervous about the unbounded Jacobian (as z_i approaches either zero or one) in that transformation… @betanalpha doesn’t comment on it in the paper, but this was in his “brevity is the soul of wit” phase :p I’m sure he noticed the danger.

If you throw the z_{i} through logit functions then the problem goes away and, perhaps not uncoincidentally, you recover the stick-breaking result.

I thought you once told me that smooth reparameterizations of our transformed variables wouldn’t make much difference because everything sort of had to work out this way? I’ve been meaning for years to spend some time understanding this.

I can verify that radically different parameterizations, like linear down to 1e-10 and then curved after that, don’t work well.

Not sure what you mean here. The only thing I can think of is when we were talking about different (-inf, inf) -> (0, 1) maps, but that’s not a factor here. In my paper I mapped the N-simplex to an (N-1)-dimensional unit hypercube, and the Jacobian had some weird behavior near the edges, as @anon75146577 noted. But if you map that hypercube to R^{N-1} with some logits then the Jacobian smooths out and you recover the stick-breaking transform.
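To make that concrete, a small sketch (my own code, not from the paper): inverse-logit an unconstrained vector onto the hypercube, then break the stick. Note this plain version has no centering offset, so the zero vector gives (1/2, 1/4, 1/8, 1/8) rather than the symmetric simplex Stan arranges for.

```python
import numpy as np

def stick_break(x):
    """Map x in R^{K-1} to the K-simplex: inverse logit onto the unit
    hypercube, then stick breaking (no centering offset)."""
    z = 1.0 / (1.0 + np.exp(-x))      # inverse logit: R -> (0, 1)
    s = np.empty(len(x) + 1)
    remaining = 1.0
    for k, zk in enumerate(z):
        s[k] = remaining * zk         # take fraction zk of the remaining stick
        remaining -= s[k]
    s[-1] = remaining                 # last coordinate gets what's left
    return s

s = stick_break(np.zeros(3))
print(s, s.sum())   # [0.5 0.25 0.125 0.125] 1.0
```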

I should just re-ask the question.

Should we be looking at alternative transforms for things like positive- and interval-constrained variables, simplexes, etc.?

I don’t think so. Ben was playing around with some other options a while ago and we can play around with them, but I’m not sure if they will make a significant difference.