After I got some warnings about divergent transitions and increasing the adapt_delta parameter did not help, I decided to try my luck at reparameterization. Pages 287 and 288 of the manual currently give an example of how to do that for a Dirichlet prior. This is the suggested reparameterization:

@andrjohns awesome! I'm using a method where we have a bunch of latent Dirichlet random variables, so this will be super useful to me. I'll be sure to let you know how it works out whenever you get this done!

I have implemented the lpdf (and rng) for the multi_logit_normal. Based on your post here I was thinking of contributing it to Stan. Is there any information anywhere on what is expected of code for new distributions like this or how to open a PR for a new distribution?

I highly suggest that it be the Cholesky parameterization, because you can reuse most of the current multivariate normal code, and it's more efficient and has derivatives. As @Bob_Carpenter says above, you can keep all the multi_normal_cholesky stuff and just add the Jacobian adjustment for the logit transform. It's literally a one-line adjustment. The harder part is adding the chain-rule derivative for the partial with respect to the input, but it's only the partial of the input that changes, so that's not too difficult either.
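To make the "one-line adjustment" concrete, here is a minimal pure-Python sketch of the log density, assuming the last category is the reference. The function name `multi_logit_normal_cholesky_lpdf` and its argument names are hypothetical, not an existing Stan signature, and a real Stan Math implementation would be C++ with analytic gradients. The only term beyond the multivariate normal density is the log Jacobian, which for the log-ratio transform is minus the sum of the logs of all D simplex elements.

```python
import math

def multi_logit_normal_cholesky_lpdf(theta, mu, L):
    """Hypothetical sketch, not a Stan API.

    theta: simplex of length D
    mu:    location, length D-1
    L:     lower-triangular Cholesky factor of the (D-1)x(D-1) covariance
    The last category is assumed to be the reference.
    """
    K = len(theta) - 1
    # Log-ratio transform relative to the last element.
    y = [math.log(theta[k] / theta[K]) for k in range(K)]
    # Solve L z = (y - mu) by forward substitution.
    z = []
    for i in range(K):
        s = y[i] - mu[i] - sum(L[i][j] * z[j] for j in range(i))
        z.append(s / L[i][i])
    # Multivariate normal log density in the Cholesky parameterization.
    mvn = (-0.5 * K * math.log(2.0 * math.pi)
           - sum(math.log(L[i][i]) for i in range(K))
           - 0.5 * sum(v * v for v in z))
    # The one-line Jacobian adjustment: -sum_{i=1}^{D} log(theta_i).
    log_jacobian = -sum(math.log(t) for t in theta)
    return mvn + log_jacobian
```

For D = 2 this reduces to the univariate logit-normal density, which gives a quick sanity check on the Jacobian term.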

Great! I was actually thinking of adding both the Cholesky version and the version with Sigma. And yes, in both cases we can use the multivariate normal and just add the Jacobian.

A quick additional question then. Is there a standard for reference categories? I currently use the last category, but I think it would be better to follow the Stan standard if there is one.

So the multi-logit-normal parameters mu and Sigma are defined on R^(D-1) and R^((D-1)x(D-1)). The logit of one category is usually fixed to 0 so that the distribution is identified; that category is usually referred to as the reference category. The question is which category to use as the reference. I currently use category D.
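For anyone wondering why a reference category is needed at all, here is a short worked version of the identification argument. The symbols $\eta$ (a full vector of D logits), $y$ (the D-1 free logits), and $\theta$ (the simplex) are my own notation, not anything fixed by Stan. Softmax is unchanged when a constant $c$ is added to every logit:

$$
\operatorname{softmax}(\eta + c\,\mathbf{1})_k
= \frac{e^{\eta_k + c}}{\sum_{j=1}^{D} e^{\eta_j + c}}
= \frac{e^{\eta_k}}{\sum_{j=1}^{D} e^{\eta_j}}
= \operatorname{softmax}(\eta)_k .
$$

So only D-1 of the logits are identified. Pinning $\eta_D = 0$ and writing $y = (\eta_1, \dots, \eta_{D-1})$ gives

$$
\theta_k = \frac{e^{y_k}}{1 + \sum_{j=1}^{D-1} e^{y_j}} \quad (k < D),
\qquad
\theta_D = \frac{1}{1 + \sum_{j=1}^{D-1} e^{y_j}},
$$

with inverse $y_k = \log(\theta_k / \theta_D)$. Choosing a different reference category just relabels which element plays the role of $\theta_D$.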

Got it, first open a Stan math issue and ask. I don't have any strong preference but someone might. I think the main thing is that we want this clearly documented for the user.

I'll just mention that SUG 9.5 deals with something like this, but appears to omit the identifying constraint. However, SUG 9.5 cites Blei & Lafferty 2007, which in turn cites Aitchison & Shen 1980, who impose the identifying constraint by applying what they call a logistic transform to an (n-1)-dimensional normal variate, then deriving the final element by subtracting the sum of the others from 1. This is equivalent to using the last category as the reference category.
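A small pure-Python check of that equivalence, as a sketch only. The function names are made up for illustration, and `logistic_transform` is my reading of the construction described above rather than code from the paper. It compares the "subtract from 1" construction with a softmax whose last logit is pinned to 0.

```python
import math

def logistic_transform(y):
    """Map y in R^(D-1) to a D-simplex; the final element is 1 minus the rest."""
    denom = 1.0 + sum(math.exp(v) for v in y)
    head = [math.exp(v) / denom for v in y]
    return head + [1.0 - sum(head)]

def softmax_last_reference(y):
    """Softmax of y with a 0 appended, i.e. the last category is the reference."""
    padded = list(y) + [0.0]
    m = max(padded)  # subtract the max for numerical stability
    w = [math.exp(v - m) for v in padded]
    total = sum(w)
    return [v / total for v in w]

def log_ratio_to_last(theta):
    """Inverse map: log-ratios of each element against the last one."""
    return [math.log(t / theta[-1]) for t in theta[:-1]]
```

Both functions return the same simplex for any y, and `log_ratio_to_last` recovers y from it, which is what "last category as the reference" means in practice.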