After I got some warnings about divergent transitions and increasing the adapt_delta parameter did not help, I decided to try my luck at reparameterization. Currently, on pages 287 and 288 of the manual, there is an example of how to do that for a Dirichlet prior. This is the suggested reparameterization:
@andrjohns awesome! I’m using a method where we have a bunch of latent Dirichlet random variables, so this will be super useful to me. I’ll be sure to let you know how it works out whenever you get this done!
I have implemented the lpdf (and rng) for the multi_logit_normal. Based on your post here I was thinking of contributing it to Stan. Is there any information anywhere on what is expected of code for new distributions like this or how to open a PR for a new distribution?
I highly suggest that it be the Cholesky parameterization, because you can reuse most of the current mvn code, it’s more efficient, and it has derivatives. As @Bob_Carpenter says above, you can keep all the multi_normal_cholesky stuff and just add the Jacobian adjustment for the logit transform. Literally it’s a one-line adjustment. The harder part is adding the chain-rule derivative for the partial with respect to the input, but it’s just the partial of the input that changes, so that’s also not too difficult.
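For anyone following along, here is a rough sketch of what that "one line" amounts to, written in plain Python rather than Stan math C++. It is only an illustration under some assumptions: the function name `multi_logit_normal_cholesky_lpdf` is hypothetical, the last category is taken as the reference, and the density is the standard logistic-normal one, i.e. the multivariate normal density of the log-ratios plus a Jacobian term of minus the sum of log(x_i) over all D components.

```python
import math


def multi_logit_normal_cholesky_lpdf(x, mu, L):
    """Log density of a simplex x (length D) under a logistic-normal with
    location mu (length D-1) and lower-triangular Cholesky factor L.

    Hypothetical helper for illustration; not the Stan math implementation.
    """
    D = len(x)
    K = D - 1
    # Log-ratio transform, assuming the LAST category is the reference.
    y = [math.log(x[i] / x[K]) for i in range(K)]
    # Solve L z = (y - mu) by forward substitution.
    z = []
    for i in range(K):
        s = y[i] - mu[i] - sum(L[i][j] * z[j] for j in range(i))
        z.append(s / L[i][i])
    # Multivariate normal log density of y in Cholesky form.
    log_det_L = sum(math.log(L[i][i]) for i in range(K))
    mvn = -0.5 * K * math.log(2.0 * math.pi) - log_det_L \
        - 0.5 * sum(v * v for v in z)
    # The "one line": Jacobian of the log-ratio transform, -sum(log(x)).
    log_jacobian = -sum(math.log(v) for v in x)
    return mvn + log_jacobian


# D = 2 reduces to the univariate logit-normal density.
lp = multi_logit_normal_cholesky_lpdf([0.5, 0.5], [0.0], [[1.0]])
```

Everything before `log_jacobian` is the ordinary multi_normal_cholesky calculation applied to the log-ratios, which is why reusing the existing code is attractive.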
Great! Yes, I was actually thinking of adding both the Cholesky version and the version with Sigma. In both cases, we can use the multivariate normal and just add the Jacobian.
A quick additional question then. Is there a standard for reference categories? I now use the last category, but I think it would be better to synchronize this if there is a Stan standard.
So the multi-logit-normal parameters mu and Sigma are defined on R^(D-1) and R^((D-1)×(D-1)). The value for one category is fixed at 0 so that the distribution is identified; I think this is usually referred to as the reference category. The question is which category to use as the reference. I now use category D.
Got it, first open a Stan math issue and ask. I don’t have any strong preference but someone might. I think the main thing is that we want this clearly documented for the user.
I’ll just mention that SUG 9.5 deals with something like this, but appears to omit the identifying constraint. However, SUG 9.5 cites Blei & Lafferty 2007, which in turn cites Aitchison & Shen 1980, who impose the identifying constraint by applying what they call a logistic transform to an (n-1)-dimensional normal variate, then deriving the final element by subtracting the others from 1. This is equivalent to using the last category as the reference category.
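To make that equivalence concrete, here is a small Python sketch. It assumes the transform is the usual additive logistic one, x_i = exp(y_i) / (1 + sum_j exp(y_j)), and the function name `simplex_from_logits` is made up for this post. The last element is obtained by subtracting the others from 1, which gives the same result as a softmax over y with a 0 appended for the reference category.

```python
import math


def simplex_from_logits(y):
    """Map y in R^(D-1) to a point on the D-simplex, with the last
    category as the reference (its implied logit is fixed at 0).

    Hypothetical helper for illustration; assumes len(y) >= 1.
    """
    # Subtract the max (including the reference's 0) for numerical stability.
    m = max(0.0, max(y))
    e = [math.exp(v - m) for v in y]
    ref = math.exp(0.0 - m)  # contribution of the reference category
    total = ref + sum(e)
    head = [v / total for v in e]
    # Final element derived by subtracting the others from 1.
    return head + [1.0 - sum(head)]


x = simplex_from_logits([0.0, 0.0])       # all three categories equal
x2 = simplex_from_logits([math.log(2.0)])  # first category twice the reference
```

Taking log(x[i] / x[-1]) recovers y[i], so the D-1 unconstrained values are exactly the log-odds against the last category.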