Reparameterization example in manual


After I got some complaints about divergent transitions and increasing the adapt_delta parameter did not work, I decided to try my luck at reparameterization. Currently on page 287 and 288 of the manual there is an example on how to do that for a Dirichlet prior. This is the suggested reparameterization:

parameters {
simplex[K] phi;
real<lower=0> kappa;
simplex[K] theta[N];
transformed parameters {
vector[K] alpha = kappa * theta;
model {
phi ~ ...;
kappa ~ ...;
for (n in 1:N)
theta[n] ~ dirichlet(alpha);  

My questions are:

  • Why is phi defined if it is not used?
  • Why is alpha both a function of theta (alpha = kappa*theta) but also a hyperparameter of theta (theta ~dirichlet(alpha))?
  • can we just write theta~dirichlet(alpha) instead of looping over 1:N?

Kind regards,

Pretty sure it is supposed to be vector[K] alpha = kappa * phi; and no you cannot avoid the loop.

1 Like

Not yet, but we’d love to get some programming help to vectorize the remaining unvectorized distributions.

Yes, and I’ll fix the manual:

A vectorised dirichlet would actually be quite useful for me, I’ll take that on as a project

It’d also be nice to have a mean/total parameterization:

DirichletMean(theta | phi, kappa) 
Dirichlet(theta | phi * kappa)

where theta is a K-simplex, kappa > 0 and phi is a K-simplex. It’s much easier to think of priors this way. And also the beta version.

The other simplex distribution that would be nice would be multi-logit,

MultiLogitNormal(theta | mu, Sigma)
MultiNormal(logit(theta) | mu, Sigma)

Like lognormal or inverse gamma, this one needs the Jacobian adjustment, but it’s simple because the transform is dimensionwise and hence diagonal.

@andrjohns any progress on the vectorized dirichlet?

Thanks for reminding me! I’ve added an issue in the math library here and will code that up

@andrjohns awesome! I’m using a method where we have a bunch of latent dirichlet random variables, so this will be super useful to me. I’ll be sure to let you know how it works out whenever you get this done!


I have implemented the lpdf (and rng) for the multi_logit_normal. Based on your post here I was thinking of contributing it to Stan. Is there any information anywhere on what is expected of code for new distributions like this or how to open a PR for a new distribution?


Hi @mans_magnusson,

there’s some documentation on adding a new distribution at Stan Math Library: Adding A New Distribution.

I highly suggest that it be the Cholesky parameterization because you can reuse most the current mvn code and it’s more efficient and has derivatives. As @Bob_Carpenter says above, you can keep all the multi_normal_cholesky stuff and just add the jacobian adjustment for the logit transform. Literally it’s a one line adjustment. The harder part is adding chain-rule derivative for the partial wrt the input but it’s just the partial of the input that changes, so also not too difficult.

Great! Yes. I was actually thinking of adding both Chol and with Sigma? Yes. In both cases, we can use the multivariate normal and just add the jacobian.

A quick additional question then. Is there a standard for reference categories? I now use the last category, but think it is better to synchronize this if there is a Stan standard.


Sorry, I don’t understand, what do you mean by “reference categories”?


So the multi-logit-normal parameters mu and Sigma are defined on R^(D-1) and R^(D-1)*(D-1). The reference category is usually set to 0 so that the distribution is identified. I think this is usually referred to as the reference category. The question is what to use as the reference category? I now use category D.




Got it, first open a Stan math issue and ask. I don’t have any strong preference but someone might. I think the main thing is that we want this clearly documented for the user.

Great! Thanks! Will do!

Note that the brms implementation here defaults to using the first category as reference. Implement the logistic-normal family by paul-buerkner · Pull Request #1295 · paul-buerkner/brms · GitHub

I’ll just mention that SUG 9.5 deals with something like this, but appears to omit the identifying constraint. However, SUG 9.5 cites Blei & Lafferty 2007, which in turn cites Aitchison & Shen 1980, who impose the identifying constraint by applying what they call a logistic transform to an n-1 dimensional normal variate, then deriving the final element by subtracting from 1. This is equivalent to using the last category as the reference category.

Edit: see also SUG 9.5 suggests a degenerate model · Issue #456 · stan-dev/docs · GitHub