Prior for Simplex, more informative than Dirichlet

I saw a few posts here about the Dirichlet distribution, but I didn’t find a clear answer to what is probably a very basic question. In my model I have a 4-element vector representing 4 probabilities that sum to 1.

parameters {
  simplex[4] p; // transition probabilities
}

I would like to specify a prior for this parameter, and the most natural choice would be the Dirichlet with parameters theta, which in my case would depend on other parameters of the model, so

transformed parameters {
  vector[4] theta; // computed from other model parameters (definition omitted)
}

So that finally I would define in the model

model {
  p ~ dirichlet(theta);
}

However, such a prior is not informative enough for my model, and I would like something stricter. What would be a correct way of implementing it? I thought about defining a multivariate normal distribution like

model {
  p ~ multi_normal(theta, 0.2 * theta);
}

Would this be a correct way of specifying the prior? Should theta be specified as a simplex as well?

Thank you in advance for your help!


You could take a look at the logistic-normal distribution (which is the multidimensional generalization of the logit-normal, don’t ask me why the names…)


If it induces a prior on p that is consistent with the information you intend to inject, then it’s a fine way to specify the prior. Something to be aware of is that this will not lead to a multivariate normal prior on p, but rather to a truncated mvn prior, truncated by the simplex constraint. The interaction of these priors and constraints can sometimes do surprising things. For example the marginals might no longer resemble univariate normals, and you’ll still get negative covariances between elements of p even if you specify null or strictly positive covariances in the mvn. The nice thing about using dirichlet priors on simplexes is that because the prior inherently reflects the constraint, it’s easier to recognize what the realized prior on the simplex is (it’s just the dirichlet, not some funny truncated dirichlet).

Thank you for the useful answer!!!

It’s the same naming convention as log normal. Take a normal distribution and transform the output. In this case, it’s a multivariate normal distribution put through a logistic transform. Specifically, if

y \sim \textrm{normal}(\mu, \Sigma),

then

\textrm{logit}^{-1}(y) \sim \textrm{logisticNormal}(\mu, \Sigma).

By analogy, if y \sim \textrm{normal}(\mu, \sigma) in one dimension, then \exp(y) \sim \textrm{lognormal}(\mu, \sigma). In stats, we name things after the inverse transform, so the log transform takes a log normal variate and turns it into a normal variate.
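
A minimal Stan sketch of this construction for the 4-element simplex from the question. It assumes the prior location `mu` and covariance `Sigma` are passed in as data; both names are illustrative and not part of the original model. For a simplex, the inverse transform is the softmax rather than an elementwise inverse logit, and pinning the last element's value at zero keeps the mapping one-to-one.

```stan
data {
  vector[3] mu;          // assumed: prior location on the log-ratio scale
  cov_matrix[3] Sigma;   // assumed: prior covariance on the log-ratio scale
}
parameters {
  vector[3] y;           // log-ratios of p[1:3] relative to p[4]
}
transformed parameters {
  // append a fixed 0 for the reference category, then map to the simplex
  simplex[4] p = softmax(append_row(y, 0));
}
model {
  // the prior is placed on the sampled parameter y, so no Jacobian term is needed
  y ~ multi_normal(mu, Sigma);
}
```

Shrinking the diagonal of `Sigma` tightens the induced prior on `p`, and the off-diagonal entries let you encode correlations between the log-ratios.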

I was talking about the fact that the one-dimensional version seems to be called logit-normal, while the multidimensional version is the logistic-normal.

What do you mean by “stricter”? You can crank down the variance in a Dirichlet by cranking up the concentration. @scholz’s suggestion of a multivariate logistic normal lets you also model correlations.
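
As a rough illustration of cranking up the concentration, the Dirichlet parameter can be split into a mean and a total concentration. The names `m` and `kappa` are hypothetical, and both are passed as data here only to keep the sketch short; in the original model the mean would come from other parameters.

```stan
data {
  simplex[4] m;          // assumed: prior mean of p
  real<lower=0> kappa;   // assumed: total concentration, larger means tighter
}
parameters {
  simplex[4] p;          // transition probabilities
}
model {
  // the mean stays at m for every kappa; only the spread changes
  p ~ dirichlet(kappa * m);
}
```

With this parameterization the marginal variance is m_i (1 - m_i) / (\kappa + 1). For m_i = 0.25, \kappa = 4 (the uniform Dirichlet) gives a standard deviation of about 0.19, while \kappa = 100 gives about 0.043.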

How is theta defined? Why make the prior scale proportional to the value? It means a value of 0.1 gets a prior scale of 0.02, whereas a value of 0.5 gets a prior scale of 0.1 and a value of 0.9 gets a prior scale of 0.18. @jsocolar makes the good point that the prior isn’t multivariate normal here, it’s truncated. The truncation isn’t a problem—Stan will handle that appropriately implicitly. Where truncation becomes a problem is when the data is inconsistent with the truncation and probability mass piles up on the boundary. I don’t think that would happen here, but it’s definitely something to watch out for.
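
For concreteness, here is one way to write a prior whose scale is 20% of each element's centre, as in the numbers above. This is only a sketch of that reading: `multi_normal` expects a covariance matrix as its second argument, so the vector `0.2 * theta` is treated as a vector of standard deviations in a vectorized `normal` instead. Taking `theta` as data is an assumption made for brevity.

```stan
data {
  simplex[4] theta;      // assumed: prior centre, strictly positive entries
}
parameters {
  simplex[4] p;          // transition probabilities
}
model {
  // independent normals with scale proportional to the centre;
  // the simplex constraint on p implicitly truncates this density
  p ~ normal(theta, 0.2 * theta);
}
```

Because of the truncation, the realized prior on `p` is best checked by sampling from this model without a likelihood and looking at the draws.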

The multivariate logistic normal has the same benefit as the Dirichlet in that it’s already normalized to the simplex.

Hi Bob, thank you very much for your reply.

What do you mean by “stricter”? You can crank down the variance in a Dirichlet by cranking up the concentration.

I guess you meant by cranking down the concentration parameters \alpha_i. I think I would try that.

Do I understand correctly that there is no predefined logistic-normal in Stan, and that the way to implement it is to use the normal and transform the output?

Sorry, I am not so familiar with the Dirichlet distribution. You were right: increasing the concentration parameters decreases the variance.

Another option is a hierarchy of beta distributions. One variant, the generalized Dirichlet, is a place to start. But you may be better off defining your own distribution that fits your problem.
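
A hedged sketch of what such a hierarchy of betas could look like in Stan, using the stick-breaking form of the generalized Dirichlet. The shape vectors `a` and `b` are hypothetical names and are passed as data here as an assumption.

```stan
data {
  vector<lower=0>[3] a;   // assumed: first beta shape for each break
  vector<lower=0>[3] b;   // assumed: second beta shape for each break
}
parameters {
  vector<lower=0, upper=1>[3] v;   // fraction of the remaining stick taken at each step
}
transformed parameters {
  simplex[4] p;
  {
    real remaining = 1;
    for (k in 1:3) {
      p[k] = v[k] * remaining;   // take a fraction of what is left
      remaining -= p[k];
    }
    p[4] = remaining;            // the last element gets the rest
  }
}
model {
  // independent betas on the fractions induce a generalized Dirichlet on p
  v ~ beta(a, b);
}
```

This has six free shape parameters instead of the Dirichlet's four, so the spread of each element can be tuned more independently. Setting b[k] to the sum of the Dirichlet concentrations after position k, with a[k] equal to the k-th concentration, recovers the ordinary Dirichlet.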