Prior for Simplex, more informative than Dirichlet

irelamb · February 20, 2024, 9:54am

Hello,
I saw there are a few posts about the Dirichlet distribution, but I didn’t find a clear answer to what is probably a very basic question. In my model I have a 4-element vector representing 4 probabilities that sum to 1.

parameters {
  simplex[4] p; // transition probabilities
}

I would like to specify a prior for this parameter, and the most natural choice would be the Dirichlet with parameters theta, which in my case would depend on other parameters of the model, so

transformed parameters {
  vector<lower=0>[4] theta; // Dirichlet parameters (must be positive), computed from other parameters
}

Finally, in the model block I would write

model {
  p ~ dirichlet(theta);
}

However, such a prior is not informative enough for my model, and I would like something stricter. What would be a correct way of implementing it? I thought about defining a multivariate normal distribution like

model {
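  // multi_normal takes a covariance matrix; here the sd of p[k] is 0.2 * theta[k]
  p ~ multi_normal(theta, diag_matrix(square(0.2 * theta)));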
}

Would this be a correct way of specifying the prior? Should theta be specified as simplex as well?

Thank you in advance for your help!

scholz · February 20, 2024, 10:05am

You could take a look at the logistic-normal distribution (which is the multidimensional generalization of the logit-normal, don’t ask me why the names…)

jsocolar · February 20, 2024, 12:38pm

If it induces a prior on p that is consistent with the information you intend to inject, then it’s a fine way to specify the prior. Something to be aware of is that this will not give a multivariate normal prior on p, but rather a truncated MVN prior, truncated by the simplex constraint. The interaction of such priors with constraints can sometimes do surprising things. For example, the marginals might no longer resemble univariate normals, and you’ll still get negative covariances between elements of p even if you specify zero or strictly positive covariances in the MVN. The nice thing about using Dirichlet priors on simplexes is that, because the prior inherently respects the constraint, it’s easier to recognize what the realized prior on the simplex is (it’s just the Dirichlet, not some funny truncated Dirichlet).
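One way to check this is to sample from the prior alone. A minimal sketch, assuming theta is fixed and passed in as data (in the question it is a transformed parameter, so this shows the prior on p for one value of theta) and reading 0.2 * theta as per-component standard deviations, which is an assumption about the intent of the original snippet.

```stan
// Prior-only program: there is no likelihood, so draws of p come from the prior alone.
data {
  vector<lower=0>[4] theta;  // hypothetical fixed location; all entries must be positive
}
parameters {
  simplex[4] p;              // the simplex constraint is what truncates the MVN
}
model {
  // MVN density restricted to the simplex; the sd of p[k] is 0.2 * theta[k].
  // multi_normal expects a covariance matrix, hence diag_matrix(square(...)).
  p ~ multi_normal(theta, diag_matrix(square(0.2 * theta)));
}
```

Plotting the draws of each p[k] and their pairwise correlations shows the realized prior; per the point above, the marginals may look non-normal and the correlations may come out negative even though the covariance here is diagonal.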

irelamb · February 21, 2024, 3:29pm

Thank you for the useful answer!!!

Bob_Carpenter · February 21, 2024, 7:03pm

It’s the same naming convention as lognormal: take a normal distribution and transform the output. In this case, it’s a multivariate normal distribution put through a logistic transform. Specifically, if

y \sim \textrm{normal}(\mu, \Sigma), \quad y \in \mathbb{R}^{K-1},

then applying the multivariate inverse logit (softmax with a zero appended for the last category) gives

\textrm{softmax}\big((y_1, \ldots, y_{K-1}, 0)\big) \sim \textrm{logisticNormal}(\mu, \Sigma).

By analogy, if y \sim \textrm{normal}(\mu, \sigma) in one dimension, then \exp(y) \sim \textrm{lognormal}(\mu, \sigma). In stats, we name things after the inverse transform, so the log transform takes a lognormal variate and turns it into a normal variate.
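In Stan this can be written by putting the multivariate normal on an unconstrained vector and transforming it onto the simplex. A minimal sketch, assuming K categories and the additive logistic form of the transform (softmax of y with a 0 appended); K, mu, and Sigma are hypothetical data names. Because the prior is stated for y, which is the actual parameter, no Jacobian adjustment is needed.

```stan
data {
  int<lower=2> K;               // number of categories (4 in the question)
  vector[K - 1] mu;             // hypothetical prior location of the log-ratios
  cov_matrix[K - 1] Sigma;      // hypothetical prior covariance of the log-ratios
}
parameters {
  vector[K - 1] y;              // y[k] = log(p[k] / p[K]) for k < K
}
transformed parameters {
  // Append 0 for the reference category K and apply softmax, so p sums to 1.
  simplex[K] p = softmax(append_row(y, 0));
}
model {
  // Normal prior on the unconstrained y; p then has a logistic-normal(mu, Sigma) prior.
  y ~ multi_normal(mu, Sigma);
}
```

To center p near a given theta, one hypothetical choice is mu[k] = log(theta[k]) - log(theta[K]), computed in transformed parameters if theta depends on other parameters. Sigma then sets how tight the prior is and, unlike the Dirichlet, can encode correlations between components.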

scholz · February 21, 2024, 7:10pm

I was talking about the fact that the one-dimensional version seems to be called logit-normal, while the multidimensional version is the logistic-normal.

Bob_Carpenter · February 21, 2024, 7:15pm

What do you mean by “stricter”? You can crank down the variance in a Dirichlet by cranking up the concentration. @scholz’s suggestion of a multivariate logistic normal lets you also model correlations.
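The concentration idea can be made concrete by writing the Dirichlet parameter as a mean times a total concentration, alpha = kappa * m. A minimal sketch; kappa and m are hypothetical names, m is declared as a parameter only to stand in for theta / sum(theta) from the question, and kappa is fixed data here.

```stan
data {
  real<lower=0> kappa;   // hypothetical total concentration; larger values give a tighter prior
}
parameters {
  simplex[4] m;          // prior mean of p; stands in for whatever theta is built from
  simplex[4] p;          // transition probabilities
}
model {
  // m has no sampling statement, so it gets Stan's implicit uniform prior on the simplex.
  // Dirichlet(kappa * m) has mean m for every kappa; only the spread around m changes.
  p ~ dirichlet(kappa * m);
}
```

Separating the mean from the concentration means a "stricter" prior is just a larger kappa, without moving where the prior is centered.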

How is theta defined? Why make the prior scale proportional to the value? It means a value of 0.1 gets a prior scale of 0.02, whereas a value of 0.5 gets a prior scale of 0.1 and a value of 0.9 gets a prior scale of 0.18. @jsocolar makes the good point that the prior isn’t multivariate normal here, it’s truncated. The truncation isn’t a problem—Stan will handle that appropriately implicitly. Where truncation becomes a problem is when the data is inconsistent with the truncation and probability mass piles up on the boundary. I don’t think that would happen here, but it’s definitely something to watch out for.

The multivariate logistic normal has the same benefit as the Dirichlet: it is already normalized to the simplex.

irelamb · February 22, 2024, 9:26am

Hi Bob, thank you very much for your reply.

What do you mean by “stricter”? You can crank down the variance in a Dirichlet by cranking up the concentration.

I guess you meant cranking down the concentration parameters \alpha_i. I think I will try that.

Do I understand correctly that there is no built-in logistic-normal distribution in Stan, and that the way to implement it is to use the normal and transform the output?

irelamb · February 24, 2024, 2:17pm

Sorry, I am not very familiar with the Dirichlet distribution; you were right that increasing the concentration parameters decreases the variance.
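The standard Dirichlet moments show why. Writing \alpha_0 = \sum_i \alpha_i for the total concentration and m_i = \alpha_i / \alpha_0 for the mean:

\mathbb{E}[p_i] = m_i, \qquad \textrm{Var}(p_i) = \frac{m_i (1 - m_i)}{\alpha_0 + 1}, \qquad \textrm{Cov}(p_i, p_j) = -\frac{m_i m_j}{\alpha_0 + 1} \quad (i \neq j).

Scaling all \alpha_i up by the same factor keeps every mean fixed and shrinks every variance. For example, with m_i = 0.25, \alpha_0 = 4 gives \textrm{Var}(p_i) = 0.1875 / 5 = 0.0375, while \alpha_0 = 40 gives 0.1875 / 41 \approx 0.0046.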

andre.pfeuffer · February 26, 2024, 2:52am

A hierarchy of beta distributions. The generalized Dirichlet is one such construction to start from, but you may be better off defining your own to fit your problem.
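One concrete version is the stick-breaking form of the generalized Dirichlet (Connor-Mosimann): draw independent betas v_k and let each p[k] take the fraction v_k of the stick that is left. A minimal sketch for 4 categories; a and b are hypothetical data names for the beta shapes. The ordinary Dirichlet(\alpha) is the special case a_k = \alpha_k, b_k = \alpha_{k+1} + \dots + \alpha_K, so choosing a_k and b_k freely gives extra control over the spread of each component.

```stan
data {
  vector<lower=0>[3] a;            // hypothetical beta shape parameters, one per break
  vector<lower=0>[3] b;
}
parameters {
  vector<lower=0, upper=1>[3] v;   // fraction of the remaining stick taken at break k
}
transformed parameters {
  simplex[4] p;
  {
    real remaining = 1;            // length of stick not yet assigned
    for (k in 1:3) {
      p[k] = v[k] * remaining;
      remaining -= p[k];
    }
    p[4] = remaining;              // whatever is left goes to the last category
  }
}
model {
  // Independent beta priors on the breaks induce a generalized Dirichlet prior on p.
  // No Jacobian is needed because the prior is stated for v, the actual parameter.
  v ~ beta(a, b);
}
```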
