Informative prior for simplex

Hello,

Short version of the question: say I have a parameter that is a 3d simplex. Is there an elegant way to assign an informative prior (for example, [5, 1, 1])?

Background:

I am writing a new version of my foraging model, with the aim of streamlining my workflow and making the model more flexible. In my old project, I first wrote a load of R code for generating synthetic data (which is great for model testing). I also ended up writing quite a long generated quantities block in my Stan model for generating posterior predictions.

My hope is that I can combine these and simply use the same Stan model for generating synthetic data (where I pass in parameters as highly informative priors), and for generating posterior predictions (in which case, the parameters come from a previously fit model).

So far, this is working well: for the parameters of a linear regression, I can pass in β ~ N(3, 0.01) to simulate data with a slope of 3. However, I’m not sure how to do something like this for a parameter that takes the form of a 3d simplex.

Thanks

The go-to distribution would be the Dirichlet: it correctly deals with the dependence between the values in the simplex, unlike placing independent univariate distributions on each value.

There are other options, and there have been some questions about this here on the forum, so you can check those for some good alternatives.

1 Like

My (potentially incorrect) understanding is that a Dirichlet prior doesn’t allow specific categories to be made more likely a priori?

I have already read the post you linked, and it seems to be mainly talking about controlling how variable the prior is.

In an attempt to be clearer: suppose I have a 3d simplex as part of my model. When simulating data with the model, I would like to set this to something like [0.75, 0.20, 0.05]. I can simulate data using the generated quantities block, and then fit my model to the synthetic data to check parameter recovery etc.

Later on, I will be fitting this model to “real” data, and I will then be using the generated quantities block to calculate posterior predictions.

My current aim is to use the same code (and preferably the same .stan file) for both use cases. My hope was to feed in my simulation parameters as highly informative priors (and when I come to actually fit my model, I can input weakly informative priors).

This is all straightforward when dealing with linear models: for example, you can set β ~ N(b, 0.01) for some constant b when simulating data, and then set β ~ N(0, 1) when fitting the model to data.
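To make the “highly informative prior as simulation setting” idea concrete, here is a minimal stdlib-only Python sketch (not the poster’s actual Stan/R code): draw the slope from a near-degenerate N(3, 0.01) prior, simulate data, and check that least squares recovers it.

```python
import random

random.seed(1)

# Simulation mode: slope drawn from a highly informative prior N(3, 0.01)
beta = random.gauss(3, 0.01)

# Simulate a simple no-intercept linear model: y = beta * x + noise
x = [i / 10 for i in range(100)]
y = [beta * xi + random.gauss(0, 0.1) for xi in x]

# Recover the slope with ordinary least squares (no intercept)
beta_hat = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
print(beta_hat)  # close to 3
```

With the prior standard deviation at 0.01, essentially all of the spread in the recovered slope comes from the observation noise, which is the point of the trick.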

…

I suppose an alternative solution would be to have a model that can switch between treating β as a parameter or as data (i.e. a constant that the user supplies), depending on whether β is entered into the data{} block. But my gut feeling is that this wouldn’t work well with Stan?

…

The simpler solution is just to have two .stan files: one in which the model parameters are set by the user, and one in which they are fit to the data. I’ve been hoping to avoid this, as the overall project is quite complex and having multiple files increases the scope for bugs in my code.

In an attempt to be clearer: suppose I have a 3d simplex as part of my model. When simulating data with the model, I would like to set this to something like [0.75, 0.20, 0.05].

The Dirichlet distribution is usually written as \mbox{Dirichlet}(\theta|\alpha) with \theta = [\theta_1, \theta_2, \dots, \theta_k] and \alpha = [\alpha_1, \alpha_2, \dots, \alpha_k]. Notice that the \alpha_i can all have different values. Let’s rewrite them as \alpha_i = \pi_i A, where \sum \pi_i = 1 and A is a positive real number. Then \mbox{E}[\theta_i] = \pi_i. The larger A is, the more concentrated the distribution is around the \pi_i.

So for your example, set \pi_1 = 0.75, \pi_2 = 0.20, \pi_3 = 0.05, and A to some big number.

Here are a couple of examples in R (using an rdirichlet() function, available in packages such as gtools or MCMCpack)

> alpha <- c(0.75, 0.20, 0.05)*100
> rdirichlet(10, alpha = alpha)
           [,1]      [,2]       [,3]
 [1,] 0.6990158 0.2781734 0.02281080
 [2,] 0.7554538 0.1936752 0.05087099
 [3,] 0.7485913 0.2167799 0.03462882
 [4,] 0.7651778 0.2018314 0.03299078
 [5,] 0.7489113 0.2129159 0.03817274
 [6,] 0.7223205 0.2672653 0.01041419
 [7,] 0.7919466 0.1626215 0.04543193
 [8,] 0.7435570 0.2101369 0.04630612
 [9,] 0.7738441 0.1685606 0.05759527
[10,] 0.7806723 0.1777140 0.04161365

> alpha <- c(0.75, 0.20, 0.05)*1000
> rdirichlet(10, alpha = alpha)
           [,1]      [,2]       [,3]
 [1,] 0.7573407 0.1927956 0.04986370
 [2,] 0.7542907 0.1767394 0.06896996
 [3,] 0.7631487 0.1869011 0.04995020
 [4,] 0.7468728 0.2029784 0.05014886
 [5,] 0.7495448 0.2080578 0.04239742
 [6,] 0.7602044 0.1972247 0.04257090
 [7,] 0.7339876 0.2100086 0.05600379
 [8,] 0.7486887 0.2021225 0.04918876
 [9,] 0.7722869 0.1831204 0.04459261
[10,] 0.7515429 0.2058215 0.04263565

You can make the positive constant even bigger if you need to: the marginal variance of each component is \pi_i(1 - \pi_i)/(A + 1), so it shrinks as A grows.

2 Likes

It does allow that. You can think of the univariate case, the Beta distribution, as a two-dimensional simplex: it gives you the probability of x and, as a consequence, that of 1 - x. As with \alpha and \beta in the Beta distribution, you specify parameters that are compatible with one entry being more likely than another, rather than specifying the probability of each entry explicitly. Depending on what you need, you could place a univariate distribution on each variable except for one, but it’s not the most elegant way, as you mentioned.

As @kholsinger shows, you can make the peaks of the distribution of each value very sharp, or you can make them uniform or different shapes (again, think of a beta in more dimensions). I believe it’s exactly what you need.

Neat. I guess my misunderstanding comes from only ever seeing examples of these distributions as weakly informative priors.

I think of the entries of the Dirichlet distribution as prior counts. If it’s Dirichlet(5, 1, 1), you have a prior sample size of 7, and 5 of those observations were in the first category. A Dirichlet distribution is also recovered by normalising a vector of Gamma(a_i, 1) variates, where a_i is the shape parameter for the i-th category.
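A quick stdlib-only Python sketch of that Gamma construction (function name `rdirichlet` chosen here to mirror the R example above):

```python
import random

random.seed(0)

def rdirichlet(alpha):
    """Sample one Dirichlet draw by normalising independent Gamma(a_i, 1) variates."""
    g = [random.gammavariate(a, 1) for a in alpha]  # a is the shape parameter
    total = sum(g)
    return [gi / total for gi in g]

# Dirichlet(5, 1, 1): like 7 prior "counts", 5 of them in the first category
draws = [rdirichlet([5, 1, 1]) for _ in range(10000)]
mean_first = sum(d[0] for d in draws) / len(draws)
print(mean_first)  # close to E[theta_1] = 5/7
```

Each draw lies on the simplex by construction, and the empirical mean of the first component matches the prior-counts intuition of 5/7.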

An alternative to the Dirichlet distribution is the (multivariate) logistic-normal (“LN”) distribution. The current Wiki page summarizes it pretty well: Logit-normal distribution - Wikipedia .

Due to the potential correlation structure of the underlying normal distribution, the dependence structure of the components is much richer than that of the Dirichlet. For example, in S^3, ternary-plot contours of a Dirichlet density are always convex; this is not the case for the LN. For the Dirichlet, the dependence between components amounts to little more than “the components sum to 1.” For the LN it is much more complex, especially as the dimension grows. The Wiki page gives some nice contour-plot examples for S^3.
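As a rough stdlib-only Python sketch of how a logistic-normal draw on the 3-simplex is generated (the parameter names and the choice to pin the third logit at 0 are illustrative assumptions, not from the thread): draw correlated normals, append a zero, apply softmax.

```python
import math
import random

random.seed(2)

def logistic_normal_draw(mu, sd, rho):
    """One draw from a logistic-normal on the 3-simplex.

    Draw a correlated bivariate normal (z1, z2), fix the third
    logit at 0 for identifiability, and apply softmax.
    """
    z1 = random.gauss(0, 1)
    z2 = rho * z1 + math.sqrt(1 - rho ** 2) * random.gauss(0, 1)
    logits = [mu[0] + sd[0] * z1, mu[1] + sd[1] * z2, 0.0]
    exps = [math.exp(v) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# The correlation rho lets components co-vary in ways a Dirichlet cannot
theta = logistic_normal_draw(mu=[1.0, -1.0], sd=[0.5, 0.5], rho=0.8)
print(theta, sum(theta))  # sums to 1 (up to floating point)
```

The extra covariance parameters (here just `rho`) are exactly where the LN’s richness, and its added complexity, come from.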

One concern I can understand is that the jump in complexity from the Dirichlet to the LN is large enough that it would be hard to translate informative prior information into this distribution’s parameters. This may be less of a problem in S^3, where you can explore the distribution graphically.

Good luck, if you choose to pursue this!

1 Like

Aguilar and Bürkner wrote a paper about this in the context of R2 decomposition as well.

Agree with everything that has been written so far. If you need to specify prior probabilities for one parameter that then induce distributions on other parameters, then please check out Section 2.2 of my paper here. This is a slightly different context, but it allows the user to set a prior probability for each group, which then induces a distribution on the cut-points for ordinal regression. I also have Stan code available in the R2D2ordinal package. If this seems like what you need, please reach out to me and I can help with your specific implementation. But if you just need to work with the Dirichlet distribution directly, then what has already been written should be fine.

1 Like

Because \textrm{Dirichlet}([1 \quad 1 \quad 1]) is uniform, you want to start counting after that, so \alpha = [5 \quad 1 \quad 1] is just 4 observations of the first category.
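The counting interpretation is just Dirichlet-multinomial conjugacy: the posterior adds observed counts to the prior concentration. A tiny Python check with the thread’s numbers:

```python
# Dirichlet-multinomial conjugacy: the posterior concentration is the prior
# concentration plus the observed counts, so a uniform Dirichlet(1, 1, 1)
# updated with four observations of category 1 gives Dirichlet(5, 1, 1).
prior = [1, 1, 1]      # uniform over the 3-simplex
counts = [4, 0, 0]     # four observations of the first category
posterior = [a + n for a, n in zip(prior, counts)]
print(posterior)  # → [5, 1, 1]

total = sum(posterior)
print([a / total for a in posterior])  # posterior mean: [5/7, 1/7, 1/7]
```

So \alpha = [5 \quad 1 \quad 1] encodes 4 effective prior observations on top of the uniform baseline.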

The Dirichlet’s very weak as a prior: it just lets you specify a prior mean and an overall concentration. It doesn’t let you model any kind of covariance structure, or any kind of constraint (such as the variance of one component being larger than a fixed value, or larger than another component’s). As @JohnnyZoom4H pointed out, the logistic normal is much richer.

To get a handle on the logistic normal, it helps to compare the univariate form to the beta distribution. You can see the same sort of mode behavior.

In terms of generating priors, you can always base them on prior data, which in practice you can just add in with the other data through the likelihood. For example, with a beta-Bernoulli model, you could either have

\qquad \theta \sim \textrm{beta}(5, 3), \qquad y \sim \textrm{bernoulli}(\theta)

or you could have

\qquad \theta \sim \textrm{beta}(1, 1), \quad [1, 1, 1, 1, 0, 0] \sim \textrm{bernoulli}(\theta), \quad y \sim \textrm{bernoulli}(\theta).

If there are fractional counts, e.g., \theta \sim \textrm{beta}(5.1, 2.9), then you can express that in Stan as

target += 4.1 * bernoulli_lpmf(1 | theta);
target += 1.9 * bernoulli_lpmf(0 | theta);
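The two formulations differ only by a normalising constant, so their log densities have identical differences between any two points. A small stdlib-only Python check of the \textrm{beta}(5, 3) versus \textrm{beta}(1, 1)-plus-pseudo-data equivalence:

```python
import math

def log_beta_pdf(theta, a, b):
    """Log density of Beta(a, b), using lgamma for the normalising constant."""
    return (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
            + (a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta))

def log_post_pseudo(theta):
    """Beta(1, 1) prior plus pseudo-data [1, 1, 1, 1, 0, 0], unnormalised.

    The flat prior contributes 0, so only the pseudo-observations remain.
    """
    return 4 * math.log(theta) + 2 * math.log(1 - theta)

# Log-density *differences* between two points agree exactly,
# which is all that matters for MCMC.
t1, t2 = 0.3, 0.7
direct = log_beta_pdf(t2, 5, 3) - log_beta_pdf(t1, 5, 3)
pseudo = log_post_pseudo(t2) - log_post_pseudo(t1)
print(direct, pseudo)  # equal (up to floating point)
```

This is why Stan is happy with the unnormalised `target +=` increments above: the constant cancels.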
1 Like