Sorry for taking so long to reply. I’m a bit uncertain myself about suitable priors, but my hope is that @martinmodrak could help, or find someone with the expertise to do so :)
is indeed extremely wide - but hey, as long as you have enough data to constrain it, it should not be a problem. To be safe, I would also consider running the model with a narrower prior on kappa (say normal(0, 1)) and seeing whether your inferences change. If not, you are probably good.
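One way to make that check painless is to pass the prior scale in as data and fit the model twice, then compare the two posteriors side by side. A rough sketch with rstan (the names `model.stan`, `stan_data`, and `prior_sd_kappa` are just placeholders, not anything from your actual setup):

```r
library(rstan)

# Sketch: refit the same model with the prior sd on kappa supplied as data,
# then compare the marginal posteriors of kappa under the two priors.
fit_wide   <- stan("model.stan", data = c(stan_data, list(prior_sd_kappa = 5)))
fit_narrow <- stan("model.stan", data = c(stan_data, list(prior_sd_kappa = 1)))

print(fit_wide,   pars = "kappa")
print(fit_narrow, pars = "kappa")
```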
Here is some quick code I wrote to visualise bivariate marginal distributions from the prior, to help me understand it better (you could work this out analytically, but I write code faster than I do math, so I chose simulation :-)
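Something along these lines (just a rough sketch; the priors and parameter names below are illustrative placeholders rather than the ones from any particular model):

```r
# Simulate from the priors and look at the implied bivariate marginals.
set.seed(1)
n_sims <- 1e4

mu    <- rnorm(n_sims, 0, 2)        # placeholder prior on a location parameter
sigma <- abs(rnorm(n_sims, 0, 1))   # placeholder half-normal(0, 1) on a scale
kappa <- abs(rnorm(n_sims, 0, 5))   # placeholder half-normal(0, 5) on a concentration

# Each panel of the pairs plot is one bivariate marginal of the joint prior
pairs(data.frame(mu = mu, sigma = sigma, kappa = kappa), pch = ".")
```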
This model is part of a more complicated model involving a multivariate ordered probit, and I’m using a hierarchical induced-Dirichlet model on the latent cutpoints. For prior predictive checks on this part of the model, I have been plotting the prior-implied distributions of the cumulative probabilities. I have incorporated domain expertise into other aspects of the model, but I don’t want to for this part, which uses ordered regression (as there isn’t much domain knowledge available). In this case, would I want the priors for the cumulative probabilities to be approximately uniform? Or should I not look at the cumulative probabilities for prior predictive checks at all? I guess just because these are uniform it doesn’t mean the prior is less informative, since the priors might be “forcing” them to be more uniform…
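To make the question concrete, here is a simulation-only sketch of what a non-hierarchical induced-Dirichlet prior implies for the cumulative probabilities, assuming a probit link, an anchor point at 0, and a fixed symmetric alpha (all of these choices, plus K, are just for illustration):

```r
set.seed(2023)
n_sims <- 1e4
K     <- 5            # number of ordinal categories (illustrative)
alpha <- rep(1, K)    # symmetric Dirichlet; alpha = 1 is "flat" on the simplex

# Draw category probabilities from the Dirichlet prior
rdirichlet <- function(n, alpha) {
  x <- matrix(rgamma(n * length(alpha), shape = alpha), nrow = n, byrow = TRUE)
  x / rowSums(x)
}
p <- rdirichlet(n_sims, alpha)

# Implied cumulative probabilities and (probit link, anchor at 0) cutpoints
cum_p     <- t(apply(p, 1, cumsum))[, 1:(K - 1), drop = FALSE]
cutpoints <- qnorm(cum_p)

# Note: with alpha = 1 the simplex of category probabilities is uniform,
# but each cumulative probability P(y <= k) has a Beta(k, K - k) marginal,
# so the cumulative probabilities are not individually uniform.
par(mfrow = c(1, K - 1))
for (k in 1:(K - 1)) hist(cum_p[, k], main = paste0("P(y <= ", k, ")"), xlab = "")
```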
I actually get better mixing and fewer divergent transitions (none) with normal(0, 5) vs normal(0, 1) on kappa. I guess N(0, 1) conflicts with the likelihood too much, as the posterior median for kappa is 4.81 with this prior.
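A quick back-of-the-envelope check of how far 4.81 sits in the tails of each prior (assuming kappa and the prior are on the same scale):

```r
# Prior density at the posterior median of 4.81 under the two priors
dnorm(4.81, mean = 0, sd = 1)   # ~4e-06: almost 5 sd out under normal(0, 1)
dnorm(4.81, mean = 0, sd = 5)   # ~0.05:  comfortably within normal(0, 5)
```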