Categorical model with Dirichlet priors on parameters?

  • Operating System: Ubuntu 18.04 LTS
  • brms Version: 2.10

Dear all,

I am a new user of brms and Bayesian statistics, so I apologize in advance if my question is ill-formulated or fundamentally wrong. I am attempting to fit a categorical model (mixed effects), where response and predictors are all categorical, and unordered. A reviewer suggested I use a Dirichlet prior on my parameters (for reason “the Dirichlet is the conjugate prior of the categorical distribution”), instead of the Student-t priors I originally used.

The problem:
I cannot seem to set a Dirichlet prior on my parameters. I am not sure whether this (1) possible in brms (or in stan), or (2) a sensible thing to do. I am willing to hear it isn’t either. It doesn’t immediately make sense to me, as it seems parameters are estimated on a logit scale, but the Dirichlet is on (0,1)–perhaps a transformation is needed? I would be grateful for any advice.

What I’ve tried:
Here is a reproducible example like my original code, with simulated data (further description if needed is provided at the end). I originally fit the model using Student-t priors because I had some prior info from a previous experiment of what reasonable parameters might be.

# Simulated random data set
testdata = expand.grid(resistance = LETTERS[1:5],
                 antibiotic = 1:3,
                 strain = letters[1:4],
                 plate = paste("plate",1:3))
testdata = rbind(testdata,testdata[sample(1:nrow(testdata), size = 120, replace = T),])

priors_original = set_prior ("student_t(7, -5, 2.5)", class = "Intercept") +
                  set_prior ("student_t(7, 0, 2.5)", class = "b")
model1 = resistance ~ antibiotic + strain + (1|plate)
brm(model1, family = "categorical", prior = priors_original, data = testdata)

To use a Dirichlet prior, I attempted this (note the stanvar declaration is necessary as specifying a list of values in dirichlet() gets treated like a list of reals rather than a vector):

stanvars = stanvar(name = "myalpha", block = "parameters", scode = "  vector[4] myalpha;")
priors1 =  set_prior ("dirichlet(myalpha)", class = "b", dpar = "muB") +
           set_prior ("dirichlet(myalpha)", class = "b", dpar = "muC") +
           set_prior ("dirichlet(myalpha)", class = "b", dpar = "muD") +
           set_prior ("dirichlet(myalpha)", class = "b", dpar = "muE")
# This doesn't work:
priors1a = set_prior ("dirichlet(1,1,1,1)", class = "b", dpar = "muB") +
           set_prior ("dirichlet(1,1,1,1)", class = "b", dpar = "muC") +
           set_prior ("dirichlet(1,1,1,1)", class = "b", dpar = "muD") +
           set_prior ("dirichlet(1,1,1,1)", class = "b", dpar = "muE")

brm(model1, family = "categorical", prior = priors1, data = testdata, stanvar = stanvars)

The model compiles, but fails at initialisation with either of the errors below.

Chain 1: Rejecting initial value:
Chain 1: Error evaluating the log probability at the initial value.
Chain 1: Exception: dirichlet_lpmf: prior sample sizes[2] is -1.54878, but must be > 0! (in ‘model6e191c0c9736_c9e18480b7fa263b94f1fd1314065dfd’ at line 140)

Chain 1: Rejecting initial value:
Chain 1: Error evaluating the log probability at the initial value.
Chain 1: Exception: dirichlet_lpmf: probabilities is not a valid simplex. sum(probabilities) = 1.480112724, but should be 1 (in ‘model6e191c0c9736_c9e18480b7fa263b94f1fd1314065dfd’ at line 140)

Ultimately failing with:

Chain 1: Initialization between (-2, 2) failed after 100 attempts.
Chain 1: Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model.

Note if I set one of my parameters to monotonic mo(), the code below ‘works’, but I do not know if it is sensible.

model2 = resistance ~ mo(antibiotic) + strain + (1|plate)
priors2 = set_prior("dirichlet(1,1)", class = "simo", coef = "moantibiotic1", dpar = "muB") +
	  set_prior("dirichlet(1,1)", class = "simo", coef = "moantibiotic1", dpar = "muC") +
	  set_prior("dirichlet(1,1)", class = "simo", coef = "moantibiotic1", dpar = "muD") +
	  set_prior("dirichlet(1,1)", class = "simo", coef = "moantibiotic1", dpar = "muE")
brm(model2, family = "categorical", prior = priors2, data = testdata, stanvar = stanvars)

Information about the original model:
I have a categorical model mixed-effects, where the response is not monotonic, and can have 1 of 5 states (type of antibiotic resistance observed). The predictors are also categorical (which antibiotic was applied, bacteria type). There is also a ‘random’ effect (plate). I have prior knowledge about the parameters for antibiotic and strain (from a previous experiment), hence I set an informative prior. I chose a student-t prior because of its heavier tails wrt. the normal distribution.

3 Likes

I’m not sure what they’re asking you to do make a whole lot of sense. I could be wrong, but here’s my take.

In Stan a categorical outcome with simplex parameter would look like:

parameters {
  simplex[4] theta;
}

model {
  theta ~ dirichlet([1, 1, 1, 1]');
  y ~ categorical(theta);
}

And like a dirichlet is a conjugate prior for the categorical distribution so we know the posterior is still a dirichlet and we don’t need Stan for this.

In the model you’re fitting, theta is not a parameter. You say theta is a function of other things. Your model is something like:

parameters {
  vector[N] beta;
  real intercept;
}

model {
  simplex[4] theta = softmax(beta * X + intercept);
  y ~ categorical(theta)

You put your briors on beta and the intercept (X is data).

I think it basically comes down to you’re doing the second thing, where you explain theta as a regression of other stuff. The first thing you’re just estimating theta, but there’s no explaining theta itself as a function of other things.

Now you might try to pick a prior on beta/intercept such that theta is dirichlet distributed without seeing data, but the easy way to do that is the change of variables described here: https://mc-stan.org/docs/2_21/reference-manual/change-of-variables-section.html, and that probably won’t work cause I doubt there’s an invertible function between theta and beta + intercept.

Anyway they might be suggesting you do this cause it seems like it would be easy, but I don’t think that is an easy thing to do.

I don’t think there’s any mechanism to do this in brms. That make sense? Or am I off base?

4 Likes