Transforming a multinomial model into a dirichlet-multinomial

Hi everyone,

So I have this ODE model that is fitted to categorical data using a multinomial likelihood. Basically it’s about antimocrobial resistance, and the categories correspond to increasing levels of resistance. This model works well, the sampling is quick, there are no divergences and the posterior predictive checks are good on average. So I don’t think there is anything wrong with the model itself. However, the model wildly underestimates the uncertainty, or equivalently the data is overdispersed. So to take a Poisson analogy I read from @stemangiola, I need for my multinomial model what the negative binomial is to the Poisson.

So following a few threads in this forum I tried a dirichlet-multinomial version of the same model. It all started well, the sampling was relatively quick and it all finished without warning. The posterior predictive checks were exactly what I wanted: very good fit to data, the right amount of uncertainty. However, and here comes my problem, I had more divergences that I have ever seen in a model (and believe me I have fitted some shady models). >95% divergences in all runs, regardless of adapt_delta, without any way to know where the problem is exactly (since it’s apparently everywhere). I also tried changing the priors or fixing the parameters one by one without any improvement. The prior predictive checks also look good, without divergence. So my conclusion is that the underlying model is fine but that I did not implement the Dirichlet-Multinomial properly, althought I mostly mirrored other implementations (eg [https://www.biorxiv.org/content/10.1101/711317v3](https://this one)). I also tried reimplementing a few times without success, that’s why I finally came here.

The models is quite big and I’m not allowed to share the data, so I thought I’ll just focus on the important parts. I understand that this makes it more difficult to help me but maybe it’s enough to get some advice or fix an obvious mistake? If not I will come back with a simulated dataset.

Here are the important parts of the initial model:

parameters {
  real<lower=0> theta[P]; // parameters for the ODE
} 
transformed parameters {
  real y[S,C]; // C is the number of ODE compartments (one more than K, the number of categories), S is the number of evaluation times
  simplex[K] output[S];
  y = integrate_ode_rk45(ode, init, t0, ts, theta, x_r, x_i, 1.0E-6, 1.0E-6, 1.0E3);
  for(i in 1:S) for(j in 1:C) y[i,j] = y[i,j] + 2*1.0E-6; // avoid negative values due to solver tolerance
  for(i in 1:S) output[i] = ( to_vector(y[g,i,2:C]) ) ./ sum( to_vector(y[g,i,2:C]) ); // get the proportion in each category
}
model {
  ... // priors for theta
  for(i in 1:S) cat_data[i] ~ multinomial(output[i] );
}

and here is my Dirichlet-Multinomial version:

parameters {
  real<lower=0> theta[P]; // parameters for the ODE
  real<lower=0> phi;
  simplex[K] trans_output[G,L];
} 
transformed parameters {
  real y[S,C]; // C is the number of ODE compartments (one more than K, the number of categories), S is the number of evaluation times
  simplex[K] output[S];
  y = integrate_ode_rk45(ode, init, t0, ts, theta, x_r, x_i, 1.0E-6, 1.0E-6, 1.0E3);
  for(i in 1:S) for(j in 1:C) y[i,j] = y[i,j] + 2*1.0E-6; // avoid negative values due to solver tolerance
  for(i in 1:S) output[i] = ( to_vector(y[g,i,2:C]) ) ./ sum( to_vector(y[g,i,2:C]) ); // get the proportion in each category
}
model {
  ... // priors for theta
  phi ~ exponential(0.001); // from https://www.biorxiv.org/content/10.1101/711317v3
  for(i in 1:S) trans_output[i] ~ dirichlet(phi*output[i]);
  for(i in 1:S) cat_data[i] ~ multinomial(trans_output[i] );
}

Does someone have any advice? Thanks!

Err, shouldn’t that be multinomial(trans_output[i])? And actually, how do the shapes make sense here, why does first loop have 1:L when output is of length S and the outermost dimension of trans_output is G?

Dirichlet-multinomial has a closed-form PMF. Maybe you could use that.

functions {
  real dirichlet_multinomial_lpmf(int[] y, vector alpha) {
    real sum_alpha = sum(alpha);
    return lgamma(sum_alpha) - lgamma(sum(y) + sum_alpha)
           // + lgamma(sum(y)+1) - sum(lgamma(to_vector(y)+1) // constant, may omit
           + sum(lgamma(to_vector(y) + alpha)) - sum(lgamma(alpha));
  }
}
parameters {
  real<lower=0> theta[P]; // parameters for the ODE
  real<lower=0> phi;
}
model {
  ...
  for(i in 1:S) cat_data[i] ~ dirichlet_multinomial(phi*output[i]);
}
3 Likes

Oh right, I made a few typos while adapting the original model into this simplified version. It’s indeed multinomial(trans_output[i]), and 1:S!

Thanks for pointing out the closed-form PMF, I’ll try it out and report back!

Many thanks!

It did the trick! Thank you so much @nhuurre, I was scratching my head for quite some time!