Non-centred parameterisation for Dirichlet distribution


#1

I am trying to fit a model of the form:

for(i in 1:N)
   X[i] ~ multinomial(theta);
theta ~ dirichlet(exp(alpha));
alpha ~ normal(0,1);

where alpha is a univariate parameter. Unsurprisingly, this model – being non-centred – suffers from divergent iterations. These do not go away if I increase adapt_delta nor if I decrease the step size. I think I need a non-centred parameterisation here.

I’ve had a look at can’t seem to find one that people have used in the past for Dirichlet distributions. Does anyone have an idea how to ‘non-centre’ the above?

Best,

Ben

[edit: escaped code]


#2

I think you want to integrate out theta in this situation, i.e. use a multinomial-Dirichlet likelihood for the data.


#3

Thanks Ben, I hadn’t thought of that. It looks like Bob’s done the hard work for me of coding up the log probability: https://groups.google.com/forum/m/#!topic/stan-users/uhrIEddZ2I8


#4

In case it’s helpful, I’ve just had to implement Bob’s solution, so this is what your functions block would probably look like if you implemented a probability mass function for the Dirichlet-Multinomial:

functions {
  real dirichlet_multinomial_lpmf(int[] y, vector alpha) {
    real alpha_plus = sum(alpha);
  
    return lgamma(alpha_plus) + sum(lgamma(alpha + to_vector(y))) 
                - lgamma(alpha_plus+sum(y)) - sum(lgamma(alpha));
  }
}

#5

Thanks for posting. This is about as efficiently as it can be implemented without custom derivatives in C++.

You don’t need parens around return values. And it’s conventional to start new lines with operators rather than end them that way (easier to scan) in languages that allow it (R doesn’t as it isn’t whitespace neutral like Stan).


#6

Thanks Bob, I’ve implemented your style suggestions.


#7

I was thinking about the RNG function of this

dirichlet_multinomial_lpmf

I found this issue on github.

does anyone have an implementation? I don’t know much about RNGs, but can this be in principle produced by (something like)

functions {
  int[] dirichlet_multinomial_rng(vector alpha, int exposure) {

   	int K = rows(alpha);
    vector[K] alpha_dir = dirichlet_rng(alpha);
    int alpha_dir_mult[K] = multinomial_rng(alpha_dir, exposure);

    return alpha_dir_mult;
  }
}

Thanks!


#8

Yes, that’ll work for the RNG. I’d just write that as:

int[] dirichlet_multinomial_rng(vector alpha, int N) {
  return multinomial_rng(dirichlet_rng(alpha), N);
}

You could also test that each alpha[k] > 0 and is finite, and that N >= 0.


#9

In using

y[n] ~ dirichlet_multinomial( precision * softmax( X[n] * beta ) )

with a large array (~ 1K-20K), I get the vector softmax(X[n] * beta ) that includes incredibly low values. If I want to achieve unimodality in the dirichlet part I need all parameters to be > 1. I am attempting to set

parameters{
   ...
   real<lower=1> precision_z;
}
transformed parameters{
   ...
   precision = precision_z / min( softmax(X[n] * beta ) )
}

so the lower transformed parameter within the dirichlet-multinomial is > 1.

The model however is not stable (as I was imagining). Is there any cleaner solution?


#10

Check out this thread - it’s on sum to zero constraints, but may be relevant for the sum to 1 constraint.


#11

What should the posterior look like? Should it be a lot of effectively zero values? Or should it be smoothed to look more uniform?

That scaling is introducing a non-identifiability.

It’d be better to declare:

parameters {
  real<lower = 0> precision_delta;
...
model {
  matrix[? , ?] X_beta = X * beta;
  real precision = precision_delta + min(X_beta);
...

#12

I imagine it quite sparse, but with rare categories that are small but not tend to absolute 0 (like in the case of a Dirichlet parameter < 1)

Do you mean

min(exp(X_beta))

because X_beta can have negative numbers and precision delta could become negative?


However I think the problem could be another one, I observed that a unique precision parameter does not fit all categories, and the dirichlet_multinomial is not flexible enough for this biological data (one precision for all categories may not be sufficient. Probably this is the reason why people model sequencing data with NB rather than multinomial, because the gene-wise overdispersion structure is quite complex). See, a dedicated post:


#13

If that’s what it should be, it’s better coded as exp(min(X_beta)) as it only applies exp once.

It’s basically to deal with unmodeled overdispersion. The idea is that you might be able to make it multinomial with the right information, but without that, you’re left expanding the posterior intervals to get calibrated coverage.

The other problem with the Dirichlet is lack of covariance structure. But it’s hard to imagine estimating covariance over 10s of thousands of genes’ expression. It would require some kind of low-rank approximation.


#14

With this are you saying that is better off starting from multinomial and build up a more complex model, than using NB framework for coping with the complex overdispersion. Am I understanding well?

(I am trying to implement a gamma-multinomial model More flexible of Dirichlet-multinomial: gamma-multinomial (?))

For the amount of patients analised in the majority of studies at this stage I am more worried about solving the overdispersion modelling, rather than covariance structure.


#15

No. I’m saying that overdispersion is usually there because of missing predictors or other missing information.

The negative binomial can be hard to fit, but that’s a different issue.

Sounds reasonable. Though taken together they can be related. For example, two variables with sd of 1 and correlation of 0.9, treating them as independent makes values of (2, 2) seem much more extreme than they’d be when considering the correlation.