Non-centred parameterisation for Dirichlet distribution

Ben_Lambert · September 10, 2017, 4:38pm

I am trying to fit a model of the form:

for(i in 1:N)
   X[i] ~ multinomial(theta);
theta ~ dirichlet(exp(alpha));
alpha ~ normal(0,1);

where alpha is a univariate parameter. Unsurprisingly, this model – being non-centred – suffers from divergent iterations. These do not go away if I increase adapt_delta nor if I decrease the step size. I think I need a non-centred parameterisation here.

I’ve had a look at can’t seem to find one that people have used in the past for Dirichlet distributions. Does anyone have an idea how to ‘non-centre’ the above?

Best,

Ben

[edit: escaped code]

bgoodri · September 10, 2017, 4:59pm

I think you want to integrate out theta in this situation, i.e. use a multinomial-Dirichlet likelihood for the data.

Ben_Lambert · September 10, 2017, 5:10pm

Thanks Ben, I hadn’t thought of that. It looks like Bob’s done the hard work for me of coding up the log probability: https://groups.google.com/forum/m/#!topic/stan-users/uhrIEddZ2I8

FrankD · January 12, 2018, 3:13pm

In case it’s helpful, I’ve just had to implement Bob’s solution, so this is what your functions block would probably look like if you implemented a probability mass function for the Dirichlet-Multinomial:

functions {
  real dirichlet_multinomial_lpmf(int[] y, vector alpha) {
    real alpha_plus = sum(alpha);
  
    return lgamma(alpha_plus) + sum(lgamma(alpha + to_vector(y))) 
                - lgamma(alpha_plus+sum(y)) - sum(lgamma(alpha));
  }
}

Bob_Carpenter · January 16, 2018, 9:46pm

Thanks for posting. This is about as efficiently as it can be implemented without custom derivatives in C++.

You don’t need parens around return values. And it’s conventional to start new lines with operators rather than end them that way (easier to scan) in languages that allow it (R doesn’t as it isn’t whitespace neutral like Stan).

FrankD · January 19, 2018, 5:03pm

Thanks Bob, I’ve implemented your style suggestions.

stemangiola · June 12, 2018, 6:48am

I was thinking about the RNG function of this

dirichlet_multinomial_lpmf

I found this issue on github.

does anyone have an implementation? I don’t know much about RNGs, but can this be in principle produced by (something like)

functions {
  int[] dirichlet_multinomial_rng(vector alpha, int exposure) {

   	int K = rows(alpha);
    vector[K] alpha_dir = dirichlet_rng(alpha);
    int alpha_dir_mult[K] = multinomial_rng(alpha_dir, exposure);

    return alpha_dir_mult;
  }
}

Thanks!

Bob_Carpenter · June 12, 2018, 9:53pm

Yes, that’ll work for the RNG. I’d just write that as:

int[] dirichlet_multinomial_rng(vector alpha, int N) {
  return multinomial_rng(dirichlet_rng(alpha), N);
}

You could also test that each alpha[k] > 0 and is finite, and that N >= 0.

stemangiola · June 14, 2018, 1:57am

In using

y[n] ~ dirichlet_multinomial( precision * softmax( X[n] * beta ) )

with a large array (~ 1K-20K), I get the vector softmax(X[n] * beta ) that includes incredibly low values. If I want to achieve unimodality in the dirichlet part I need all parameters to be > 1. I am attempting to set

parameters{
   ...
   real<lower=1> precision_z;
}
transformed parameters{
   ...
   precision = precision_z / min( softmax(X[n] * beta ) )
}

so the lower transformed parameter within the dirichlet-multinomial is > 1.

The model however is not stable (as I was imagining). Is there any cleaner solution?

aaronjg · June 14, 2018, 2:41am

Check out this thread - it’s on sum to zero constraints, but may be relevant for the sum to 1 constraint.

Bob_Carpenter · June 14, 2018, 4:25am

What should the posterior look like? Should it be a lot of effectively zero values? Or should it be smoothed to look more uniform?

That scaling is introducing a non-identifiability.

It’d be better to declare:

parameters {
  real<lower = 0> precision_delta;
...
model {
  matrix[? , ?] X_beta = X * beta;
  real precision = precision_delta + min(X_beta);
...

stemangiola · June 14, 2018, 6:24am

I imagine it quite sparse, but with rare categories that are small but not tend to absolute 0 (like in the case of a Dirichlet parameter < 1)

Do you mean

min(exp(X_beta))

because X_beta can have negative numbers and precision delta could become negative?

However I think the problem could be another one, I observed that a unique precision parameter does not fit all categories, and the dirichlet_multinomial is not flexible enough for this biological data (one precision for all categories may not be sufficient. Probably this is the reason why people model sequencing data with NB rather than multinomial, because the gene-wise overdispersion structure is quite complex). See, a dedicated post:

Bob_Carpenter · June 15, 2018, 5:11pm

If that’s what it should be, it’s better coded as exp(min(X_beta)) as it only applies exp once.

It’s basically to deal with unmodeled overdispersion. The idea is that you might be able to make it multinomial with the right information, but without that, you’re left expanding the posterior intervals to get calibrated coverage.

The other problem with the Dirichlet is lack of covariance structure. But it’s hard to imagine estimating covariance over 10s of thousands of genes’ expression. It would require some kind of low-rank approximation.

stemangiola · June 16, 2018, 4:24am

With this are you saying that is better off starting from multinomial and build up a more complex model, than using NB framework for coping with the complex overdispersion. Am I understanding well?

(I am trying to implement a gamma-multinomial model More flexible than Dirichlet-multinomial: gamma-multinomial (?) - #2 by stemangiola)

For the amount of patients analised in the majority of studies at this stage I am more worried about solving the overdispersion modelling, rather than covariance structure.

Bob_Carpenter · June 17, 2018, 5:59am

No. I’m saying that overdispersion is usually there because of missing predictors or other missing information.

The negative binomial can be hard to fit, but that’s a different issue.

Sounds reasonable. Though taken together they can be related. For example, two variables with sd of 1 and correlation of 0.9, treating them as independent makes values of (2, 2) seem much more extreme than they’d be when considering the correlation.

Topic		Replies	Views
Non-centered parametrisation for hierarchical log_mix (mixed) distribution Modeling rstan , techniques	4	827	February 21, 2022
Trying to generalise the dirichlet-multinomial (non-analytical) framework (replacing Dirichlet with other distributions) Modeling techniques , bioinformatics , dirichlet-multinomial	23	2386	November 5, 2018
Poisson Hierarchical Model Centered/Non-centered Parameterization Modeling	7	1082	January 6, 2022
Non-Centered parameterizations Modeling	4	379	April 8, 2021
Poisson Hierarchical Model Non-centered Parametrization Modeling	3	657	January 12, 2022

Non-centred parameterisation for Dirichlet distribution

Related Topics