where alpha is a univariate parameter. Unsurprisingly, this model, being centred, suffers from divergent iterations. These do not go away if I increase adapt_delta, nor if I decrease the step size. I think I need a non-centred parameterisation here.
I've had a look around but can't seem to find a non-centred parameterisation that people have used in the past for Dirichlet distributions. Does anyone have an idea how to 'non-centre' the above?
In case it’s helpful, I’ve just had to implement Bob’s solution, so this is what your functions block would probably look like if you implemented a probability mass function for the Dirichlet-Multinomial:
functions {
  // Dirichlet-multinomial log-pmf, up to the multinomial-coefficient
  // constant (which does not depend on alpha)
  real dirichlet_multinomial_lpmf(int[] y, vector alpha) {
    real alpha_plus = sum(alpha);
    return lgamma(alpha_plus) + sum(lgamma(alpha + to_vector(y)))
           - lgamma(alpha_plus + sum(y)) - sum(lgamma(alpha));
  }
}
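Not from the thread, but as a quick sanity check: the same log-pmf can be written out in Python (names here are my own). The Stan version above drops the multinomial coefficient, which is constant in alpha; adding it back lets you verify that the pmf normalises:

```python
import math
from itertools import product

def dirichlet_multinomial_lpmf(y, alpha):
    """Dirichlet-multinomial log-pmf for counts y given concentrations alpha.

    Mirrors the Stan function above, but adds back the multinomial
    coefficient (constant in alpha, so Stan can drop it) so the result
    is a normalised pmf over count vectors.
    """
    n = sum(y)
    alpha_plus = sum(alpha)
    lp = (math.lgamma(alpha_plus) - math.lgamma(alpha_plus + n)
          + sum(math.lgamma(a + k) - math.lgamma(a)
                for a, k in zip(alpha, y)))
    # multinomial coefficient: log(n!) - sum_k log(y_k!)
    lp += math.lgamma(n + 1) - sum(math.lgamma(k + 1) for k in y)
    return lp

# Normalisation check: probabilities over all count vectors summing to n add to 1.
alpha = [1.0, 2.0, 0.5]
n = 3
total = sum(math.exp(dirichlet_multinomial_lpmf(list(y), alpha))
            for y in product(range(n + 1), repeat=3) if sum(y) == n)
assert abs(total - 1.0) < 1e-9
```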
Thanks for posting. This is about as efficiently as it can be implemented without custom derivatives in C++.
You don't need parentheses around return values. And it's conventional to start new lines with operators rather than end lines with them (easier to scan) in languages that allow it (R doesn't, since it isn't whitespace-neutral the way Stan is).
With a large array (~1K-20K), I get a vector softmax(X[n] * beta) that includes incredibly low values. If I want to achieve unimodality in the Dirichlet part, I need all parameters to be > 1. I am attempting to set
I imagine it is quite sparse, but with rare categories that are small yet do not tend to absolute 0 (as happens with a Dirichlet parameter < 1).
Do you mean
min(exp(X_beta))
because X_beta can contain negative numbers and the precision delta could become negative?
However, I think the problem may be a different one: I observed that a single precision parameter does not fit all categories, so the dirichlet_multinomial is not flexible enough for this biological data (one precision for all categories may not be sufficient; this is probably why people model sequencing data with a negative binomial rather than a multinomial, since the gene-wise overdispersion structure is quite complex). See a dedicated post:
If that’s what it should be, it’s better coded as exp(min(X_beta)) as it only applies exp once.
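Because exp is strictly increasing, the two orderings agree, so the cheaper form is safe. A tiny check (the X_beta values here are hypothetical, mine):

```python
import math

x_beta = [-2.3, 0.7, -5.1, 1.2]  # hypothetical linear-predictor values

# exp is strictly increasing, so min commutes with it:
a = min(math.exp(v) for v in x_beta)  # applies exp N times
b = math.exp(min(x_beta))             # applies exp once
assert a == b
```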
It’s basically to deal with unmodeled overdispersion. The idea is that you might be able to make it multinomial with the right information, but without that, you’re left expanding the posterior intervals to get calibrated coverage.
The other problem with the Dirichlet is lack of covariance structure. But it’s hard to imagine estimating covariance over 10s of thousands of genes’ expression. It would require some kind of low-rank approximation.
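A rough sketch (my own illustration, with hypothetical sizes) of what a low-rank-plus-diagonal covariance buys in parameter count:

```python
import numpy as np

rng = np.random.default_rng(0)
G, k = 1000, 5  # hypothetical: 1000 genes, rank-5 factor

W = rng.normal(size=(G, k))   # factor loadings
d = rng.uniform(0.5, 1.5, G)  # per-gene residual variances

# Sigma = W W^T + diag(d): G*k + G parameters instead of G*(G+1)/2
low_rank_params = G * k + G
full_params = G * (G + 1) // 2
assert low_rank_params < full_params  # 6000 vs 500500

# Sigma is positive definite by construction (PSD low-rank part plus
# positive diagonal); check on a small slice.
Sigma_small = W[:50] @ W[:50].T + np.diag(d[:50])
assert np.all(np.linalg.eigvalsh(Sigma_small) > 0)
```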
With this, are you saying that one is better off starting from a multinomial and building up a more complex model, rather than using the NB framework to cope with the complex overdispersion? Am I understanding correctly?
For the number of patients analysed in the majority of studies at this stage, I am more worried about solving the overdispersion modelling than about the covariance structure.
No. I’m saying that overdispersion is usually there because of missing predictors or other missing information.
The negative binomial can be hard to fit, but that’s a different issue.
Sounds reasonable, though taken together they can be related. For example, for two variables with sd 1 and correlation 0.9, treating them as independent makes values of (2, 2) seem much more extreme than they would be when the correlation is taken into account.
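The (2, 2) example can be checked numerically; this sketch (mine, not from the thread) compares squared Mahalanobis distances under independence versus correlation 0.9:

```python
import numpy as np

x = np.array([2.0, 2.0])
Sigma_corr = np.array([[1.0, 0.9],
                       [0.9, 1.0]])  # sd 1, correlation 0.9
Sigma_ind = np.eye(2)                # same sds, independence assumed

def mahalanobis_sq(x, Sigma):
    # squared Mahalanobis distance x^T Sigma^{-1} x (mean zero)
    return float(x @ np.linalg.solve(Sigma, x))

d_ind = mahalanobis_sq(x, Sigma_ind)    # 8.0
d_corr = mahalanobis_sq(x, Sigma_corr)  # 0.8 / 0.19, about 4.21
assert d_corr < d_ind  # (2, 2) is far less extreme under correlation
```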