Choosing prior for "overdispersion" in Dirichlet Multinomial distribution

Sorry for reviving an old thread, but I was looking for recommendations on modelling the over-dispersion parameter in an integrated Dirichlet-multinomial model, and this thread (started by @stemangiola in the very first post) seemed like an excellent place to ask.

Is there a recommended prior for the over-dispersion parameter in the integrated Dirichlet-multinomial model, or should we treat it as a tuning parameter (e.g. via empirical Bayes)? Anything you've found particularly useful? Any practical suggestions or insights would be really helpful.

Context: I was trying to mimic the numerical study in this paper (but using a shrinkage prior rather than a spike-and-slab). They seem to have a precision parameter, but I couldn't work out how they model it.


Full paper here (if anyone is interested):

I don’t have any good suggestions for this case in particular, but generally there are two (complementary) ways to choose priors that I like:

  • prior predictive checks, i.e. simulate data from the prior and check that the implied counts look plausible
  • penalized complexity priors, i.e. make the prior favour a simpler model. In your case it might make sense to favour \psi = 0 with something like \psi \sim \mathrm{Beta}(1, q) for a suitable q. There is more background and maths in the PC-priors paper, but I won’t pretend I completely understand it, nor that my suggested prior actually follows the maths of that paper.
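For the prior predictive check route, here is a minimal stdlib-only Python sketch. Everything beyond the \mathrm{Beta}(1, q) prior itself is an assumption for illustration: the mapping \theta = (1 - \psi)/\psi from over-dispersion to Dirichlet precision, the mean proportions, and the trial count — swap in whatever your model actually uses.

```python
import random

random.seed(1)

def rdirichlet(alpha):
    """Sample from Dirichlet(alpha) via normalized Gamma draws."""
    g = [random.gammavariate(a, 1.0) for a in alpha]
    s = sum(g)
    return [x / s for x in g]

def rdirichlet_multinomial(n, alpha):
    """Sample counts: p ~ Dirichlet(alpha), then y ~ Multinomial(n, p)."""
    p = rdirichlet(alpha)
    idx = random.choices(range(len(alpha)), weights=p, k=n)
    counts = [0] * len(alpha)
    for i in idx:
        counts[i] += 1
    return counts

# Assumed setup for illustration only:
q = 5.0                        # psi ~ Beta(1, q)
mean_props = [0.5, 0.3, 0.2]   # hypothetical category means
n_trials = 100

sims = []
for _ in range(1000):
    psi = random.betavariate(1.0, q)
    theta = (1.0 - psi) / max(psi, 1e-6)          # guard psi -> 0
    alpha = [max(theta * m, 1e-3) for m in mean_props]  # floor for stability
    sims.append(rdirichlet_multinomial(n_trials, alpha))

# Inspect the prior predictive spread, e.g. counts in the first category.
first = sorted(s[0] for s in sims)
print("median count in category 1:", first[len(first) // 2])
```

If the simulated counts look implausible (e.g. far more extreme splits than your domain allows), that is a signal to adjust q.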

Thanks a lot @martinmodrak !

I love PC priors and have used them before (for a different problem), and it makes a lot of sense to think in that direction. I was actually using a \mathrm{Beta}(\frac{1}{2}, \frac{1}{2}) prior on \psi, in the same spirit as the regular horseshoe - i.e. favouring both \psi \approx 0 and \psi \approx 1 (but it leads to more divergences / higher \hat{R}, etc.). Maybe a \mathrm{Beta}(1, \beta) or a \mathrm{Beta}(\alpha, 1) with \alpha < 1 will work better.
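To make the contrast between the two prior shapes concrete, here is a quick Monte Carlo comparison (stdlib Python) of how much prior mass each puts near the boundaries; the cutoff eps = 0.01 is an arbitrary choice for illustration.

```python
import random

random.seed(0)

def boundary_mass(a, b, eps=0.01, n=200_000):
    """Monte Carlo estimate of P(psi < eps or psi > 1 - eps) under Beta(a, b)."""
    hits = sum(1 for _ in range(n)
               if (x := random.betavariate(a, b)) < eps or x > 1.0 - eps)
    return hits / n

m_half_half = boundary_mass(0.5, 0.5)  # mass piles up at BOTH endpoints
m_beta_1_q = boundary_mass(1.0, 5.0)   # mass concentrates near psi = 0 only
print(m_half_half, m_beta_1_q)
```

The \mathrm{Beta}(\frac{1}{2}, \frac{1}{2}) puts substantial mass at both ends, which is exactly what can push the sampler into the numerically awkward \psi \approx 0 and \psi \approx 1 corners; the one-sided \mathrm{Beta}(1, q) only flirts with one of them.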

I’ll also check the prior sensitivity: that ought to give us some insights.

If you are having divergences, one thing I would check is whether, as \psi \rightarrow 0, you might actually be passing large/infinite parameters to the DM distribution, which can cause numerical issues that manifest as divergences.
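A quick illustration of the risk, assuming the common mapping \theta = (1 - \psi)/\psi from over-dispersion to Dirichlet precision (your parametrization may differ):

```python
# Under theta = (1 - psi) / psi, the precision handed to the DM density
# grows without bound as psi -> 0.
thetas = {psi: (1.0 - psi) / psi for psi in (1e-1, 1e-4, 1e-8, 1e-12)}
for psi, theta in thetas.items():
    print(f"psi = {psi:g}  ->  theta = {theta:.3g}")
```

At \psi around machine precision, \theta is of order 10^{12} or more, and Gamma-function evaluations inside the DM density can overflow.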

I am not sure how to handle this well, but I would guess some mathematical rearrangement could give a numerically stable implementation of the DM in this parametrization (I don’t see it immediately, though…).
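For what it's worth, one rearrangement that helps is evaluating the DM density entirely on the log scale with lgamma, so that large \alpha (i.e. \psi \rightarrow 0) stays finite instead of overflowing the Gamma functions. A Python sketch of the standard DM log-pmf, not any particular library's implementation:

```python
from math import lgamma

def dm_logpmf(counts, alpha):
    """Log-pmf of the Dirichlet-multinomial:
    log[ n! * Gamma(a0) / Gamma(n + a0) *
         prod_k Gamma(y_k + a_k) / (y_k! * Gamma(a_k)) ]
    computed term by term with lgamma for numerical stability."""
    n = sum(counts)
    a0 = sum(alpha)
    out = lgamma(n + 1.0) + lgamma(a0) - lgamma(n + a0)
    for y, a in zip(counts, alpha):
        out += lgamma(y + a) - lgamma(y + 1.0) - lgamma(a)
    return out

# Very large precision (psi -> 0) stays finite on the log scale:
print(dm_logpmf([3, 2, 5], [1e8, 2e8, 3e8]))
```

In the large-\alpha limit this correctly approaches the plain multinomial log-pmf, which is a useful sanity check for the \psi \rightarrow 0 regime.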

I am also moving this discussion to a new thread for clarity.