How to fit a Multivariate Dirichlet Distribution?

I need to fit a probability distribution to a Markovian transition matrix, rows of which are probability vectors. However, I don’t want to assume independence among the probability vectors of the matrix and so I need a joint pdf on these vectors to account for their possible dependence. The only distribution that comes to my mind is the Multivariate Dirichlet Distribution (MDD) which I couldn’t find many resources on it. Does anyone know if MDD is the proper joint pdf to use with the markovian transition matrix and if yes how it can be fit in Stan? Is there any examples of fitting such a pdf?

I don’t know if there’s a more direct way to do this, but one approach would be to fit the matrix elements as a multivariate gamma and then normalize the rows. Note that this requires additional constraints for identifiability.

The goal is to fit a pdf to those stochastic matrices so that given a new stochastic matrix you should be able to calculate its density. Your suggested approach sounds like a hack and I am not sure if it properly constructs the pdf. Please correct me if I am wrong.

My original suggestion was overly roundabout. What I am suggesting is that you insert some hierarchical structure across rows that sits atop the parameters of the Dirichlet distribution. This will yield a multivariate Dirichlet in the sense that each row will be Dirichlet but the rows will not be independent. If this isn’t what you need, then sorry for the noise!

Actually I liked the idea of using a hierarchical structure. Thank you for suggesting it. I was too busy with the idea of using a multivariate Dirichlet that I didn’t really think about modelling the dependency through a hierarchical structure.

p.s: Surprisingly I couldn’t find much on multivariate Dirichlet distribution except for one or two old papers so I guess implementing that would be too fiddly.

The proposed use of gamma distributions by @jsocolar is far from a hack. If X, Y, and Z are independently gamma-distributed with same rate but different shapes \alpha_{1:3}, then the normalised version of X, Y, and Z is actually Dirichlet-distributed with scale vector \alpha_{1:3}.

I’m not sure there is an identifiability issue here. If you specify a hierarchical model, the hyperparameters should shrink all the \alpha such that they sufficiently capture the variability across the different types of outcomes, correct?

1 Like