My research focuses mainly on a variant of the well-known Latent Dirichlet Allocation (LDA) model, namely zinLDA (zero-inflated Latent Dirichlet Allocation). In a nutshell, it follows the same generative steps as LDA, except that the word probabilities under each topic (beta) no longer follow a Dirichlet distribution; instead, they follow a zero-inflated Generalized Dirichlet (ZIGD) distribution (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7410344/).
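
For reference, the LDA steps that zinLDA keeps unchanged can be sketched roughly as follows. This is a minimal Python/NumPy illustration, not our R code, and the function name `generate_documents` and its parameters are hypothetical. The topic-word matrix `beta` is passed in because drawing its rows is the only step that differs: a Dirichlet in vanilla LDA, ZIGD in zinLDA.

```python
import numpy as np

def generate_documents(beta, alpha, n_docs, doc_len, rng):
    """Hypothetical sketch of the LDA generative steps shared with zinLDA.

    beta  : (K, V) array; row k holds the word probabilities of topic k
            (Dirichlet rows in vanilla LDA, ZIGD rows in zinLDA)
    alpha : (K,) Dirichlet concentration for document-topic proportions
    Returns a list of integer arrays of word indices, one per document.
    """
    K, V = beta.shape
    docs = []
    for _ in range(n_docs):
        theta = rng.dirichlet(alpha)                              # document-topic proportions
        z = rng.choice(K, size=doc_len, p=theta)                  # topic assignment per token
        words = np.array([rng.choice(V, p=beta[k]) for k in z])  # word drawn from its topic
        docs.append(words)
    return docs
```

Any word whose probability is 0 in every row of `beta` can never appear in a generated document, which is the effect the zero inflation is meant to capture.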

We currently have a working implementation in R, but it is quite slow, so we are turning to Stan for faster computation. However, given my limited programming experience, I am unsure whether this generative model can actually be implemented in RStan, particularly the ZIGD step. I have attached a file with the hierarchical setup (page 1). It first samples a parameter Delta from a Bernoulli distribution; then, depending on whether each entry of Delta is 1 or 0, the corresponding beta is either set to 0 or drawn from a Generalized Dirichlet distribution. I have looked at an existing RStan implementation of the vanilla LDA model, but I am not sure what changes the code would need to make it suitable for zinLDA.
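
To make the ZIGD step concrete, here is a minimal Python/NumPy sketch of one draw of a topic's word probabilities. It assumes the stick-breaking construction from my reading of the linked ZIGD paper (each stick-breaking variable Z_j is forced to 0 when Delta_j = 1 and is otherwise Beta(a_j, b_j)); the exact parameterization should be checked against page 1 of the attachment. The names `sample_zigd`, `a`, `b`, and `pi` are hypothetical.

```python
import numpy as np

def sample_zigd(a, b, pi, rng):
    """Hypothetical sketch: one draw from a zero-inflated Generalized Dirichlet.

    a, b : (V-1,) Beta shape parameters of the stick-breaking variables Z_j
    pi   : (V-1,) zero-inflation probabilities, P(Delta_j = 1)
    Returns a length-V probability vector that may contain exact zeros.
    Assumes the stick-breaking form of ZIGD; verify against the model spec.
    """
    a, b, pi = map(np.asarray, (a, b, pi))
    delta = rng.random(pi.shape) < pi            # Delta_j ~ Bernoulli(pi_j)
    z = np.where(delta, 0.0, rng.beta(a, b))     # Z_j = 0 if Delta_j = 1, else Beta(a_j, b_j)
    # stick_left[j] = prod_{l<j} (1 - Z_l): the part of the unit stick not yet allocated
    stick_left = np.cumprod(np.concatenate(([1.0], 1.0 - z)))
    beta = np.empty(len(z) + 1)
    beta[:-1] = z * stick_left[:-1]              # beta_j = Z_j * prod_{l<j} (1 - Z_l)
    beta[-1] = stick_left[-1]                    # last word takes the remaining stick
    return beta
```

Stacking K such draws as rows gives the topic-word matrix beta. Under this construction, Delta_j = 1 makes beta_j exactly 0, while the entries with Delta_j = 0 behave like an ordinary Generalized Dirichlet draw.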

Could anyone suggest how to adapt the existing LDA Stan code for zinLDA?

Thanks!

Variational_Inference_for_zinLDA.pdf (201.6 KB)