Is it possible to model a zero-inflated Dirichlet distribution?


I have data on historic election results in Britain that I would like to model using brms.

The data look something like this:

party1 party2 party3 party4 party5   region
   0.4    0.4    0.2      0      0  England
   0.5    0.5      0      0      0  England
   0.2    0.5    0.1      0    0.2 Scotland
   0.3    0.2    0.4    0.1      0    Wales

These data are well suited to Dirichlet regression: each party's share can take any value between 0 and 1, and the shares in each row must sum to 1.

The problem, however, is that some parties do not stand in particular cases. This typically happens for one of two reasons: either the party is regionalist and stands only in particular areas, or it is a minor party without the resources to contest every election. In both cases, the party's share is recorded as 0.

This is a problem because Dirichlet regression requires every value to be strictly greater than 0. Rather than fudge this by replacing 0 with a tiny number, I'd like to model the zero-inflation directly. Effectively, this would be a multivariate extension of the zero-inflated beta distribution, in the same way that the Dirichlet distribution is the multivariate extension of the standard beta distribution. I'd like to do this both because I intend to use the model to make predictions and because the zero-inflation is theoretically interesting in its own right.
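To make the idea concrete, here is a minimal sketch of one possible construction. It is an assumption on my part, not what brms or Compositional's zadr() actually implements. Each party k "does not stand" (scores exactly 0) with probability pi_k, treated as independent Bernoulli events. The shares of the parties that did stand then follow a Dirichlet over just those components. The log-likelihood is renormalised because the "nobody stands" pattern is impossible when shares sum to 1. The function names and the pi/alpha values below are hypothetical. A real regression would link alpha and pi to predictors such as region.

```python
import math


def dirichlet_logpdf(y, alpha):
    """Log-density of Dirichlet(alpha) at composition y; every y_k must be > 0."""
    if any(v <= 0 for v in y):
        raise ValueError("Dirichlet support excludes zeros")
    return (math.lgamma(sum(alpha))
            - sum(math.lgamma(a) for a in alpha)
            + sum((a - 1.0) * math.log(v) for a, v in zip(alpha, y)))


def zi_dirichlet_logpdf(y, alpha, pi, tol=1e-9):
    """Hypothetical zero-inflated Dirichlet log-likelihood for one row (a sketch).

    y     : shares for K parties, summing to 1, exact zeros allowed
    alpha : K positive Dirichlet concentrations
    pi    : K probabilities (strictly between 0 and 1) that party k scores 0
    """
    if abs(sum(y) - 1.0) > tol:
        raise ValueError("shares must sum to 1")
    present = [k for k, v in enumerate(y) if v > 0]
    if not present:
        return -math.inf
    # Bernoulli part: which parties stood (non-zero) and which did not
    ll = sum(math.log(1.0 - p) if y[k] > 0 else math.log(p)
             for k, p in enumerate(pi))
    # Renormalise: the all-zero pattern cannot occur when shares sum to 1
    ll -= math.log(1.0 - math.prod(pi))
    # Continuous part: Dirichlet over the parties that stood only
    # (with a single party standing, its share is exactly 1 and adds nothing)
    if len(present) > 1:
        ll += dirichlet_logpdf([y[k] for k in present],
                               [alpha[k] for k in present])
    return ll


if __name__ == "__main__":
    # Fourth row of the example table (Wales); party5 did not stand
    row = [0.3, 0.2, 0.4, 0.1, 0.0]
    alpha = [2.0, 2.0, 2.0, 1.0, 1.0]   # hypothetical values
    pi = [0.05, 0.05, 0.1, 0.3, 0.5]    # hypothetical values
    try:
        dirichlet_logpdf(row, alpha)
    except ValueError as err:
        print("plain Dirichlet fails:", err)
    print("zero-inflated log-likelihood:", zi_dirichlet_logpdf(row, alpha, pi))
```

The independence assumption on the zero pattern is the simplest choice. In practice the zeros are structured (regionalist parties never stand outside their region), so pi would more plausibly depend on region or on the party itself.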

Is this possible in brms? For example, could I define it using the mixture() function?

Edit: This appears to be possible using the zadr() function in the Compositional package, and an accompanying paper can be found here. There is also a paper on zero-inflated Dirichlet regression for modelling microbiome data.

As you say, we need some sort of zero-inflated Dirichlet distribution, which brms currently does not support. You may open an issue on GitHub for this purpose (and point to the relevant resources where such a model has been suggested and described), and I may eventually implement it.


Thanks, Paul. I’ll do just that.