Dirichlet-Multinomial for nested observations

Usually, in Dirichlet regression you would parametrize the model in terms of a mean simplex and precision (see e.g. https://www.psychoco.org/2011/slides/Maier_hdt.pdf for some introduction where this is “Parametrization 2”). You then should be able to model predictors for the mean simplex by either:

  • taking one category as reference, setting its predictor to 0 and having your usual linear predictors (with varying intercept per groups) for all other groups and then computing the mean vector via softmax. This is the usual approach in categorical/dirichlet regression. The coefficients are then a bit hard to interpret and you may want to prefer using model predictions for more interpreatable results (see e.g. Dirichlet Regression using either the Common or Alternative Parameterization - #10 by martinmodrak for some examples)
  • Having an independent predictor for all categories but put a sum-to-zero constraint for all predictors across categories. I have much less experience with this, but it can lead to more interpretable coefficients and avoids the need for a reference category, but the constraints make the model a more tricky and potentially harder to fit (see e.g. Test: Soft vs Hard sum-to-zero constrain + choosing the right prior for soft constrain for discussion of ways to encode the constraint).

Additionally, in your model, theta_reg can be integrated out and you can use the dirichlet multinomial distribution directly for increased performance, see. Dirichlet-multinomial distribution - Wikipedia , thread at Priors for highly skewed multinomial word counts - #3 by stemangiola has a Stan implementation to be put in the functions block.

EDIT: Sorry @gianlucaboo I’ve sent the response prematurely, it is know a bit more complete

2 Likes