Dirichlet-Multinomial for nested observations

I would like to model age-class proportions (e.g., under 5 years old, 5 to 10 years old) for groups containing nested observations. My observations are municipal units where I have the count of people within each age class and my groups are regions containing different municipal units.

In the past, I have modeled these proportions by pre-aggregating the age-class observations within the respective groups, as follows —

data { 
    int n_classes; // number of age classes 
    int n_reg: //number of regions
    int age_count_reg[n_reg, n_classes]; // multinomial observations (regional level)
  parameters { 
    vector<lower=0> [n_classes]alpha_reg; 
  model { 
    alpha_reg ~ lognormal(0, 5); // prior on alpha reg
    for (n_i in 1:n_reg) { 
      theta_reg[n_i] ~ dirichlet(alpha_reg); 
      age_count_reg[n_i] ~ multinomial(theta_reg[n_i]);

This was a quick but sub-optimal solution because theta_reg does not account for the variance in the observations within a group. This is very much an open question but, I was wondering if there is a way to model this in a nested fashion. Any suggestion is more than welcome (@martinmodrak)!

1 Like

Usually, in Dirichlet regression you would parametrize the model in terms of a mean simplex and precision (see e.g. https://www.psychoco.org/2011/slides/Maier_hdt.pdf for some introduction where this is “Parametrization 2”). You then should be able to model predictors for the mean simplex by either:

  • taking one category as reference, setting its predictor to 0 and having your usual linear predictors (with varying intercept per groups) for all other groups and then computing the mean vector via softmax. This is the usual approach in categorical/dirichlet regression. The coefficients are then a bit hard to interpret and you may want to prefer using model predictions for more interpreatable results (see e.g. Dirichlet Regression using either the Common or Alternative Parameterization - #10 by martinmodrak for some examples)
  • Having an independent predictor for all categories but put a sum-to-zero constraint for all predictors across categories. I have much less experience with this, but it can lead to more interpretable coefficients and avoids the need for a reference category, but the constraints make the model a more tricky and potentially harder to fit (see e.g. Test: Soft vs Hard sum-to-zero constrain + choosing the right prior for soft constrain for discussion of ways to encode the constraint).

Additionally, in your model, theta_reg can be integrated out and you can use the dirichlet multinomial distribution directly for increased performance, see. Dirichlet-multinomial distribution - Wikipedia , thread at Priors for highly skewed multinomial word counts - #3 by stemangiola has a Stan implementation to be put in the functions block.

EDIT: Sorry @gianlucaboo I’ve sent the response prematurely, it is know a bit more complete


Hi @martinmodrak — thanks for the additional information!