I would like to model age-class proportions (e.g., under 5 years old, 5 to 10 years old) for groups containing nested observations. My observations are municipal units where I have the count of people within each age class and my groups are regions containing different municipal units.

In the past, I have modeled these proportions by pre-aggregating the age-class observations within the respective groups, as follows —

```
data {
int n_classes; // number of age classes
int n_reg: //number of regions
int age_count_reg[n_reg, n_classes]; // multinomial observations (regional level)
}
parameters {
vector<lower=0> [n_classes]alpha_reg;
simplex[n_classes]theta_obs[n_reg];
}
model {
alpha_reg ~ lognormal(0, 5); // prior on alpha reg
for (n_i in 1:n_reg) {
theta_reg[n_i] ~ dirichlet(alpha_reg);
age_count_reg[n_i] ~ multinomial(theta_reg[n_i]);
}
```

This was a quick but sub-optimal solution because `theta_reg`

does not account for the variance in the observations within a group. This is very much an open question but, I was wondering if there is a way to model this in a nested fashion. Any suggestion is more than welcome (@martinmodrak)!

1 Like

Usually, in Dirichlet regression you would parametrize the model in terms of a mean simplex and precision (see e.g. https://www.psychoco.org/2011/slides/Maier_hdt.pdf for some introduction where this is “Parametrization 2”). You then should be able to model predictors for the mean simplex by either:

- taking one category as reference, setting its predictor to 0 and having your usual linear predictors (with varying intercept per groups) for all other groups and then computing the mean vector via
`softmax`

. This is the usual approach in categorical/dirichlet regression. The coefficients are then a bit hard to interpret and you may want to prefer using model predictions for more interpreatable results (see e.g. Dirichlet Regression using either the Common or Alternative Parameterization - #10 by martinmodrak for some examples)
- Having an independent predictor for all categories but put a sum-to-zero constraint for all predictors across categories. I have much less experience with this, but it can lead to more interpretable coefficients and avoids the need for a reference category, but the constraints make the model a more tricky and potentially harder to fit (see e.g. Test: Soft vs Hard sum-to-zero constrain + choosing the right prior for soft constrain for discussion of ways to encode the constraint).

Additionally, in your model, `theta_reg`

can be integrated out and you can use the dirichlet multinomial distribution directly for increased performance, see. Dirichlet-multinomial distribution - Wikipedia , thread at Priors for highly skewed multinomial word counts - #3 by stemangiola has a Stan implementation to be put in the `functions`

block.

**EDIT:** Sorry @gianlucaboo I’ve sent the response prematurely, it is know a bit more complete

2 Likes

Hi @martinmodrak — thanks for the additional information!