# Understanding priors and generated quantities in hierarchical multinomial

I’m trying to construct a multinomial model with two effects, district and age. I’ve followed @rtrangucci’s model at the bottom of this notebook but had a few questions.

My data looks like the following. I’ve also annotated it in the `data` block. Basically, I have a survey with responds who favor one of four vodkas (and a fifth zero column for identifiability). They are located in one of four districts and are in one of four age groups. I want to understand the probability an age group prefers a vodka.

Model code is below:

``````data {
int<lower=2> K; // number of Vodkas
int<lower=0> N; // total survey responses
int<lower=0> y[N, K]; // data matrix of vodka responses
int<lower=1> J_age; // J ages (1-4)
int<lower=1> J_district; // J districts (1-4)
int<lower=1> G; // G levels (2)
int<lower=1, upper=J_age> idx_age[N]; // index column
int<lower=1, upper=J_district> idx_district[N]; // index column
}
transformed data {
row_vector[J_age] zeros_age = rep_row_vector(0, J_age);
row_vector[J_district] zeros_district = rep_row_vector(0, J_district);
}
parameters {
vector[K - 1] alpha_raw;
matrix[K - 1, J_age] eta_age;
matrix[K - 1, J_district] eta_district;
vector<lower=0>[K - 1] sigma_age;
vector<lower=0>[K - 1] sigma_district;
vector<lower=0>[G] sigma_inter_eqn;
}
transformed parameters {
matrix[K, J_age] alpha_age;
matrix[K, J_district] alpha_district;
vector[K] alpha;

for (k in 1:(K - 1)) {
alpha[k] = alpha_raw[k];
alpha_age[k] = sigma_inter_eqn[1] * sigma_age[k] * eta_age[k];
alpha_district[k] = sigma_inter_eqn[2] * sigma_district[k] * eta_district[k];
}
alpha[K] = 0;
alpha_age[K] = zeros_age;
alpha_district[K] = zeros_district;
}
model {
to_vector(eta_age) ~ normal(0, 1);
to_vector(eta_district) ~ normal(0, 1);
sigma_inter_eqn ~ normal(0, 1);
sigma_age ~ normal(0, 1);
sigma_district ~ normal(0, 1);
alpha_raw ~ normal(0, 1);

for (n in 1:N) y[n] ~ multinomial(softmax(col(alpha_age,idx_age[n]) + col(alpha_district,idx_district[n])));
}
generated quantities {
vector[K] theta[J_age];
for (j in 1:J_age) theta[j] = softmax(col(alpha_age,j));
}
``````

I have two questions.

1. Understanding the effects pooling.

Why, when varying the intercept, do we multiply the `etas` by the district and age-level hyperparameters? I suppose this is a somewhat basic question, but why don’t we add our etas in the way we would a hierarchical linear model, when defining our treatment effect?

``````    alpha_age[k] = sigma_inter_eqn[1] * sigma_age[k] * eta_age[k];
alpha_district[k] = sigma_inter_eqn[2] * sigma_district[k] * eta_district[k];
``````
1. Understanding how to interpret and construct `generated quantities` for the effects.

I would like to return the mean estimate for the probabilities an age group prefers a particular vodka brand. I’ve struggled with how to build a `generated quantities` block to understand which age group prefers which vodka. The below block seems to work in the sense that a larger value corresponds with a greater probability of choosing a vodka, but I’m not sure how to interpret the values.

Because we’ve modeled on the softmax, I’m not sure how to back transform (if I can at all) these values. If I wrap this line in another `softmax`, like so: `for (j in 1:J_age) theta[j] = softmax(col(alpha_age,j));` to put the posteriors on a 0 to 1 scale, but, again, not sure I gain any interpretability that way.

``````    for (n in 1:N) y[n] ~ multinomial(softmax(col(alpha_age,idx_age[n]) + col(alpha_district,idx_district[n])));
}
generated quantities {
vector[K] theta[J_age];
for (j in 1:J_age) theta[j] = col(alpha_age,j);
}
``````

This is probably because it is in “non-centered” parametrization (mentioned for example in the case study: Diagnosing Biased Inference with Divergences)

I am not sure this is a sensible approach, due to the multinomial nature of the data, there are negative correlations between the preferences. Depending on whether you are more interested in the true/latent/unobserved probabilities (without sampling noise) or in future observations (with sampling noise), I would just either store the vector of multinomial probabilities (here, `softmax(col(alpha_age,idx_age[n]) + col(alpha_district,idx_district[n]))`) or add `multinomial_rng` on top of it.

Means, quantiles etc. of those values should be directly interpretable.

Hope that helps, albeit a bit late.

1 Like