I’m trying to construct a multinomial model with two effects, district and age. I’ve followed @rtrangucci’s model at the bottom of this notebook but had a few questions.
My data looks like the following. I’ve also annotated it in the data
block. Basically, I have a survey with responds who favor one of four vodkas (and a fifth zero column for identifiability). They are located in one of four districts and are in one of four age groups. I want to understand the probability an age group prefers a vodka.
Model code is below:
data {
int<lower=2> K; // number of Vodkas
int<lower=0> N; // total survey responses
int<lower=0> y[N, K]; // data matrix of vodka responses
int<lower=1> J_age; // J ages (1-4)
int<lower=1> J_district; // J districts (1-4)
int<lower=1> G; // G levels (2)
int<lower=1, upper=J_age> idx_age[N]; // index column
int<lower=1, upper=J_district> idx_district[N]; // index column
}
transformed data {
row_vector[J_age] zeros_age = rep_row_vector(0, J_age);
row_vector[J_district] zeros_district = rep_row_vector(0, J_district);
}
parameters {
vector[K - 1] alpha_raw;
matrix[K - 1, J_age] eta_age;
matrix[K - 1, J_district] eta_district;
vector<lower=0>[K - 1] sigma_age;
vector<lower=0>[K - 1] sigma_district;
vector<lower=0>[G] sigma_inter_eqn;
}
transformed parameters {
matrix[K, J_age] alpha_age;
matrix[K, J_district] alpha_district;
vector[K] alpha;
for (k in 1:(K - 1)) {
alpha[k] = alpha_raw[k];
alpha_age[k] = sigma_inter_eqn[1] * sigma_age[k] * eta_age[k];
alpha_district[k] = sigma_inter_eqn[2] * sigma_district[k] * eta_district[k];
}
alpha[K] = 0;
alpha_age[K] = zeros_age;
alpha_district[K] = zeros_district;
}
model {
to_vector(eta_age) ~ normal(0, 1);
to_vector(eta_district) ~ normal(0, 1);
sigma_inter_eqn ~ normal(0, 1);
sigma_age ~ normal(0, 1);
sigma_district ~ normal(0, 1);
alpha_raw ~ normal(0, 1);
for (n in 1:N) y[n] ~ multinomial(softmax(col(alpha_age,idx_age[n]) + col(alpha_district,idx_district[n])));
}
generated quantities {
vector[K] theta[J_age];
for (j in 1:J_age) theta[j] = softmax(col(alpha_age,j));
}
I have two questions.
- Understanding the effects pooling.
Why, when varying the intercept, do we multiply the etas
by the district and age-level hyperparameters? I suppose this is a somewhat basic question, but why don’t we add our etas in the way we would a hierarchical linear model, when defining our treatment effect?
alpha_age[k] = sigma_inter_eqn[1] * sigma_age[k] * eta_age[k];
alpha_district[k] = sigma_inter_eqn[2] * sigma_district[k] * eta_district[k];
- Understanding how to interpret and construct
generated quantities
for the effects.
I would like to return the mean estimate for the probabilities an age group prefers a particular vodka brand. I’ve struggled with how to build a generated quantities
block to understand which age group prefers which vodka. The below block seems to work in the sense that a larger value corresponds with a greater probability of choosing a vodka, but I’m not sure how to interpret the values.
Because we’ve modeled on the softmax, I’m not sure how to back transform (if I can at all) these values. If I wrap this line in another softmax
, like so: for (j in 1:J_age) theta[j] = softmax(col(alpha_age,j));
to put the posteriors on a 0 to 1 scale, but, again, not sure I gain any interpretability that way.
for (n in 1:N) y[n] ~ multinomial(softmax(col(alpha_age,idx_age[n]) + col(alpha_district,idx_district[n])));
}
generated quantities {
vector[K] theta[J_age];
for (j in 1:J_age) theta[j] = col(alpha_age,j);
}