I’m trying to model CCES data (vote outcomes given various demographic variables) using MR (multilevel regression; not yet up to the “P” part!) in Stan. Everything works fine (convergence looks good, the sampler is fast, the parameters make sense, etc.) until I actually introduce the MR bit.
One thing I’m confused about in particular is the use of sum-to-zero constraints (enforced via a tight prior around zero on the sum) on the parameters corresponding to a group. I can see why I’d want this: otherwise I can shift all the group coefficients one way and my intercept the other. But I’m concerned that if the counts in the group’s categories are very unequal, such a constraint gives the categories with fewer counts more influence than they should have. And if the category sizes are not approximately equal, wouldn’t the constraint also fail to solve the non-identifiability issue?
One thing I observe in the model below is that the sampler is much faster and the chains mix better when the sum-to-zero prior is removed, but in that case I also get divergences. With the constraint, sampling is slow and mixing is poor (autocorrelations are large out to many lags), but the divergences go away. In either case, adding the group has a significant effect on the intercept, which makes me think I still have non-identifiability issues.
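For comparison, I’m aware of the “hard” sum-to-zero alternative, where one coefficient is determined by the rest. A minimal sketch, using the Race grouping from the model below:

parameters {
  vector[N_Race - 1] beta_Race_free;  // N_Race - 1 free coefficients
}
transformed parameters {
  // Pin the last coefficient so the vector sums to zero exactly
  vector[N_Race] beta_Race = append_row(beta_Race_free, -sum(beta_Race_free));
}

I haven’t tried that here, in part because (as I understand it) putting the prior on the free coefficients gives the pinned category a different implied prior from the others.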
Addendum: I tried using a “weighted” sum-to-zero prior by constructing a vector of counts for each category in the grouping, dividing that vector by N, and then using the prior:
dot_product(beta_g, weights_g) ~ normal(0, 1e-3);
This is faster and has better Rhats and autocorrelations, but produces more divergences.
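For reference, the weights were built roughly like this (a sketch, using the Race grouping and the names from the model below):

transformed data {
  // Fraction of observations falling in each Race category
  vector[N_Race] weights_Race = rep_vector(0, N_Race);
  for (n in 1:N) {
    weights_Race[Race[n]] += 1;
  }
  weights_Race /= N;
}

and then the dot_product prior above replaces the plain sum-to-zero prior in the model block.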
The code is below. Any thoughts about how to diagnose/think about these issues are most welcome!
data {
  int<lower=2> N_State;
  int<lower=1> N;                   // number of observations (cells)
  int<lower=1> State[N];
  int<lower=2> N_Education;
  int<lower=1> Education[N];
  int<lower=2> N_Race;
  int<lower=1> Race[N];
  int<lower=2> N_CD;                // number of congressional districts
  int<lower=1> CD[N];
  int<lower=2> N_Sex;
  int<lower=1> Sex[N];
  int K_CD;                         // number of district-level predictors
  matrix[N_CD, K_CD] X_CD;          // district-level predictor matrix
  int<lower=1> T[N];                // number of trials
  int<lower=0> S[N];                // number of successes (votes)
}
transformed data {
  // Center the district-level predictors
  vector[K_CD] mean_X_CD;
  matrix[N_CD, K_CD] centered_X_CD;
  for (k in 1:K_CD) {
    mean_X_CD[k] = mean(X_CD[, k]);
    centered_X_CD[, k] = X_CD[, k] - mean_X_CD[k];
  }
  // Thin QR reparameterization of the centered predictors
  matrix[N_CD, K_CD] Q_CD_ast = qr_thin_Q(centered_X_CD) * sqrt(N_CD - 1);
  matrix[K_CD, K_CD] R_CD_ast = qr_thin_R(centered_X_CD) / sqrt(N_CD - 1);
  matrix[K_CD, K_CD] R_CD_ast_inverse = inverse(R_CD_ast);
}
parameters {
  vector[K_CD] thetaX_CD;           // district-level coefficients in the Q basis
  real alpha;                       // intercept
  real eps_Education;               // +/- contrast for the two education categories
  real<lower=0> sigma_Race;
  vector<multiplier=sigma_Race>[N_Race] beta_Race;  // non-centered via affine transform
  real eps_Sex;                     // +/- contrast for the two sex categories
}
transformed parameters {
  // Coefficients on the original (centered) predictor scale
  vector[K_CD] betaX_CD = R_CD_ast_inverse * thetaX_CD;
}
model {
  alpha ~ normal(0, 1.0);
  thetaX_CD ~ normal(0, 1.0);
  eps_Education ~ normal(0, 1.0);
  beta_Race ~ normal(0, sigma_Race);
  sigma_Race ~ normal(0, 2.0);
  sum(beta_Race) ~ normal(0, 1.0e-4);  // soft sum-to-zero constraint
  eps_Sex ~ normal(0, 1.0);
  S ~ binomial_logit(T,
                     alpha
                     + Q_CD_ast[CD] * thetaX_CD
                     + to_vector({eps_Education, -eps_Education}[Education])
                     + beta_Race[Race]
                     + to_vector({eps_Sex, -eps_Sex}[Sex]));
}
generated quantities {
  // Predicted probabilities, on the original (centered) predictor scale
  vector<lower=0, upper=1>[N] yPred
    = inv_logit(alpha
                + centered_X_CD[CD] * betaX_CD
                + to_vector({eps_Education, -eps_Education}[Education])
                + beta_Race[Race]
                + to_vector({eps_Sex, -eps_Sex}[Sex]));
  real<lower=0> SPred[N];  // expected successes per observation
  for (n in 1:N) {
    SPred[n] = yPred[n] * T[n];
  }
  vector[N] log_lik;  // pointwise log-likelihood, e.g. for loo
  for (n in 1:N) {
    log_lik[n] = binomial_logit_lpmf(S[n] | T[n],
                                     alpha
                                     + centered_X_CD[CD[n]] * betaX_CD
                                     + {eps_Education, -eps_Education}[Education[n]]
                                     + beta_Race[Race[n]]
                                     + {eps_Sex, -eps_Sex}[Sex[n]]);
  }
}