I’m trying to model voter turnout and voter preference. I’ve got two sets of data, each with the same predictors (geography, population density, age, sex, race), one (CPS) with turnout data which I use to model the probability of a particular type of person turning out to vote, and the other with voting intention data (CCES) which I use to model the probability of a particular type of person voting for a particular candidate.
The result I am interested in is a post-stratification of the product of those probabilities across the population data for a particular region.
I could model them separately, but then, I think, I lose the ability to produce confidence intervals. So I’m trying to model them together (see Stan code below). The combined model has 2 sets of parameters, one for turnout and one for preference (vote-intention) and two binomial sampling statements.
One simple question: is this more or less the right way to do this?
Since I only care about the product, is there some simpler sampling statement that I could use? I ask that because the model runs very slowly and has a lot of divergences (2-3%). My comparison for speed is just modeling either probability separately which runs about 100 times faster.
The divergences are something I could try to tackle by re-parameterization and increasing adapt-delta and such. But, because it’s so slow, that will be tricky. So I’d appreciate any pointers on doing this or to know if I’m better off modeling things separately and accepting less information about my uncertainties.
Thanks!
Code follows.
data {
int<lower=2> J_I_CDData;
int<lower=1> N_CDData;
int<lower=1> I_CDData[N_CDData];
int<lower=2> J_CD;
int<lower=1> N_VData;
int<lower=1> CD[N_VData];
int<lower=2> J_State;
int<lower=1> State[N_VData];
int<lower=2> J_Sex;
int<lower=1> Sex[N_VData];
int<lower=2> J_Race;
int<lower=1> Race[N_VData];
int<lower=2> J_Education;
int<lower=1> Education[N_VData];
int<lower=1> VData_CDData[N_VData];
int K_X_CDData;
matrix[N_CDData,K_X_CDData] X_CDData;
int<lower=0> CVAP[N_VData];
int<lower=0> VOTED[N_VData];
int<lower=0> VOTED_C[N_VData];
int<lower=0> DVOTES_C[N_VData];
}
transformed data {
vector[K_X_CDData] mean_X_CDData;
matrix[N_CDData,K_X_CDData] centered_X_CDData;
for (k in 1:K_X_CDData) {
mean_X_CDData[k] = mean(X_CDData[,k]);
centered_X_CDData[,k] = X_CDData[,k] - mean_X_CDData[k];
}
matrix[N_CDData,K_X_CDData] Q_CDData_ast = qr_thin_Q(centered_X_CDData) * sqrt(N_CDData - 1);
matrix[K_X_CDData,K_X_CDData] R_CDData_ast = qr_thin_R(centered_X_CDData) / sqrt(N_CDData - 1);
matrix[K_X_CDData,K_X_CDData] R_CDData_ast_inverse = inverse(R_CDData_ast);
}
parameters {
real alphaT;
vector[K_X_CDData] thetaX_CDData_T;
real epsT_Sex;
real epsT_Education;
real<lower=0> sigmaT_Race;
vector[J_Race] betaT_Race_raw;
real alpahP;
vector[K_X_CDData] thetaX_CDData_P;
real epsP_Sex;
real epsP_Education;
real<lower=0> sigmaP_Race;
vector[J_Race] betaP_Race_raw;
}
transformed parameters {
vector[K_X_CDData] betaX_CDData_T;
betaX_CDData_T = R_CDData_ast_inverse * thetaX_CDData_T;
vector[J_Race] betaT_Race = sigmaT_Race * betaT_Race_raw;
vector[K_X_CDData] betaX_CDData_P;
betaX_CDData_P = R_CDData_ast_inverse * thetaX_CDData_P;
vector[J_Race] betaP_Race = sigmaP_Race * betaP_Race_raw;
}
model {
alphaT ~ normal(0,2);
thetaX_CDData_T ~ normal(0,2);
epsT_Sex ~ normal(0,2);
epsT_Education ~ normal(0,2);
sigmaT_Race ~ normal(0,2);
betaT_Race_raw ~ normal(0,1);
VOTED ~ binomial_logit(CVAP,alphaT + Q_CDData_ast[VData_CDData] * thetaX_CDData_T + to_vector({epsT_Sex,-epsT_Sex}[Sex]) + betaT_Race[Race]);
alpahP ~ normal(0,2);
thetaX_CDData_P ~ normal(0,2);
epsP_Sex ~ normal(0,2);
epsP_Education ~ normal(0,2);
sigmaP_Race ~ normal(0,2);
betaP_Race_raw ~ normal(0,1);
VOTED_C ~ binomial_logit(DVOTES_C,alpahP + Q_CDData_ast[VData_CDData] * thetaX_CDData_P + to_vector({epsP_Sex,-epsP_Sex}[Sex]) + betaP_Race[Race]);
}
generated quantities {
}