Combining (binomial) models

I’m trying to model voter turnout and voter preference. I have two data sets with the same predictors (geography, population density, age, sex, race): one (CPS) with turnout data, which I use to model the probability that a particular type of person turns out to vote, and one (CCES) with voting-intention data, which I use to model the probability that a particular type of person votes for a particular candidate.

The result I am interested in is a post-stratification of the product of those probabilities across the population data for a particular region.

I could model them separately, but then, I think, I lose the ability to produce confidence intervals for the product. So I’m trying to model them together (see the Stan code below). The combined model has two sets of parameters, one for turnout and one for preference (vote intention), and two binomial sampling statements.

One simple question: is this more or less the right way to do this?

Since I only care about the product, is there some simpler sampling statement I could use? I ask because the model runs very slowly and has a lot of divergences (2-3% of iterations). My speed comparison is modeling either probability separately, which runs about 100 times faster.

The divergences are something I could try to tackle by re-parameterizing, increasing adapt_delta, and so on. But because the model is so slow, that will be tricky. So I’d appreciate any pointers on doing this, or on whether I’m better off modeling things separately and accepting less information about my uncertainties.

Thanks!

Code follows.

data {
  int<lower=2> J_I_CDData;
  int<lower=1> N_CDData;
  int<lower=1> I_CDData[N_CDData];
  int<lower=2> J_CD;
  int<lower=1> N_VData;
  int<lower=1> CD[N_VData];
  int<lower=2> J_State;
  int<lower=1> State[N_VData];
  int<lower=2> J_Sex;
  int<lower=1> Sex[N_VData];
  int<lower=2> J_Race;
  int<lower=1> Race[N_VData];
  int<lower=2> J_Education;
  int<lower=1> Education[N_VData];
  int<lower=1> VData_CDData[N_VData];  // maps each cell to its row in X_CDData
  int K_X_CDData;
  matrix[N_CDData,K_X_CDData] X_CDData;  // predictors attached to each CDData row
  int<lower=0> CVAP[N_VData];  // turnout trials per cell (CPS)
  int<lower=0> VOTED[N_VData];  // turnout successes per cell (CPS)
  int<lower=0> VOTED_C[N_VData];  // preference trials per cell (CCES voters)
  int<lower=0> DVOTES_C[N_VData];  // preference successes per cell (CCES votes for the candidate)
}
transformed data {
  vector[K_X_CDData] mean_X_CDData;
  matrix[N_CDData,K_X_CDData] centered_X_CDData;
  for (k in 1:K_X_CDData) {
    mean_X_CDData[k] = mean(X_CDData[,k]);
    centered_X_CDData[,k] = X_CDData[,k] - mean_X_CDData[k];
  }
  matrix[N_CDData,K_X_CDData] Q_CDData_ast = qr_thin_Q(centered_X_CDData) * sqrt(N_CDData - 1);
  matrix[K_X_CDData,K_X_CDData] R_CDData_ast = qr_thin_R(centered_X_CDData) / sqrt(N_CDData - 1);
  matrix[K_X_CDData,K_X_CDData] R_CDData_ast_inverse = inverse(R_CDData_ast);
}
parameters {
  real alphaT;
  vector[K_X_CDData] thetaX_CDData_T;
  real epsT_Sex;
  real epsT_Education;
  real<lower=0> sigmaT_Race;
  vector[J_Race] betaT_Race_raw;
  real alphaP;
  vector[K_X_CDData] thetaX_CDData_P;
  real epsP_Sex;
  real epsP_Education;
  real<lower=0> sigmaP_Race;
  vector[J_Race] betaP_Race_raw;
}
transformed parameters {
  vector[K_X_CDData] betaX_CDData_T = R_CDData_ast_inverse * thetaX_CDData_T;
  vector[J_Race] betaT_Race = sigmaT_Race * betaT_Race_raw;
  vector[K_X_CDData] betaX_CDData_P = R_CDData_ast_inverse * thetaX_CDData_P;
  vector[J_Race] betaP_Race = sigmaP_Race * betaP_Race_raw;
}
model {
  alphaT ~ normal(0,2);
  thetaX_CDData_T ~ normal(0,2);
  epsT_Sex ~ normal(0,2);
  epsT_Education ~ normal(0,2);
  sigmaT_Race ~ normal(0,2);
  betaT_Race_raw ~ normal(0,1);
  VOTED ~ binomial_logit(CVAP,alphaT + Q_CDData_ast[VData_CDData] * thetaX_CDData_T + to_vector({epsT_Sex,-epsT_Sex}[Sex]) + betaT_Race[Race]);
  alphaP ~ normal(0,2);
  thetaX_CDData_P ~ normal(0,2);
  epsP_Sex ~ normal(0,2);
  epsP_Education ~ normal(0,2);
  sigmaP_Race ~ normal(0,2);
  betaP_Race_raw ~ normal(0,1);
  DVOTES_C ~ binomial_logit(VOTED_C,alphaP + Q_CDData_ast[VData_CDData] * thetaX_CDData_P + to_vector({epsP_Sex,-epsP_Sex}[Sex]) + betaP_Race[Race]);
}
generated quantities {


}
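(I’ve left generated quantities empty above. For concreteness, the post-stratification step I have in mind would look roughly like the sketch below; it reuses CVAP as a stand-in for the population counts of the cells in the region of interest, which in practice would be passed in separately.)

generated quantities {
  // Sketch only: post-stratified product of the turnout and preference probabilities.
  // In practice the weights would be population counts for the cells of the region of
  // interest; here CVAP is reused as a stand-in.
  real poststratProduct;
  {
    vector[N_VData] pT = inv_logit(alphaT + Q_CDData_ast[VData_CDData] * thetaX_CDData_T
                                   + to_vector({epsT_Sex,-epsT_Sex}[Sex]) + betaT_Race[Race]);
    vector[N_VData] pP = inv_logit(alphaP + Q_CDData_ast[VData_CDData] * thetaX_CDData_P
                                   + to_vector({epsP_Sex,-epsP_Sex}[Sex]) + betaP_Race[Race]);
    vector[N_VData] w = to_vector(CVAP);
    // Expected fraction of the population that both turns out and votes for the candidate.
    // Dividing by dot_product(w, pT) instead would give the candidate's share among voters.
    poststratProduct = dot_product(w, pT .* pP) / sum(w);
  }
}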

You’re doing a non-centered parameterization of your beta*_Race variables, but I wonder if you have a sufficient volume of data that a centered parameterization would work better?

Thanks for the reply! I am now trying the centered version (as I understand it; code below, just to verify). It’s still slow, but maybe faster than before? Hard to tell. I’ll reply again in a few hours when it’s done, with an update on divergences, etc.

In the meantime, if you would, could you explain (or point me to a reference so I can better understand) the phrase “a sufficient volume of data that a centered parameterization would work better”? I assume “volume of data” just means something like the number of items in each category? But why does the amount of data indicate which parameterization is likely to be better?

I know the difference between the parameterizations but am pretty confused about how to figure out which might be better suited to what situation.
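(Just to make sure we’re talking about the same thing, here is my understanding of the two parameterizations in a toy hierarchical model, with hypothetical names rather than my actual variables: the non-centered version samples standardized group effects and scales them, while the centered version samples the group effects directly.)

// Toy sketch, not the real model: one hierarchical group effect, written
// non-centered, with the centered alternative indicated in comments.
data {
  int<lower=1> N;
  int<lower=2> J;
  int<lower=1,upper=J> group[N];
  int<lower=0> trials[N];
  int<lower=0> successes[N];
}
parameters {
  real alpha;
  real<lower=0> sigma;
  vector[J] beta_raw;        // non-centered: standardized group effects
}
transformed parameters {
  vector[J] beta = sigma * beta_raw;
}
model {
  alpha ~ normal(0,2);
  sigma ~ normal(0,2);
  beta_raw ~ normal(0,1);    // implies beta ~ normal(0, sigma)
  // Centered alternative: declare vector[J] beta directly in parameters,
  // drop beta_raw and the transformed parameters block, and write
  //   beta ~ normal(0, sigma);
  successes ~ binomial_logit(trials, alpha + beta[group]);
}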

I’m new(ish) to Stan and I’m struggling with the divergences. I very much appreciate that divergences are information and that, beyond increasing adapt_delta, there isn’t, and shouldn’t be, a cookie-cutter procedure to “fix” them. What’s hard, though, is that I’ve yet to figure out how to take, e.g., @betanalpha’s amazing blog posts and apply them to a different model, especially one that runs slowly enough that each thing I try takes a long time.

I can eliminate terms in the model until I see where the issue is. In this case, as you intuited, it’s the beta*_Race terms; the model is much faster without them. There are strong correlations between beta*_Race[i] and alpha*, which makes sense since the betas are not forced to sum to 0 and there are only 4 race categories. Perhaps complicating this further, more people self-identify as “white” than as any other race, so I think there’s a non-identification issue in particular between that element of beta*_Race and alpha*. I’ve tried adding sum-to-zero constraints on the betas, but that doesn’t seem to help, perhaps because they are coefficients for such unequally weighted groups.
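(The kind of sum-to-zero constraint I mean is roughly the following toy sketch, with hypothetical names rather than my full model: only the first J_Race - 1 elements are free parameters and the last is minus their sum.)

// Toy sketch of a hard sum-to-zero parameterization for a single group effect
// (hypothetical names, not the full turnout/preference model).
data {
  int<lower=1> N;
  int<lower=2> J_Race;
  int<lower=1,upper=J_Race> Race[N];
  int<lower=0> trials[N];
  int<lower=0> successes[N];
}
parameters {
  real alpha;
  real<lower=0> sigma_Race;
  vector[J_Race - 1] beta_Race_free;
}
transformed parameters {
  // The last element is minus the sum of the others, so sum(beta_Race) == 0
  // and a constant shift can no longer trade off against alpha.
  vector[J_Race] beta_Race = append_row(beta_Race_free, -sum(beta_Race_free));
}
model {
  alpha ~ normal(0,2);
  sigma_Race ~ normal(0,2);
  beta_Race ~ normal(0, sigma_Race);  // prior applied on the constrained effects
  successes ~ binomial_logit(trials, alpha + beta_Race[Race]);
}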

Sometimes I can reduce the amount of data to make things faster. That would work here until I add a term for state (or Congressional district) level effects. But is that a valid way to figure these things out or will reducing the data create or obscure issues?

I could also run with many fewer iterations. But that also seems like it might create or obscure real issues.

In general, I get a little lost in the sea of these possibilities and just start trying things more or less at random, which is clearly not a good way to improve the model or my understanding of how to build models like this successfully.

So I guess in addition to concrete suggestions for this model right now (thanks again, @mike-lawrence!), I’d love advice or pointers to posts about a process I can use to work through, develop intuition about, and better understand these issues.

And I so appreciate this forum’s continued patience and help with my many divergence-related questions!

Thanks.

Adam

data {
  int<lower=2> J_I_CDData;
  int<lower=1> N_CDData;
  int<lower=1> I_CDData[N_CDData];
  int<lower=2> J_CD;
  int<lower=1> N_VData;
  int<lower=1> CD[N_VData];
  int<lower=2> J_State;
  int<lower=1> State[N_VData];
  int<lower=2> J_Sex;
  int<lower=1> Sex[N_VData];
  int<lower=2> J_Age;
  int<lower=1> Age[N_VData];
  int<lower=2> J_Race;
  int<lower=1> Race[N_VData];
  int<lower=2> J_Education;
  int<lower=1> Education[N_VData];
  int<lower=1> VData_CDData[N_VData];
  int K_X_CDData;
  matrix[N_CDData,K_X_CDData] X_CDData;
  int<lower=0> CVAP[N_VData];
  int<lower=0> VOTED[N_VData];
  int<lower=0> VOTED_C[N_VData];
  int<lower=0> DVOTES_C[N_VData];
}
transformed data {
  vector[K_X_CDData] mean_X_CDData;
  matrix[N_CDData,K_X_CDData] centered_X_CDData;
  for (k in 1:K_X_CDData) {
    mean_X_CDData[k] = mean(X_CDData[,k]);
    centered_X_CDData[,k] = X_CDData[,k] - mean_X_CDData[k];
  }
  matrix[N_CDData,K_X_CDData] Q_CDData_ast = qr_thin_Q(centered_X_CDData) * sqrt(N_CDData - 1);
  matrix[K_X_CDData,K_X_CDData] R_CDData_ast = qr_thin_R(centered_X_CDData) / sqrt(N_CDData - 1);
  matrix[K_X_CDData,K_X_CDData] R_CDData_ast_inverse = inverse(R_CDData_ast);
}
parameters {
  real alphaT;
  vector[K_X_CDData] thetaX_CDData_T;
  real epsT_Sex;
  real<lower=0> sigmaT_Race;
  vector[J_Race] betaT_Race;
  real alphaP;
  vector[K_X_CDData] thetaX_CDData_P;
  real epsP_Sex;
  real<lower=0> sigmaP_Race;
  vector[J_Race] betaP_Race;
}
transformed parameters {
  vector[K_X_CDData] betaX_CDData_T = R_CDData_ast_inverse * thetaX_CDData_T;
  vector[K_X_CDData] betaX_CDData_P = R_CDData_ast_inverse * thetaX_CDData_P;
}
model {
  alphaT ~ normal(0,2);
  thetaX_CDData_T ~ normal(0,2);
  epsT_Sex ~ normal(0,2);
  sigmaT_Race ~ normal(0,2);
  betaT_Race ~ normal(0,sigmaT_Race);
  VOTED ~ binomial_logit(CVAP,alphaT + Q_CDData_ast[VData_CDData] * thetaX_CDData_T + to_vector({epsT_Sex,-epsT_Sex}[Sex]) + betaT_Race[Race]);
  alphaP ~ normal(0,2);
  thetaX_CDData_P ~ normal(0,2);
  epsP_Sex ~ normal(0,2);
  sigmaP_Race ~ normal(0,2);
  betaP_Race ~ normal(0,sigmaP_Race);
  DVOTES_C ~ binomial_logit(VOTED_C,alphaP + Q_CDData_ast[VData_CDData] * thetaX_CDData_P + to_vector({epsP_Sex,-epsP_Sex}[Sex]) + betaP_Race[Race]);
}

See here.

Thanks! I’ve read that before, but it makes more sense each time and I think I see how to move forward a bit here.

@mike-lawrence You were right about all the things. The centered version took a long time but ran without divergences, at least for 4 chains x 2000 iterations.

After another couple of re-readings of parts of the @betanalpha piece, I was able to actually locate the funnel degeneracy (still present in the centered version), which was totally predictable since I’m pretty clearly in the “not enough contexts” situation. That was quite pleasing, since when I’ve had divergences in the past I had trouble locating the funnel (or whatever) causing them. So this feels like progress!

I still have a non-identification issue, which totally makes sense since a constant shift in all the beta* coefficients is the same as a shift in alpha*. But that’s for tomorrow…

Thanks for all the help!

Right. As I discuss in Identity Crisis, idiosyncratic models can have idiosyncratic degeneracies, and in general there’s not much that can be done other than “look everywhere and hope something manifests in 2-dimensional projections” or “work through the mathematical structure of the model”, both of which can be overwhelming. This is one reason why starting with a smaller model and iterating is often more productive: it spreads the diagnostics out into more manageable chunks. See also the investigation strategies in Identity Crisis.

The most productive route, however, is to study the modeling techniques that you’re incorporating into your model so that you are prepared for the typical degeneracies and, perhaps more importantly, know in which variables they manifest, so that you can prioritize where to look. For hierarchical modeling we want to keep an eye on the number of contexts and the data in each context ahead of time, and then investigate the theta/tau and mu/tau pairs plots after the fit.

Keep in mind also that one centers or non-centers each context separately, and if you have imbalanced data then either monolithically centering or monolithically non-centering all of the contexts can be suboptimal.
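For example, a minimal sketch of mixing the two across contexts might look like the following, with the split into centered and non-centered contexts passed in as data (the names here are hypothetical):

// Sketch of centering some contexts and non-centering others; the names are
// hypothetical and the split would be chosen by inspecting the data per context.
data {
  int<lower=1> N;
  int<lower=2> J;
  int<lower=1,upper=J> group[N];
  int<lower=0> trials[N];
  int<lower=0> successes[N];
  int<lower=0,upper=J> J_c;              // number of centered contexts
  int<lower=1,upper=J> idx_c[J_c];       // indices of centered contexts
  int<lower=1,upper=J> idx_nc[J - J_c];  // indices of non-centered contexts
}
parameters {
  real alpha;
  real<lower=0> sigma;
  vector[J_c] beta_c;           // centered contexts: sampled on the outcome scale
  vector[J - J_c] beta_nc_raw;  // non-centered contexts: sampled standardized
}
transformed parameters {
  vector[J] beta;
  beta[idx_c] = beta_c;
  beta[idx_nc] = sigma * beta_nc_raw;
}
model {
  alpha ~ normal(0,2);
  sigma ~ normal(0,2);
  beta_c ~ normal(0, sigma);    // centered
  beta_nc_raw ~ normal(0,1);    // non-centered, implies beta[idx_nc] ~ normal(0, sigma)
  successes ~ binomial_logit(trials, alpha + beta[group]);
}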

This shouldn’t be too bad so long as you have a reasonable amount of data in each context. See for example https://betanalpha.github.io/assets/case_studies/factor_modeling.html#35_Residual_Degeneracies.

Thanks for the help!
I am figuring it out slowly, mostly by failing to follow good advice until I have learned the hard way, particularly about keeping the model simple and then adding terms to see where things get complicated.

And I will try to focus on the techniques rather than just jumping in.

I can now at least see exactly which combination of predictors leads to identification issues, some of which Stan can handle but at relatively high computational cost.

@betanalpha, your blog posts are immensely helpful. I do find them slow going, because you pack so much in. Each time I read a bit, only a small fraction lands; then I read it again, eager to solve a different problem, and a bit more lands.

Anyway, thanks for putting so much useful stuff out there and for tuning in here to point the way!

Adam

Thanks!