Two sources of data for one model?

adamConnerSax · November 13, 2020, 9:20pm

I’m trying to model the likelihood of voting and the likelihood of voting for a particular candidate, each as a binomial, given the result of an election in several counties with varying demographics. That much is straightforward. But I’d actually like to model two elections with the same parameters, so what I tried was:

data {
  int<lower = 1> G; // number of counties
  int<lower = 1> K; // number of predictors
  int<lower = 1, upper = G> county; // do we need this?
  matrix[G, K] X;
  int<lower = 0> VAP[G];
  int<lower = 0> DVotes1[G];
  int<lower = 0> DVotes2[G];
  int<lower = 0> TVotes1[G];
  int<lower = 0> TVotes2[G];
}
parameters {
  real alphaD;
  vector[K] betaV;
  real alphaV;
  vector[K] betaD;
}
model {
  alphaD ~ normal(0, 2);
  alphaV ~ normal(0, 2);
  betaV ~ normal(0, 1);
  betaD ~ normal(0, 1);
  TVotes1 ~ binomial_logit(VAP, alphaV + X * betaV);
  TVotes2 ~ binomial_logit(VAP, alphaV + X * betaV);
  DVotes1 ~ binomial_logit(TVotes1, alphaD + X * betaD);
  DVotes2 ~ binomial_logit(TVotes2, alphaD + X * betaD);
}
generated quantities {
  vector<lower = 0, upper = 1>[G] pVotedP;
  vector<lower = 0, upper = 1>[G] pDVoteP;
  pVotedP = inv_logit(alphaV + (X * betaV));
  pDVoteP = inv_logit(alphaD + (X * betaD));
}

This led to an error:
"
Rejecting initial value:
Log probability evaluates to log(0), i.e. negative infinity.
Stan can’t start sampling from this initial value.
"

Commenting out the lines in the model block referring to the 2nd election (beginning with “TVotes2” and “DVotes2”) fixes it and then the sampler runs and I get a reasonable answer.
I could imagine addressing this by “stacking” the data into one longer array and then having it as one model, but that would involve some annoying index shuffling, etc. I’m wondering if there’s a way to do it directly?
Thanks!
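
For reference, the "stacking" alternative mentioned above could be done on the data-preparation side, before the data reaches Stan. The following is only a sketch in plain Python: the function name `stack_elections` and the stacked field names `N`, `TVotes`, and `DVotes` are hypothetical, and the toy numbers are made up for illustration. It assumes both elections cover the same G counties in the same order, so `X` and `VAP` are simply repeated.

```python
def stack_elections(X, VAP, TVotes1, DVotes1, TVotes2, DVotes2):
    """Concatenate two elections over the same G counties into 2*G rows.

    Field names in the returned dict (N, TVotes, DVotes) are hypothetical;
    a stacked Stan program would have to declare matching data variables.
    """
    G = len(VAP)
    for name, arr in [("X", X), ("TVotes1", TVotes1), ("DVotes1", DVotes1),
                      ("TVotes2", TVotes2), ("DVotes2", DVotes2)]:
        if len(arr) != G:
            raise ValueError(f"{name} has length {len(arr)}, expected {G}")
    return {
        "N": 2 * G,                                   # stacked row count
        "K": len(X[0]),                               # number of predictors
        "X": [list(row) for row in X] * 2,            # same predictors, twice
        "VAP": list(VAP) * 2,                         # same voting-age population
        "TVotes": list(TVotes1) + list(TVotes2),      # election 1, then election 2
        "DVotes": list(DVotes1) + list(DVotes2),
    }


# Toy data for two counties (made-up numbers, illustration only).
stacked = stack_elections(
    X=[[0.1, 0.5], [0.3, 0.2]],
    VAP=[1000, 2000],
    TVotes1=[600, 1100], DVotes1=[300, 500],
    TVotes2=[650, 1200], DVotes2=[320, 610],
)
```

With data in this shape, the model block would need only one statement per outcome. Since the log-likelihoods of independent observations add, that is mathematically the same model as the two-statement version in the original post, so stacking is a matter of convenience rather than a different model.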

adamConnerSax · November 15, 2020, 4:01am

Never mind! This had something to do with bad priors rather than the two sets of data.

caesoma · November 15, 2020, 7:38pm

Regardless of your priors, this is not unexpected. As the parameter space that MCMC has to explore grows, it is likely to contain larger regions of low or zero probability; all the message is saying is that the initial guess landed in one of those regions.

Assuming there is nothing wrong with the model or priors, one way around it is to provide initial parameter values that you know have nonzero probability, preferably chosen so that multiple chains still start at points different enough that you can confidently assess convergence.
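
As a rough illustration of that suggestion, here is one way to build a separate set of starting values for each chain in plain Python. This is a sketch, not the thread's actual fix: the helper `make_inits` and its `scale` and `seed` arguments are hypothetical, and the parameter names are taken from the model in the first post. Values are drawn from a narrow interval around zero, which is tighter than Stan's default of uniform(-2, 2) on the unconstrained scale, while still giving each chain a different starting point.

```python
import random


def make_inits(n_chains, n_predictors, scale=0.1, seed=1234):
    """Return one dict of initial values per chain.

    Keys match the parameter names in the posted model (alphaV, alphaD,
    betaV, betaD). `scale` is a hypothetical knob: the half-width of the
    interval around zero from which each starting value is drawn.
    """
    rng = random.Random(seed)  # seeded so the starting points are reproducible
    inits = []
    for _ in range(n_chains):
        inits.append({
            "alphaV": rng.uniform(-scale, scale),
            "alphaD": rng.uniform(-scale, scale),
            "betaV": [rng.uniform(-scale, scale) for _ in range(n_predictors)],
            "betaD": [rng.uniform(-scale, scale) for _ in range(n_predictors)],
        })
    return inits


# Four chains, three predictors (K = 3 is an assumed value for the example).
inits = make_inits(n_chains=4, n_predictors=3)
```

How such a list is handed to the sampler depends on the interface in use, so check its documentation for the accepted format of initial values. If sampling only starts from hand-picked values like these, that can also be a hint that the predictors are on a scale where the default starting range produces extreme linear predictors.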
