Two sources of data for one model?

I’m trying to model likelihood to vote and likelihood to vote for a particular candidate, each as a binomial, given the result of an election in several counties with varying demographics. That much is straightforward. But I’d actually like to try to model 2 elections with the same parameters, so what I tried was:

data {
  int<lower = 1> G; // number of counties
  int<lower = 1> K; // number of predictors
  int<lower = 1, upper = G> county; // do we need this?
  matrix[G, K] X;
  int<lower = 0> VAP[G];
  int<lower = 0> DVotes1[G];
  int<lower = 0> DVotes2[G];
  int<lower = 0> TVotes1[G];
  int<lower = 0> TVotes2[G];
}
parameters {
  real alphaD;
  vector[K] betaV;
  real alphaV;
  vector[K] betaD;
}
model {
  alphaD ~ normal(0, 2);
  alphaV ~ normal(0, 2);
  betaV ~ normal(0, 1);
  betaD ~ normal(0, 1);
  TVotes1 ~ binomial_logit(VAP, alphaV + X * betaV);
  TVotes2 ~ binomial_logit(VAP, alphaV + X * betaV);
  DVotes1 ~ binomial_logit(TVotes1, alphaD + X * betaD);
  DVotes2 ~ binomial_logit(TVotes2, alphaD + X * betaD);
}
generated quantities {
  vector<lower = 0, upper = 1>[G] pVotedP;
  vector<lower = 0, upper = 1>[G] pDVoteP;
  pVotedP = inv_logit(alphaV + (X * betaV));
  pDVoteP = inv_logit(alphaD + (X * betaD));
}

This led to an error:
"
Rejecting initial value:
Log probability evaluates to log(0), i.e. negative infinity.
Stan can’t start sampling from this initial value.
"

Commenting out the lines in the model block referring to the 2nd election (beginning with “TVotes2” and “DVotes2”) fixes it and then the sampler runs and I get a reasonable answer.
I could imagine addressing this by “stacking” the data into one longer array and then having it as one model, but that would involve some annoying index shuffling, etc. I’m wondering if there’s a way to do it directly?
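For reference, the "stacking" route can be done on the data-preparation side rather than inside Stan. Below is a minimal sketch in plain Python, assuming the data are held as ordinary lists; the function name `stack_elections` and the keys `N`, `TVotes`, `DVotes`, and `election` are hypothetical and would need matching declarations in a rewritten data block. Since each sampling statement in Stan simply adds its term to the log density, the stacked form should give the same posterior as the two separate statements above.

```python
def stack_elections(X, VAP, TVotes1, TVotes2, DVotes1, DVotes2):
    """Stack two elections over the same G counties into one long data set.

    Returns a dict with 2 * G rows. The key names are illustrative
    assumptions, not names used by the original model.
    """
    G = len(VAP)
    return {
        "N": 2 * G,                                   # total rows after stacking
        "K": len(X[0]),                               # number of predictors
        # The county predictors are repeated, once per election.
        "X": [list(row) for _ in range(2) for row in X],
        "VAP": list(VAP) * 2,                         # same voting-age population twice
        "TVotes": list(TVotes1) + list(TVotes2),      # turnout, election 1 then 2
        "DVotes": list(DVotes1) + list(DVotes2),      # D votes, election 1 then 2
        "election": [1] * G + [2] * G,                # which election each row came from
    }


# Toy example with two counties and two predictors (made-up numbers).
stacked = stack_elections(
    X=[[0.1, 0.5], [0.3, 0.2]],
    VAP=[1000, 2000],
    TVotes1=[600, 900],
    TVotes2=[550, 1100],
    DVotes1=[300, 500],
    DVotes2=[250, 600],
)
```

The `election` index is not needed for a model with shared parameters, but it keeps the door open for election-specific intercepts later.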
Thanks!

Never mind! This had something to do with bad priors rather than the two sets of data.

Regardless of your priors, this is not unexpected. As the parameter space explored by MCMC grows, the regions of very low or zero probability tend to grow with it; all the message is saying is that the randomly chosen initial values landed in one of those regions.

Assuming there is nothing wrong with the model/priors, one way around it is to provide initial parameter values that you know have nonzero probability, while still keeping the starting points of different chains far enough apart that you can confidently assess convergence.
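As a rough illustration of that suggestion, here is a sketch in plain Python that builds one set of initial values per chain for the parameters in the model above (`alphaV`, `alphaD`, `betaV`, `betaD`). The helper name `make_inits`, the `spread` of 0.5, and the seed are assumptions chosen for illustration; the idea is only to start closer to zero than Stan's default uniform(-2, 2) draw on the unconstrained scale while still giving each chain a different starting point.

```python
import random


def make_inits(n_chains, K, spread=0.5, seed=1234):
    """Return a list of per-chain initial-value dicts.

    Each value is drawn uniformly from [-spread, spread], so the chains
    start at different points but stay away from extreme logits.
    """
    rng = random.Random(seed)  # seeded so the starting points are reproducible

    def draw():
        return rng.uniform(-spread, spread)

    return [
        {
            "alphaV": draw(),
            "alphaD": draw(),
            "betaV": [draw() for _ in range(K)],
            "betaD": [draw() for _ in range(K)],
        }
        for _ in range(n_chains)
    ]


# Four chains, three predictors (K must match the data passed to Stan).
inits = make_inits(n_chains=4, K=3)
```

With CmdStanPy, a list like this can be passed as the `inits` argument of `CmdStanModel.sample`, one dict per chain. Passing a single positive number there instead narrows the range of the random initial values, which may be enough on its own.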