Binomial trials vary as a poisson

I am having trouble figuring out how to implement a model where the number of trials in the binomial are drawn from a poisson.

Model:

data {
    int n;
    int I[n]
    int V[n];
}
parameter {
    real lambda;
    real p;
    int I_ref; //This does not work (ints are not allowed)
model { 
    I ~ poisson(lambda)
    I_ref ~ poisson(lambda)
    V ~ binomial(I_ref, p)
}

Is there a way to take I_ref from the poisson(lambda)?

if Y \sim \text{Binomial}(I, p) and I\sim \text{Poisson}(\lambda) , then you can marginalize over I and get Y \sim \text{Poisson}(p \lambda). In my experience, Stan has trouble identifying p vs \lambda without strong priors and/or predictors.

3 Likes

Great. Thanks! I was thinking of using something like this but wanted to check with the community first to make sure it was valid.

So when I changed the model to:

data {
   int n;
   int I[n]
   int V[n];
}
parameters {
   real lambda;
   real p;
}
model { 
   I ~ poisson(lambda)
   V ~ poisson(lambda*p)
}

I found that lambda was struggling to explain both I and V. Is there any way to make a two step model where I can prevent the V poisson from effecting the lambda posterior?

I think that means changing V ~ poisson(lambda*p) to target += ... which somehow doesn’t contain lambda.

Maybe this is not possible.

Here are two suggestions:

  • I think you can remove the Poisson distribution for I from the model block. The distribution of V implies that I has that distribution so estimating p and lambda can be done just from V (although like @Christopher-Peterson said, you will probably need some priors for the two parameters).

  • You definitely need to add constraints <lower=0> for lambda and <lower=0,upper=1> for p in the parameters block to properly define the parameter space for sampling.

Is this the extent of your data? If you really observe both I and V as data then your model should work fine. Do you have a reproducible example that shows it failing? I think you might be missing a detail about how this works.

Yeah that’s true, it should be ok if that’s the case, but the constraints in the parameters block were missing so that could be part of the issue (along with priors).

Thanks for the responses.

There is a lot more to the model I am working on (dimensions and complexity), so I tried to boil it down to the smallest example to explain the question I was asking. I have constraints and priors which seem to be working well.

Basically, I have two models that both work well independently, one for \lambda based on I, one for p based on V (where I use I in the binomial)
But I want to link them together.

Two independent models

data {
   int n; //number of subjects
   int v; //number of test types
   int I[n] //number of times tested
   int V[v,n]; //number of success 
   int E[n]; //Exposure to tests
}
parameters {
   real lambda<lower=0>[n];  //Rate of tests
   vector<lower=0>[v] alpha; //slope
   vector[v] beta; // intercept
   vector[n] theta; //subject latent score
}
model { 
   I ~ poisson(lambda .* E);
   for (k in 1:v) {
      V[k] ~ binomial_logit(I, alpha * theta + beta);
   }
}

I linked them with this model:

model { 
   I ~ poisson(lambda .* E);
   for (k in 1:v) {
      V[k] ~ poisson(lambda .* E .* (alpha[k] * theta + beta[k]));
   }
}

The failure occurred when changing the cohort of subjects (25% overlapped), \lambda inverted, i.e. a subject with the highest \lambda when one cohort had the lowest with another, and vice versa. My interpretation is that \lambda was being pulled to hard by V.

Maybe removing the dependence on I is the write move, but it seems like its ignoring part of the data.

If you want to multiply lambda by p from the binomial logit model then you would need to make it inv_logit(alpha[k] * theta + beta[k]). Does that yield more sensible results?

Oops. You are correct. That was a typo in the post. The model uses the inv logit

I’m not sure. It does seem reasonable to merge them. Maybe this is part of the problem (or at least not helping):

If p = inv_logit(alpha * theta + beta) and alpha, theta, and beta are all parameters then you have no data in the expression for p and you’re going to have trouble estimating that many parameters (v + n + n given the dimensions declared) without any predictor variables.

On another note, do you have any code for simulating data and fitting the model (or simplified version)? That would make it easier for everyone to really see the issue and try working on it, and you’ll probably get more responses that way.