I want to fit a model involving a binomial-Poisson hiearchical distribution with two sets of conditionally independent binomial counts:
y_i \sim \text{Poisson}(\lambda_i),
z_{1, i} \sim \text{binomial}(y_i, \kappa),
z_{2, i} \sim \text{binomial}(y_i, \kappa),
where z_{1, i} and z_{2,i} are conditionally independent given y_i, y_i is unobserved, and the \lambda_i come from a regression with a log link (and Gaussian overdispersion).
If I had only a single z_i, it would be easy to marginalize: z_i \sim \text{Poisson}(\kappa \lambda_i).
Approaching marginalizing with two sets of counts in the same way, and with a bit of help from SageMath, I found that the joint marginal density of (z_{1,i}, z_{2, i}) is
P(Z_{1, i} = z_{1, i}, Z_{2, i} = z_{2, i}) = \kappa^{z_{1,i} + z_{2, i}}e^{-\lambda_i}(1 - \kappa)^{-(z_{1, i} + z_{2, i})}a^{z_{1, i}}\binom{z_{1, i}}{z_{2, i}}\frac{1}{z_{1, i}!} {_1F_1}(z_{1, i} + 1, z_{1, i} - z_{2, i} + 1, a),
where a = (1 - \kappa)^2 \lambda_i, I assumed without loss of generality that z_{1, i} \geq z_{2, i}, and _1F_1() is a generalized hypergeometric function.
Can I implement this easily in Stan? There might be some obvious simplification that I haven’t recognized. If not, I haven’t found a generalized hypergeometric function in the documentation, but it doesn’t look too difficult to write a Stan function based on genhypergeo() in the R package hypergeo (although I can imagine there is potential for numerical problems).
I’d be quite happy to have y_i as a latent variable instead, but I couldn’t immediately see how to do this in Stan: it can’t be a parameter because it’s discrete. I don’t need to be able to identify \kappa and \lambda_i separately, although looking at the formula suggests that this will be possible.