I’m trying to implement a model where X (numerator) and N (denominator) are observed, but the data generating process is something like X_1 \sim \text{Binomial}(N_1, \theta_1), X_2 \sim \text{Binomial}(N_2, \theta_2), X = X_1 + X_2, and N_1 \sim \text{Binomial}(N, \tau), N_2 = N - N_1. I only care about \theta_1, \theta_2, \tau so I’m trying to marginalize over the discrete parameters to implement this in Stan. Any advice?

# Marginalize over denominators in mixture of binomials

The sum of “non-identical” binomials does not have a closed-form, AFAIK. Since marginalising over all the possible latent states is probably prohibitive, you can try one of the many approximations. I strongly suspect @martinmodrak has more to say on saddle-point approximations.

Thx @maxbiostat for tagging me. First note the thread at Sum of binomials when only the sum is observed where I discussed a similar topic. It turns out it is important how you choose N_1 and I *think* that the process you describe is actually identical to this one:

For each element:

- Flip a biased coin (with prob \tau) independently
- If the coin was heads, choose \hat{\theta} = \theta_1 else choose \hat{\theta} = \theta_2
- Flip a biased coin with prob \hat{\theta}, if it is heads, add one to X

If that is so, then X \sim \text{Binomial}(N, \tau \theta_1 + (1-\tau) \theta_2) and you cannot infer any information about \theta_1, \theta_2, \tau individually.

If the process is actually different, the saddlepoint approximation mentioned in the paper linked by @maxbiostat might be sensible.

I wrote about implementing a saddlepoint approximation for sum of negative binomials here: https://www.martinmodrak.cz/2019/06/20/approximate-densities-for-sums-of-variables-negative-binomials-and-saddlepoint/ which discusses all the nuts and bolts to get it running in Stan.

For negative binomials the approximation was not very useful as there are simpler approximations that still work good. Binomials are however different (see the thread I linked earlier). Saddlepoint is however very slow to compute.

Best of luck with your model!

Incidentally, if you write out the model as stated in the OP, you get E[X]=N(\tau\theta_1 + (1-\tau) \theta_2) by the law of total expectation.