Marginalize over denominators in mixture of binomials

I’m trying to implement a model where X (numerator) and N (denominator) are observed, but the data-generating process is something like X_1 \sim \text{Binomial}(N_1, \theta_1), X_2 \sim \text{Binomial}(N_2, \theta_2), X = X_1 + X_2, with N_1 \sim \text{Binomial}(N, \tau) and N_2 = N - N_1. I only care about \theta_1, \theta_2, and \tau, so I’m trying to marginalize out the discrete parameters to implement this in Stan. Any advice?


The sum of non-identically-distributed binomials does not have a closed form, AFAIK. Since marginalising over all the possible latent states is probably prohibitive, you can try one of the many approximations. I strongly suspect @martinmodrak has more to say on saddlepoint approximations.


Thanks @maxbiostat for tagging me. First, note the thread at Sum of binomials when only the sum is observed, where I discussed a similar topic. It turns out that it matters how N_1 is chosen, and I think the process you describe is actually identical to this one:

For each element:

  1. Flip a biased coin (heads with probability \tau), independently for each element
  2. If the coin came up heads, set \hat{\theta} = \theta_1; otherwise set \hat{\theta} = \theta_2
  3. Flip a second biased coin with heads probability \hat{\theta}; if it comes up heads, add one to X

If that is so, then X \sim \text{Binomial}(N, \tau \theta_1 + (1-\tau) \theta_2), and you cannot infer anything about \theta_1, \theta_2, and \tau individually: only the combination \tau \theta_1 + (1-\tau) \theta_2 is identified.
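This collapse can be verified numerically: summing the exact conditional pmf of X_1 + X_2 over the Binomial distribution of N_1 reproduces the Binomial(N, \tau\theta_1 + (1-\tau)\theta_2) pmf exactly. A minimal sketch in Python (parameter values are arbitrary, chosen only for illustration):

```python
import numpy as np
from scipy.stats import binom

N, tau, th1, th2 = 10, 0.4, 0.3, 0.7

# Marginal pmf of X = X1 + X2, summing over N1 ~ Binomial(N, tau)
pmf = np.zeros(N + 1)
for n1 in range(N + 1):
    w = binom.pmf(n1, N, tau)                          # P(N1 = n1)
    p1 = binom.pmf(np.arange(n1 + 1), n1, th1)         # pmf of X1 | N1 = n1
    p2 = binom.pmf(np.arange(N - n1 + 1), N - n1, th2) # pmf of X2 | N1 = n1
    pmf += w * np.convolve(p1, p2)                     # pmf of X1 + X2 | n1

# The collapsed single-binomial pmf
collapsed = binom.pmf(np.arange(N + 1), N, tau * th1 + (1 - tau) * th2)
```

The two arrays agree to numerical precision, which is exactly why the three parameters are not separately identified from (X, N) alone.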

If the process is actually different, the saddlepoint approximation mentioned in the paper linked by @maxbiostat might be sensible.
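For the case where the split is fixed rather than binomially random (i.e. N_1 and N_2 are known or conditioned on), the classic Daniels saddlepoint approximation to the pmf of a sum of independent binomials can be sketched as below. This is written in Python for illustration rather than Stan, and the function name `saddlepoint_pmf` is my own:

```python
import numpy as np
from scipy.optimize import brentq

def saddlepoint_pmf(x, ns, ps):
    """Daniels saddlepoint approximation to P(sum_i X_i = x),
    where X_i ~ Binomial(ns[i], ps[i]) are independent."""
    ns = np.asarray(ns, dtype=float)
    ps = np.asarray(ps, dtype=float)
    # Cumulant generating function K(s) = sum_i n_i log(1 - p_i + p_i e^s)
    # and its first two derivatives
    K  = lambda s: np.sum(ns * np.log1p(ps * (np.exp(s) - 1.0)))
    K1 = lambda s: np.sum(ns * ps * np.exp(s) / (1.0 - ps + ps * np.exp(s)))
    K2 = lambda s: np.sum(ns * ps * (1.0 - ps) * np.exp(s)
                          / (1.0 - ps + ps * np.exp(s)) ** 2)
    # Solve the saddlepoint equation K'(s_hat) = x; a root exists when
    # 0 < x < sum(ns)
    s_hat = brentq(lambda s: K1(s) - x, -30.0, 30.0)
    return np.exp(K(s_hat) - s_hat * x) / np.sqrt(2.0 * np.pi * K2(s_hat))

# Example: sum of Binomial(20, 0.3) and Binomial(30, 0.6), evaluated at x = 25
approx = saddlepoint_pmf(25, [20, 30], [0.3, 0.6])
```

In Stan you would write the same K, K', K'' in the functions block and use `algebra_solver` (or a hand-rolled Newton iteration) for the saddlepoint equation; the Python version is just the easiest way to check the algebra against an exact convolution.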

I wrote about implementing a saddlepoint approximation for a sum of negative binomials here: the post discusses all the nuts and bolts of getting it running in Stan.

For negative binomials the approximation was not very useful, as there are simpler approximations that still work well. Binomials, however, are different (see the thread I linked earlier). Also note that the saddlepoint approximation is quite slow to compute.

Best of luck with your model!


Incidentally, if you write out the model as stated in the OP, you get E[X]=N(\tau\theta_1 + (1-\tau) \theta_2) by the law of total expectation.
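Spelling out that law-of-total-expectation step (using E[N_1] = N\tau):

```latex
\begin{aligned}
E[X] &= E\bigl[\,E[X_1 + X_2 \mid N_1]\,\bigr]
      = E\bigl[N_1\theta_1 + (N - N_1)\theta_2\bigr] \\
     &= N\tau\,\theta_1 + N(1-\tau)\,\theta_2
      = N\bigl(\tau\theta_1 + (1-\tau)\theta_2\bigr).
\end{aligned}
```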