Bayesian Beta Binomial model with latent draw

I am trying to model the following process using a Bayesian model.
I would like to understand if my formulation makes sense, and what are the next steps to tackle it with Stan.

Let us suppose that there exist 3 populations k of coins in i ordered urns.
Each urn contains a large number of coins, and in each urn, the fraction of the 3 populations is the same.

Each population k is characterized by its own rate of success specific to urn i, which is known in the form of Beta(\alpha_{ik},\beta_{ik}).

Now the generative process is the following:

  • For each urn i, n_i coins are drawn and tossed. The number of successes m_i and the number of tosses n_i for each urn are observed and together form a sample.

I would like to estimate the fraction of the 3 populations in a sample, given those two observed vectors, for a sufficient number of urns.

My intuition is that given some extreme probability values (selected a priori), the tosses will bring information about the population that generated them.

For example, if in urn 1 the population A has a very peaked Beta distribution near one, and B and C are peaked near 0, then observing 40/100 successes would likely suggest that something around 40% of the coins belong to population A (leaving uncertainty for the others, and that’s why we need many urns, with different characteristics).

I tried to model the process as follows (with 3 populations):

m_i \sim \sum_{k}^{3} BetaBin(n_{k},\alpha_{ik},\beta_{ik}) total number of successes for an urn

\mathbf{n_i} \sim DirMult(n_{i},\mathbf{c}) the total draws are split based on c, which lives on a simplex

Setting c to (1,1,1) as non informative prior.

I want to estimate c, the fractions.

First question: does this formulation make sense? To me, it seems the most faithful to the process.

Secondly, I have been trying to implement this in RStan, but the latent draw \mathbf{n_i} are integers and not admitted, so should be marginalized out. Even though I understand marginalization in simpler cases, I really can’t figure this out in my current formulation, given that \mathbf{n_i} is a vector of combinations rather than a single value.

Is there any similar problem in the literature, or any resource that can guide me through this marginalization?
From my understanding, here I should marginalize over all possible combinations that make up \mathbf{n_i}, and to me, this is not straight forward as summing a latent index or count.

I apologize if any of my notation is not clear, as I discovered Bayesian models only recently and I am still quite a beginner. I have a draft for the Stan code to achieve this, but I would like to be sure about the correctness of the model before diving into the implementation.

Thank you for your time!

1 Like

Maybe. Why you don’t choose a trinomial distribution?

Then you can use:

real dirichlet_multinomial_lpmf(int[] y, vector alpha) {   
  real alpha_plus = sum(alpha);      
  return lgamma(alpha_plus) + sum(lgamma(alpha + to_vector(y))) - 
    lgamma(alpha_plus+sum(y)) - sum(lgamma(alpha));  

To me it doesn’t makes sense to have a Beta (binomial) and Dirichlet distribution combined together in your context. What do you think?

Thanks for pointing out the trinomial, this is a new distribution to me.
However, i think it would be nice to have a formulations which generalizes over the number of given population.
Concerning the fact that I combine Beta and Multinomial I understand why this sounds obscure. Usually, you either employ DirMult or BetaBin to simulate some draws that have a probability of success which is in turn determined by another distribution. On the other hand, in my problem the DirMult is essentially only the first of 2 steps, with the second being that the draws themselves can be either success/failure.

Indeed, I did some progress on the model formulation and I will comment about that below.


After getting stuck with the latent count marginalization I operated the following simplification: instead of having multinomial draws, I mix 3 beta distribution using the latent fractions c. By doing so I am computing a weighted average and obtaining a probability of success which, for each urn, is proportional to the Betas that characterize the three groups and their fraction of contribution. Some very simple testing on sythnetic examples worked out as expected, I have only some doubts concerning a possible underestimation of the variance due to this adaptation. I will update with further observation whether the picture will be clear.

Why 3 beta distributions? I’d assume you need 2, since p1 + p2 + p3 = 1 <=> p1 = 1 - p2 + p3.

Sorry for my unprecise wording: the 3 Beta distributions I was referring to are the ones that characterize the probability of success in each single population, not the distribution defining their contribution.

But maybe I am really overcomplicating the problem or missing something. I try to reharse the process:

  • In the original population, there are 20% A, 30% of B and 50% of C. Let’s focus on a single urn.
  • First, you draw 20. This is a Mult(20 * (.2, .3,.5)), right? You get a vector of 3 counts that sums to 20.
  • Then, for each of the “count split” you have a probability of success that follows a known Beta distribution.
  • Let’s say that 12 draws come from A, then m_a \sim Bin(12, \theta) and \theta \sim Beta_a(\alpha_a, \beta_a) (same goes for B and C)

I am not familiar with the Generalized Dririchlet, at first glance I would probably need some time to get the idea.

Let’s say that 12 draws come from A, 3 draws come from B, then C is determined to 5 draws.

Thus I’d say you have 2 Beta distributions (= Generalized Dirichlet) and have an Multinomial Distribution reflecting your urn size. You are free to construct a_i, b_i of Beta(a_i, b_i) from a Dirichlet Distribution.