Participants reporting sum of two distinct types of events

I think I have a similar problem to the one in Sum of binomials when only the sum is observed, but one that I thought might be more tractable because there are only two possible combinations of N_1 and N_2 that contribute to the observed sum. I also have a control condition in which only one of these two possible combinations occurred (the number of trials is known, but I only observe their sum).

I study gambling-related harm and decision making. Some gambling events are frequently misinterpreted as gains when they are in fact net losses; these events are often called “losses disguised as wins”.

In a simple experiment (n > 900), I have two conditions, call these “treatment” and “control”. Both conditions completed a short gambling task that consisted of 10 trials. The control condition always viewed 2 wins and 8 losses. My treatment condition always viewed 2 wins, 5 losses, and 3 “losses disguised as wins”. Following the task, participants were asked to report the number of times a win occurred that was more than the amount bet. I began with a simple model where N = 10 for each condition and estimated p by condition, roughly:

\begin{align*} Response_i &\sim Binomial(10, p_i) \\ logit(p_i) &= \alpha + \beta \times Treatment_i \end{align*}
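
In Stan, I imagine this baseline looks roughly like the sketch below (LDW is the 0/1 treatment indicator used in the data block further down; the priors shown are only illustrative):

data{
    int Response[940];    // reported number of wins (0-10)
    int LDW[940];         // 1 = treatment, 0 = control
}
parameters{
    real alpha;
    real beta;
}
model{
    alpha ~ normal( 0 , 1.5 );
    beta ~ normal( 0 , 1 );
    for ( i in 1:940 )
        Response[i] ~ binomial_logit( 10 , alpha + beta * LDW[i] );
}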

There is a clear difference between conditions but, unsurprisingly, the data are overdispersed relative to this model’s predictions.

If only the 2 wins had occurred, I could model the reports with Binomial(2, p_{win}) (assuming p_{win} remains the same across trials). Likewise, I could model the probability of reporting a win when only the 8 losses had occurred as Binomial(8, p_{loss}). Since I have only observed the sum of these (10 trials), as in the control group, can I estimate p_{win} and p_{loss} using a joint distribution of these two binomials that sets N = 2 and N = 8, or does this remain intractable? If it is possible, how does one write this out in Stan?

Of primary interest would be an estimate of the number of participants in the treatment group who were unaware of these events. The data distribution for the treatment condition is clearly bimodal (a peak at 2 and another at 5). With help from the rethinking package and textbook, I have been able to fit a mixture model that estimates the proportion of the treatment group who responded like the control, using log_mix(p_mix, binomial_lpmf(Response[i] | 10, p_C), binomial_lpmf(Response[i] | 10, p_T[i])) and structuring the model so that p_mix must equal 1 for the control condition. Model code is below.

If I assume that the treatment group responses can be modelled either by the same joint distribution as the control group, or by another with Binomial(5, p_{win}) and Binomial(5, p_{loss}) (i.e., the 3 disguised losses counted as wins), with p_{win} and p_{loss} remaining the same across conditions and trials, will I be able to extend the model?

As with @Adam_Haber’s case described above, I can set a strong prior that p_{win} > p_{loss}.

This is my first time posting on the Stan forums and I’m still a beginner, so apologies in advance if any of this is a little garbled/confused.

Here is the Stan code (extracted and edited from rethinking) for the simple mixture model I have fit so far.

data{
    int Response[940];
    int LDW[940];
}
parameters{
    real b_unaware;
    real alpha;
    real<lower = 0> beta;
}
model{
    vector[940] p_aware;
    vector[940] p_unaware;
    vector[940] p_T;
    real p_C;
    beta ~ normal( 0 , 1 );
    alpha ~ normal( 0 , 1.5 );
    b_unaware ~ normal( 0 , 1.5 );
    p_C = alpha;
    p_C = inv_logit(p_C);
    for ( i in 1:940 ) {
        p_T[i] = alpha + beta * LDW[i];
        p_T[i] = inv_logit(p_T[i]);
    }
    for ( i in 1:940 ) {
        p_unaware[i] = b_unaware * LDW[i];
        p_unaware[i] = inv_logit(p_unaware[i]);
    }
    for ( i in 1:940 ) {
        p_aware[i] = 1 - p_unaware[i];
    }
    for ( i in 1:940 ) target += log_mix(p_aware[i], binomial_lpmf(Response[i] | 10, p_C), binomial_lpmf(Response[i] | 10, p_T[i]));
}
generated quantities{
    vector[940] p_aware;
    vector[940] p_unaware;
    vector[940] p_T;
    real p_C;
    p_C = alpha;
    p_C = inv_logit(p_C);
    for ( i in 1:940 ) {
        p_T[i] = alpha + beta * LDW[i];
        p_T[i] = inv_logit(p_T[i]);
    }
    for ( i in 1:940 ) {
        p_unaware[i] = b_unaware * LDW[i];
        p_unaware[i] = inv_logit(p_unaware[i]);
    }
    for ( i in 1:940 ) {
        p_aware[i] = 1 - p_unaware[i];
    }
}


Hi,
I took the liberty to separate this from the original topic, as this is IMHO quite a distinct situation.

I think that you are in luck and your design should indeed allow you to estimate the probabilities of all three types of events being reported as a win. Let’s look at the control condition first.

As long as you know how many trials were in each category, you can usually learn something about the probabilities for each category. This is easiest to see when the probabilities of reporting a win and reporting a loss are very different, which leaves a strong pattern in the data, e.g.:

[histogram of simulated responses for the control condition]

You can already somewhat see in the histogram that the two “win” trials have to have a high probability and the 8 “loss” trials a low one. I think (I didn’t do all the maths) that as long as the counts of trials are different, or you know the ordering of the probabilities for the categories, the model would be identified even if the difference in probabilities was small, but large differences definitely provide more information.

Now, there is no neat closed form for the distribution of this sum, but since the total number of trials is low, you can easily brute-force it:

\begin{align*}
X_{win} &\sim Binomial(2, p_{win}) \\
X_{loss} &\sim Binomial(8, p_{loss}) \\
P(X_{win} + X_{loss} = k) &= \sum_{i = \max\{0, k - 8\}}^{\min\{2,k\}} P(X_{win} = i) \, P(X_{loss} = k - i)
\end{align*}
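
In Stan this can be a small user-defined function. A rough sketch for the control condition (the function name, data names and priors here are purely illustrative, not a tested implementation):

functions{
  // log P(X_win + X_loss = k) for X_win ~ Binomial(n_win, p_win)
  // and X_loss ~ Binomial(n_loss, p_loss), summing over all possible splits of k
  real sum_two_binomials_lpmf(int k, int n_win, int n_loss,
                              real p_win, real p_loss) {
    real lp = negative_infinity();
    for (i in max(0, k - n_loss):min(n_win, k)) {
      lp = log_sum_exp(lp, binomial_lpmf(i | n_win, p_win)
                           + binomial_lpmf(k - i | n_loss, p_loss));
    }
    return lp;
  }
}
data{
  int<lower=1> N_control;
  int<lower=0, upper=10> Response[N_control];   // reported number of wins
}
parameters{
  real<lower=0, upper=1> p_win;    // P(a true win is reported as a win)
  real<lower=0, upper=1> p_loss;   // P(a loss is reported as a win)
}
model{
  // illustrative priors nudging p_win to be larger than p_loss
  p_win ~ beta(8, 2);
  p_loss ~ beta(2, 8);
  for (i in 1:N_control) {
    // control condition: 2 wins + 8 losses per participant
    target += sum_two_binomials_lpmf(Response[i] | 2, 8, p_win, p_loss);
  }
}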

For the treatment condition, we can’t rely on p_{disguised} being different from p_{win} (and thus we might have trouble learning a lot about it if we observed the treatment condition alone). However, we can assume p_{win} is the same in both conditions, which should let us learn something useful about p_{disguised} from the data. Once again, we can brute-force the distribution of the sum:

\begin{align*}
X_{win} &\sim Binomial(2, p_{win}) \\
X_{disguised} &\sim Binomial(3, p_{disguised}) \\
X_{loss2} &\sim Binomial(5, p_{loss}) \\
P(X_{win} + X_{disguised} + X_{loss2} = k) &= \sum_{i = \max\{0, k - 8\}}^{\min\{2,k\}} P(X_{win} = i) \sum_{j = \max\{0, k - i - 5\}}^{\min\{3,k - i\}} P(X_{disguised} = j) \, P(X_{loss2} = k - i - j)
\end{align*}

This should translate to a bunch of for loops in Stan and should be quite fast. You could then fit a mixture for p_{disguised} on top of this. Maybe you could consider one case to be p_{disguised} = p_{win}, but it is quite likely you’ll also be able to fit a model where one group has a larger p_{disguised} than the other, without any additional restrictions beyond the ordering.
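
To make that concrete, here is a rough sketch of a full model along those lines, reusing the two-binomial function from the sketch above and mixing over two latent values of p_{disguised}. All names, the mixing weight theta, the ordering constraint and the priors are placeholders to illustrate the structure, not a tested implementation:

functions{
  // log P(X_win + X_loss = k) for X_win ~ Binomial(n_win, p_win)
  // and X_loss ~ Binomial(n_loss, p_loss)
  real sum_two_binomials_lpmf(int k, int n_win, int n_loss,
                              real p_win, real p_loss) {
    real lp = negative_infinity();
    for (i in max(0, k - n_loss):min(n_win, k)) {
      lp = log_sum_exp(lp, binomial_lpmf(i | n_win, p_win)
                           + binomial_lpmf(k - i | n_loss, p_loss));
    }
    return lp;
  }
  // log P(X_win + X_disguised + X_loss = k), brute-forcing every split of k
  real sum_three_binomials_lpmf(int k, int n_win, int n_disg, int n_loss,
                                real p_win, real p_disg, real p_loss) {
    real lp = negative_infinity();
    for (i in max(0, k - n_disg - n_loss):min(n_win, k)) {
      for (j in max(0, k - i - n_loss):min(n_disg, k - i)) {
        lp = log_sum_exp(lp, binomial_lpmf(i | n_win, p_win)
                             + binomial_lpmf(j | n_disg, p_disg)
                             + binomial_lpmf(k - i - j | n_loss, p_loss));
      }
    }
    return lp;
  }
}
data{
  int<lower=1> N;
  int<lower=0, upper=10> Response[N];   // reported number of wins
  int<lower=0, upper=1> LDW[N];         // 1 = treatment, 0 = control
}
parameters{
  real<lower=0, upper=1> p_win;
  real<lower=0, upper=1> p_loss;
  real<lower=0, upper=1> p_disg_low;            // smaller P(disguised loss reported as win)
  real<lower=p_disg_low, upper=1> p_disg_high;  // larger value, ordered above p_disg_low
  real<lower=0, upper=1> theta;                 // proportion of treatment group with p_disg_high
}
model{
  // illustrative priors only
  p_win ~ beta(8, 2);
  p_loss ~ beta(2, 8);
  p_disg_low ~ beta(2, 8);
  p_disg_high ~ beta(2, 2);
  theta ~ beta(2, 2);
  for (i in 1:N) {
    if (LDW[i] == 0) {
      // control: 2 wins + 8 losses
      target += sum_two_binomials_lpmf(Response[i] | 2, 8, p_win, p_loss);
    } else {
      // treatment: 2 wins + 3 disguised losses + 5 losses,
      // a mixture over the two latent values of p_disguised
      target += log_mix(theta,
        sum_three_binomials_lpmf(Response[i] | 2, 3, 5, p_win, p_disg_high, p_loss),
        sum_three_binomials_lpmf(Response[i] | 2, 3, 5, p_win, p_disg_low, p_loss));
    }
  }
}

Here theta would be an estimate of the proportion of treatment participants responding with the larger p_{disguised} (i.e., treating the disguised losses like wins), which seems to be the quantity of primary interest.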

If the number of trials was large and the brute-force approach was not feasible, one could still use the saddlepoint approximation. The maths for the binomial distribution is worked out in [1712.01410] Approximating the Sum of Independent Non-Identical Binomial Random Variables, and I show how to implement the saddlepoint approximation in Stan at Approximate Densities for Sums of Variables: Negative Binomials and Saddlepoint (unlike the negative binomial case in the blog, sums of binomials can at least sometimes actually contain enough information to separate out all the components). I would, however, strongly suspect that in your case enumerating all possibilities by brute force is going to be faster and easier to implement.

Does that make sense?


Thank you @martinmodrak!

Yes, this absolutely makes sense! This was exactly the model definition I had been grasping for. Thank you!

Based on your response I was able to fit each of these with some Stan code (my code was very ugly; it’s a hard problem to code up as a beginner!). The model was able to accurately recover parameters from simulated data, but when fit to the actual data I get more or less the same predictions as a Binomial(10, p_x) model, and the estimates for p_{win} and p_{loss} from the more complex model just don’t seem plausible (too close together). The real data likely include some inattentive or random responding, which may be complicating things.


A quick update: I now seem to have this working using a series of beta-binomial distributions rather than binomials. Posterior predictive checks now look pretty good and the estimates seem plausible. It’s not a perfect fit, but it looks much better than the Binomial(10, p) situation. Presumably the beta-binomial handles individual variation and observations out in the tails a lot better than the binomial (there was likely some satisficing in the observed data). I’m currently writing this up; once I’m done I’ll post a link here to my notes and OSF page in case reading through them helps someone fit something similar in the future.
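
For anyone attempting something similar, the core change is to swap the binomial terms inside the brute-force sum for beta-binomial ones. A minimal sketch of one way to write that, using a mean/precision parameterisation with phi_win and phi_loss as extra dispersion parameters (this is illustrative rather than my exact code):

functions{
  // beta-binomial version of the summed likelihood, using
  // alpha = p * phi and beta = (1 - p) * phi (mean/precision form);
  // phi_win and phi_loss must be positive
  real sum_two_betabinomials_lpmf(int k, int n_win, int n_loss,
                                  real p_win, real p_loss,
                                  real phi_win, real phi_loss) {
    real lp = negative_infinity();
    for (i in max(0, k - n_loss):min(n_win, k)) {
      lp = log_sum_exp(lp,
             beta_binomial_lpmf(i | n_win, p_win * phi_win, (1 - p_win) * phi_win)
             + beta_binomial_lpmf(k - i | n_loss, p_loss * phi_loss, (1 - p_loss) * phi_loss));
    }
    return lp;
  }
}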

Thanks again for your help with this @martinmodrak, greatly appreciated.

Also, in working on this I noticed some errors in the initial mixture model I posted above. The previous code included the control group as well as the treatment (LDW) group in the unmixing (due to a very dumb error on my part). It appears I don’t have permission to edit my own post above (??), so I’ll post a corrected version here:

data{
  int Response[940];
  int LDW[940];
}
parameters{
  real b_aware;
  real alpha;
  real<lower=0> beta;
}
model{
  real p_aware;   // mixing weight: probability of responding like the control group
  vector[940] p_T;
  real p_C;
  beta ~ normal( 0 , 1 );
  alpha ~ normal( 0 , 1.5 );
  b_aware ~ normal( 0 , 1.5 );
  p_aware = inv_logit(b_aware);
  p_C = inv_logit(alpha);
  for ( i in 1:940 ) {
    p_T[i] = alpha + beta * LDW[i];
    p_T[i] = inv_logit(p_T[i]);
  }
  for ( i in 1:940 ) {
    if (LDW[i] == 0) {
      // control group: plain binomial likelihood
      target += binomial_lpmf(Response[i] | 10, p_C);
    }
    if (LDW[i] == 1) {
      // treatment group: mixture of control-like ("aware") and treatment-specific ("unaware") responding
      target += log_mix(p_aware, binomial_lpmf(Response[i] | 10, p_C), binomial_lpmf(Response[i] | 10, p_T[i]));
    }
  }
}

or rethinking code like this:

 ulam(
  alist(
    Est|LDW == 1 ~ custom(log_mix(p_aware, binomial_lpmf(Est|10, p_C), binomial_lpmf(Est|10, p_T))),
    Est|LDW == 0 ~ custom(binomial_lpmf(Est | 10, p_C)),
    logit(p_aware) <- b_aware,
    logit(p_T)   <- alpha + beta*LDW,
    logit(p_C)   <- alpha,
    b_aware ~ dnorm(0, 1.5),
    alpha ~ dnorm(0, 1.5),
    beta  ~ dnorm(0, 1)
  ), data = dt, cores = 4, chains = 4, iter = 3500, warmup = 1000,
  constraints = list(beta = "lower = 0"))