I think I have a problem similar to the one in Sum of binomials when only the sum is observed, but one that I thought might be more tractable because only two possible combinations of N_1 and N_2 contribute to the observed sum. I also have a control condition in which only one of these two combinations occurred (the number of trials is known, but I observe only their sum).
I study gambling-related harm and decision making. Some gambling events are frequently misinterpreted as gains when they are in fact net losses; these events are often called “losses disguised as wins” (LDWs).
In a simple experiment (n > 900), I have two conditions, call these “treatment” and “control”. Both conditions completed a short gambling task consisting of 10 trials. The control condition always viewed 2 wins and 8 losses. The treatment condition always viewed 2 wins, 5 losses, and 3 “losses disguised as wins.” Following the task, participants were asked to report the number of times a win occurred that paid more than the amount bet. I began with a simple model, roughly Response ~ Binomial(10, p), with N = 10 for each condition and p estimated separately by condition.
There is a clear difference between conditions, but, unsurprisingly, the data are overdispersed relative to the fitted binomial model.
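To make the overdispersion concrete, here is a small Python sketch (the p values and the 50/50 split are made up purely for illustration): a 50/50 mixture of two Binomial(10, p) components has more variance than the single binomial matched to its mean, which is roughly the problem a one-p-per-condition model runs into with bimodal data.

```python
from math import comb

def binom_pmf(k, n, p):
    """Binomial(n, p) probability of exactly k successes."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def moments(pmf):
    """Mean and variance of a pmf given as a list over counts 0..len-1."""
    mean = sum(k * q for k, q in enumerate(pmf))
    var = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))
    return mean, var

# A 50/50 mixture of two Binomial(10, .) components, peaking near 2 and 5
mix = [0.5 * binom_pmf(k, 10, 0.2) + 0.5 * binom_pmf(k, 10, 0.5)
       for k in range(11)]
mix_mean, mix_var = moments(mix)

# The single Binomial(10, p) with the same mean (p = mean / 10)
single = [binom_pmf(k, 10, mix_mean / 10) for k in range(11)]
_, single_var = moments(single)

print(mix_var, single_var)  # the mixture's variance exceeds the binomial's
```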
If only 2 wins had occurred, I could model the reported count with Binomial(2, p_{win}) (assuming p_{win} remains the same across trials). Likewise, I could model the probability of reporting a win when only 8 losses had occurred as Binomial(8, p_{loss}). Since I only observe the sum of these (over 10 trials), as in the control group, can I estimate p_{win} and p_{loss} using the joint distribution of these two binomials with N = 2 and N = 8, or does this remain intractable? If it is possible, how does one write this out in Stan?
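For what it's worth, the sum has a small support (0–10), so its PMF can be computed by brute-force convolution over the 3 × 9 combinations; this is the same marginalization a model would have to do. A Python sketch, with made-up values for p_{win} and p_{loss}:

```python
from math import comb

def binom_pmf(k, n, p):
    """Binomial(n, p) probability of exactly k successes."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def sum_pmf(n1, p1, n2, p2):
    """PMF of X + Y with X ~ Binomial(n1, p1), Y ~ Binomial(n2, p2),
    computed by summing over all (x, y) pairs (a discrete convolution)."""
    out = [0.0] * (n1 + n2 + 1)
    for x in range(n1 + 1):
        for y in range(n2 + 1):
            out[x + y] += binom_pmf(x, n1, p1) * binom_pmf(y, n2, p2)
    return out

# Control condition: 2 wins and 8 losses; the p values are hypothetical
pmf = sum_pmf(2, 0.9, 8, 0.05)
print(sum(pmf))  # the probabilities sum to 1
```

With these illustrative values the distribution peaks at 2 reported wins, as one would hope for the control group.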
Of primary interest is an estimate of the number of participants in the treatment group who were unaware of these events. The response distribution for the treatment condition is clearly bimodal (one peak at 2 and another at 5). With help from the rethinking package and textbook, I have been able to fit a mixture model that estimates the proportion of the treatment group who responded like the control, using log_mix(p_mix, binomial_lpmf(Response[i] | 10, p_C), binomial_lpmf(Response[i] | 10, p_T[i])) and structuring the model so that p_mix must equal 1 for the control condition. Model code is below.
Suppose the treatment group's responses can be modelled either by the same joint distribution as the control group (for participants who recognised the LDWs as losses) or by a second one with Binomial(5, p_{win}) and Binomial(5, p_{loss}) (for participants who counted the 3 LDWs as wins). Assuming p_{win} and p_{loss} remain the same across conditions and trials, will I be able to extend the model this way?
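Numerically, that extension is just a two-component mixture of two such convolutions. A Python sketch with hypothetical values (p_{win} = 0.9, p_{loss} = 0.05, 40% aware) reproduces the kind of bimodality I see in the treatment data, with peaks near 2 and 5:

```python
from math import comb

def binom_pmf(k, n, p):
    """Binomial(n, p) probability of exactly k successes."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def sum_pmf(n1, p1, n2, p2):
    """PMF of the sum of Binomial(n1, p1) and Binomial(n2, p2)."""
    out = [0.0] * (n1 + n2 + 1)
    for x in range(n1 + 1):
        for y in range(n2 + 1):
            out[x + y] += binom_pmf(x, n1, p1) * binom_pmf(y, n2, p2)
    return out

# Hypothetical parameter values, chosen only for illustration
p_win, p_loss, p_aware = 0.9, 0.05, 0.4

aware = sum_pmf(2, p_win, 8, p_loss)    # counts only the 2 true wins
unaware = sum_pmf(5, p_win, 5, p_loss)  # counts 2 wins + 3 LDWs as wins

# Treatment responses: mixture of aware and unaware participants
treatment = [p_aware * a + (1 - p_aware) * u for a, u in zip(aware, unaware)]
print(treatment)  # bimodal, with modes near 2 and 5
```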
As with @Adam_Haber’s case described above, I can set a strong prior that p_{win} > p_{loss}.
This is my first time posting on the Stan forums and I’m still a beginner, so apologies in advance if any of this is a little garbled/confused.
Here is the Stan code (extracted and edited from rethinking) for the simple mixture model I have fit so far.
data{
  int<lower=0, upper=10> Response[940]; // reported number of wins out of 10 trials
  int<lower=0, upper=1> LDW[940];       // 1 = treatment (saw LDWs), 0 = control
}
parameters{
  real alpha;          // logit-scale intercept: control response probability
  real<lower=0> beta;  // logit-scale treatment shift (constrained positive)
  real b_unaware;      // logit-scale shift for the "unaware" mixing weight
}
transformed parameters{
  real p_C = inv_logit(alpha);
  vector[940] p_T;
  vector[940] p_unaware;
  vector[940] p_aware;
  for ( i in 1:940 ) {
    p_T[i] = inv_logit(alpha + beta * LDW[i]);
    p_unaware[i] = inv_logit(b_unaware * LDW[i]);
    p_aware[i] = 1 - p_unaware[i];
  }
}
model{
  alpha ~ normal(0, 1.5);
  beta ~ normal(0, 1);
  b_unaware ~ normal(0, 1.5);
  // For control rows (LDW[i] == 0) we have p_T[i] == p_C, so the two mixture
  // components coincide and the likelihood reduces to a single binomial.
  for ( i in 1:940 )
    target += log_mix(p_aware[i],
                      binomial_lpmf(Response[i] | 10, p_C),
                      binomial_lpmf(Response[i] | 10, p_T[i]));
}