Background
I want to model a scenario which gives rise to biased coins. Say I pull a coin out of a bag which contains a mix of biased and fair coins. The proportion of fair to biased is unknown to me. The bias of biased coins is allowed to vary, and the average bias is again unknown to me.
Say I draw 100 coins and flip them each 100 times. I’d like to hierarchically model this data and I think the likelihood is a binomial mixture.
Here is my shot at a model. In the model, the probability of the coin being biased has a uniform probability, and the probability of heads for biased coins is drawn from a beta distribution.
data{
int n;
int heads[n];
}
parameters{
real<lower=0, upper=1> prob_of_bias;
vector<lower = 0, upper = 1>[n] b;
real<lower=0, upper=1> mu;
real<lower=0> kappa;
}
model{
prob_of_bias ~ beta(1,1);
mu ~ beta(1,1);
kappa ~ cauchy(0, 1);
b ~ beta_proportion(mu,kappa);
for (i in 1:n){
target+=log_mix(prob_of_bias,
binomial_lpmf(heads[i] | 100, b[i]),
binomial_lpmf(heads[i] | 100, 0.5));
}
}
Here is some code to simulate the process:
library(tidyverse)
library(tidybayes)
library(cmdstanr)
library(posterior)
n = 100
is_biased = rbinom(n, 1 , 0.77)
b = 0.5 + is_biased*(rbeta(n, 90, 10) - 0.5)
x = rbinom(n, 100, p)
d = tibble(coin = factor(1:n), heads = x)
Question
Is a mixture of binomials the right idea, and if it is, have I approproately implemented it in Stan? Additionally, how might I go about estimating the probability that any one of the particular coins is biased? I remember watching a video that Andrew Gelman made a while ago in which he used mixtures for something related to climate change (well…it was a bet about something related to climate change). In that video, I think the probability of belonging to one mixture was computed using an approach like…
generated quantities{
matrix[n,2] pn;
matrix[n,2] ps;
for (i in 1:n){
pn[i,1] = log_mix(prob_of_bias,
binomial_lpmf(heads[i] | 100, b[i]),
binomial_lpmf(heads[i] | 100, 0.5));
pn[i,2] = log_mix(1 - prob_of_bias,
binomial_lpmf(heads[i] | 100, b[i]),
binomial_lpmf(heads[i] | 100, 0.5));
ps[i,] = pn[i,]/sum(pn[i,]);
}
}
However, this approach doesn’t seem to work (the model can not discriminate between being biased and not being biased).
Happy to ad more details if necessary.