Modelling a union of two Bernoulli events

adamryczkowski · April 7, 2023, 7:08am

Inability to model this trivial problem is driving me nuts: I have two events A and B that occur independently of each other with a probability pA and pB. What I observe the number of time when an event A, B, both or none occured in the fixed amount of trials. My task is to reconstruct pA and pB with Stan: (Yes, I know this is a trivial, analytically solvable problem)

library(rstan)
chain_count = parallel::detectCores()
options(mc.cores=chain_count)
pA = 0.952
pB = 0.952

xA = sample(c(0,1),N,replace=TRUE,prob=c(1-pA,pA))
xB = sample(c(0,1),N,replace=TRUE,prob=c(1-pB,pB))
x = xA + xB*2
data=list(N=N, 
         n_nn=sum(x==0),
         n_pn=sum(x==1),
         n_np=sum(x==2),
         n_pp=sum(x==3))

model2 = stan_model("model2.stan")
fit2 = sampling(model2, data=data, chains=chain_count/2, iter=4000)
fit2

I want to reconstruct the p1 and p2 with the following code:

data {
  int<lower=0> n_nn;
  int<lower=0> n_np;
  int<lower=0> n_pn;
  int<lower=0> n_pp;
}

parameters {
  real<lower=0, upper=1> pA;
  real<lower=0, upper=1> pB;
}

model {
  target+=bernoulli_lpmf(1| pA)*n_pn;
  target+=bernoulli_lpmf(1| pB)*n_np;
  target+=bernoulli_lpmf(1| pA * pB)*n_pp;
  target+=bernoulli_lpmf(1| (1-pA) + (1-pB) - (1-pA)*(1-pB))*n_nn;
}

The model compiles and runs, but the results make no sense. For instance, for data generated with pA=0.0952 and pB=pA

$n_nn
[1] 821

$n_pn
[1] 88

$n_np
[1] 84

$n_pp
[1] 7

I get 95%CI for pA: 0.41 … 0.99
and for pB: 0.09 … 0.26 !

adamryczkowski · April 7, 2023, 7:17am

Of course I’ve got wrong results, because it is a binomial distribution!

Here’s the correct model:

data {
  int<lower=0> n_nn;
  int<lower=0> n_np;
  int<lower=0> n_pn;
  int<lower=0> n_pp;
}

transformed data {
  int N = n_nn+n_np+n_pn+n_pp;
}

parameters {
  real<lower=0, upper=1> pA;
  real<lower=0, upper=1> pB;
}

model {
  target+=binomial_lpmf(n_pn| N, pA);
  target+=binomial_lpmf(n_np| N, pB);
  target+=binomial_lpmf(n_pp| N, pA * pB);
  target+=binomial_lpmf(n_nn| N, (1-pA) + (1-pB) - (1-pA)*(1-pB));
}

adamryczkowski · April 7, 2023, 7:20am

The “model” part now can be re-written in the more elegant form:

model {
  n_pn~binomial(N, pA);
  n_np~binomial(N, pB);
  n_pp~binomial(N, pA * pB);
  n_nn~binomial(N, (1-pA) + (1-pB) - (1-pA)*(1-pB));
}

DouglasBoubert · April 7, 2023, 8:07am

Since the events A and B are independent, you can calculate n_A = n_pn + n_pp and n_B = n_np + n_pp and use:

model {
  n_A~binomial(N, pA);
  n_B~binomial(N, pB);
}

adamryczkowski · April 7, 2023, 8:56am

This is also wrong, but in the more subtle way. I am counting the same data twice. The correct answer is this:

data {
  int<lower=0> n_nn;
  int<lower=0> n_pn;
  int<lower=0> n_pp;
  int<lower=0> n_np;
}

transformed data {
  int nd[4];
  nd[1] = n_nn;
  nd[2] = n_pn;
  nd[3] = n_np;
  nd[4] = n_pp;
}

parameters {
  real<lower=0, upper=1> pA;
  real<lower=0, upper=1> pB;
}

transformed parameters {
  simplex[4] s;
  s[1] = (1-pA) * (1-pB);
  s[2] = pA * (1-pB);
  s[3] = pB * (1-pA);
  s[4] = pA * pB;
}

model {
  nd ~ multinomial(s);
}

@DouglasBoubert your answer is 100% correct, simple and beautiful. For my further work I need a form where I can put further constraints, since the model I work on has a third event, “T”, which manifests itself only with counts on n_pp. This is where this seems more useful.

adamryczkowski · April 8, 2023, 10:08am

The original model was not that bad, after all! Here’s the working version of the same approach I did for the original model:

data {
  int<lower=0> n_nn;
  int<lower=0> n_pn;
  int<lower=0> n_pp;
  int<lower=0> n_np;
}

parameters {
  real<lower=0, upper=1> pA;
  real<lower=0, upper=1> pB;
  real<lower=0, upper=1> pT;
}

transformed parameters {
  simplex[4] s;
  s[1] = (1-pA) * (1-pB) * (1-pT);
  s[2] = pA * (1-pB)*(1-pT);
  s[3] = pB * (1-pA)*(1-pT);
  s[4] = pA * pB + pT - pA*pB*pT;
}

model {
  target+=bernoulli_lpmf(1|pA * (1-pB))*n_pn;
  target+=bernoulli_lpmf(1|pB * (1-pA))*n_np;
  target+=bernoulli_lpmf(1|pA * pB)*n_pp;
  target+=bernoulli_lpmf(1|(1-pA) * (1-pB))*n_nn;
}

I simply calculated the probabilities wrong!

So what we have now are three different ways of coding the same simple model. That would make a good entry tutorial for probabilistic programming!

Topic		Replies	Views
Modelling a three binary events: A, B and T. A and B are observed directly, whereas T is only observed as if A+B occured Modeling	5	465	April 9, 2023
Bernoulli process with observational error Modeling rstan , techniques	4	485	December 8, 2023
Modelling the indicator that a sum of Bernoulli random variables is positive Modeling	2	387	January 3, 2021
RStan parallelising using reduce_sum() Modeling	1	447	July 30, 2021
Help with Bernoulli sampling in model block Modeling rstan	4	155	July 5, 2024

Modelling a union of two Bernoulli events

Related topics