# Modelling a union of two Bernoulli events

Inability to model this trivial problem is driving me nuts: I have two events A and B that occur independently of each other with a probability pA and pB. What I observe the number of time when an event A, B, both or none occured in the fixed amount of trials. My task is to reconstruct pA and pB with Stan: (Yes, I know this is a trivial, analytically solvable problem)

``````library(rstan)
chain_count = parallel::detectCores()
options(mc.cores=chain_count)
pA = 0.952
pB = 0.952

xA = sample(c(0,1),N,replace=TRUE,prob=c(1-pA,pA))
xB = sample(c(0,1),N,replace=TRUE,prob=c(1-pB,pB))
x = xA + xB*2
data=list(N=N,
n_nn=sum(x==0),
n_pn=sum(x==1),
n_np=sum(x==2),
n_pp=sum(x==3))

model2 = stan_model("model2.stan")
fit2 = sampling(model2, data=data, chains=chain_count/2, iter=4000)
fit2
``````

I want to reconstruct the p1 and p2 with the following code:

``````data {
int<lower=0> n_nn;
int<lower=0> n_np;
int<lower=0> n_pn;
int<lower=0> n_pp;
}

parameters {
real<lower=0, upper=1> pA;
real<lower=0, upper=1> pB;
}

model {
target+=bernoulli_lpmf(1| pA)*n_pn;
target+=bernoulli_lpmf(1| pB)*n_np;
target+=bernoulli_lpmf(1| pA * pB)*n_pp;
target+=bernoulli_lpmf(1| (1-pA) + (1-pB) - (1-pA)*(1-pB))*n_nn;
}
``````

The model compiles and runs, but the results make no sense. For instance, for data generated with `pA=0.0952` and `pB=pA`

``````\$n_nn
 821

\$n_pn
 88

\$n_np
 84

\$n_pp
 7
``````

I get 95%CI for pA: 0.41 … 0.99
and for pB: 0.09 … 0.26 !

Of course I’ve got wrong results, because it is a binomial distribution!

Here’s the correct model:

``````data {
int<lower=0> n_nn;
int<lower=0> n_np;
int<lower=0> n_pn;
int<lower=0> n_pp;
}

transformed data {
int N = n_nn+n_np+n_pn+n_pp;
}

parameters {
real<lower=0, upper=1> pA;
real<lower=0, upper=1> pB;
}

model {
target+=binomial_lpmf(n_pn| N, pA);
target+=binomial_lpmf(n_np| N, pB);
target+=binomial_lpmf(n_pp| N, pA * pB);
target+=binomial_lpmf(n_nn| N, (1-pA) + (1-pB) - (1-pA)*(1-pB));
}
``````

The “model” part now can be re-written in the more elegant form:

``````model {
n_pn~binomial(N, pA);
n_np~binomial(N, pB);
n_pp~binomial(N, pA * pB);
n_nn~binomial(N, (1-pA) + (1-pB) - (1-pA)*(1-pB));
}
``````

Since the events A and B are independent, you can calculate n_A = n_pn + n_pp and n_B = n_np + n_pp and use:

``````model {
n_A~binomial(N, pA);
n_B~binomial(N, pB);
}
``````

This is also wrong, but in the more subtle way. I am counting the same data twice. The correct answer is this:

``````data {
int<lower=0> n_nn;
int<lower=0> n_pn;
int<lower=0> n_pp;
int<lower=0> n_np;
}

transformed data {
int nd;
nd = n_nn;
nd = n_pn;
nd = n_np;
nd = n_pp;
}

parameters {
real<lower=0, upper=1> pA;
real<lower=0, upper=1> pB;
}

transformed parameters {
simplex s;
s = (1-pA) * (1-pB);
s = pA * (1-pB);
s = pB * (1-pA);
s = pA * pB;
}

model {
nd ~ multinomial(s);
}
``````

@DouglasBoubert your answer is 100% correct, simple and beautiful. For my further work I need a form where I can put further constraints, since the model I work on has a third event, “T”, which manifests itself only with counts on n_pp. This is where this seems more useful.

The original model was not that bad, after all! Here’s the working version of the same approach I did for the original model:

``````data {
int<lower=0> n_nn;
int<lower=0> n_pn;
int<lower=0> n_pp;
int<lower=0> n_np;
}

parameters {
real<lower=0, upper=1> pA;
real<lower=0, upper=1> pB;
real<lower=0, upper=1> pT;
}

transformed parameters {
simplex s;
s = (1-pA) * (1-pB) * (1-pT);
s = pA * (1-pB)*(1-pT);
s = pB * (1-pA)*(1-pT);
s = pA * pB + pT - pA*pB*pT;
}

model {
target+=bernoulli_lpmf(1|pA * (1-pB))*n_pn;
target+=bernoulli_lpmf(1|pB * (1-pA))*n_np;
target+=bernoulli_lpmf(1|pA * pB)*n_pp;
target+=bernoulli_lpmf(1|(1-pA) * (1-pB))*n_nn;
}
``````

I simply calculated the probabilities wrong!

So what we have now are three different ways of coding the same simple model. That would make a good entry tutorial for probabilistic programming!