Inability to model this trivial problem is driving me nuts: I have two events A and B that occur independently of each other with a probability pA and pB. What I observe the number of time when an event A, B, both or none occured in the fixed amount of trials. My task is to reconstruct pA and pB with Stan: (Yes, I know this is a trivial, analytically solvable problem)
library(rstan)
chain_count = parallel::detectCores()
options(mc.cores=chain_count)
pA = 0.952
pB = 0.952
xA = sample(c(0,1),N,replace=TRUE,prob=c(1-pA,pA))
xB = sample(c(0,1),N,replace=TRUE,prob=c(1-pB,pB))
x = xA + xB*2
data=list(N=N,
n_nn=sum(x==0),
n_pn=sum(x==1),
n_np=sum(x==2),
n_pp=sum(x==3))
model2 = stan_model("model2.stan")
fit2 = sampling(model2, data=data, chains=chain_count/2, iter=4000)
fit2
I want to reconstruct the p1 and p2 with the following code:
data {
int<lower=0> n_nn;
int<lower=0> n_np;
int<lower=0> n_pn;
int<lower=0> n_pp;
}
parameters {
real<lower=0, upper=1> pA;
real<lower=0, upper=1> pB;
}
model {
target+=bernoulli_lpmf(1| pA)*n_pn;
target+=bernoulli_lpmf(1| pB)*n_np;
target+=bernoulli_lpmf(1| pA * pB)*n_pp;
target+=bernoulli_lpmf(1| (1-pA) + (1-pB) - (1-pA)*(1-pB))*n_nn;
}
The model compiles and runs, but the results make no sense. For instance, for data generated with pA=0.0952
and pB=pA
$n_nn
[1] 821
$n_pn
[1] 88
$n_np
[1] 84
$n_pp
[1] 7
I get 95%CI for pA: 0.41 … 0.99
and for pB: 0.09 … 0.26 !