Hi,

I am trying to model something relatively easy but somehow can’t get it to work. I believe my problem is more a lack of knowledge of Bayesian statistics than of Stan, but I’d be really glad to get some help.

Let me first explain what I am trying to do and what kind of data I have before going into the model, to give some context.

I have an existing article recommender model that recommends one related article for each article you browse. The goal is to assess whether the recommender model is doing a good job by looking at the improvement in CTR for the related articles.

The test was run as follows. In the control group, we collect all clicks and impressions for a “related article” on pages where the model would **not** have recommended it. In the experiment group, we collect the clicks and impressions for the related article only on pages where the model did recommend it.

To give a more concrete example.

Let’s say I have data as follows:

```
mydf <- data.frame(
  assignment = c(rep(1, 4), rep(2, 4)),
  impressions = c(12003, 40049, 3077, 8021, 600937, 31059, 3008, 3000),
  clicks = c(142, 84, 31, 60, 833, 246, 206, 168),
  variant = rep(1:4, 2)
)
```

So it looks like this:

| assignment | impressions | clicks | variant |
|---|---|---|---|
| 1 | 12003 | 142 | 1 |
| 1 | 40049 | 84 | 2 |
| 1 | 3077 | 31 | 3 |
| 1 | 8021 | 60 | 4 |
| 2 | 600937 | 833 | 1 |
| 2 | 31059 | 246 | 2 |
| 2 | 3008 | 206 | 3 |
| 2 | 3000 | 168 | 4 |

Assignment is either control (2) or experiment (1).

Variant is the related article id.

For example, the first row says that when related article 1 was shown on pages where the model recommended it, it got 12003 impressions and 142 clicks. On the other hand, when related article 1 was shown on pages where the model did not recommend it, it got 600937 impressions and 833 clicks.
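To make the numbers concrete, here is a minimal sketch of the raw CTRs (clicks divided by impressions), both per row and pooled per assignment. The names `ctr` and `pooled` are introduced only for this illustration, and nothing here is model-based.

```r
# Same data as above, repeated so the snippet runs on its own
mydf <- data.frame(
  assignment = c(rep(1, 4), rep(2, 4)),
  impressions = c(12003, 40049, 3077, 8021, 600937, 31059, 3008, 3000),
  clicks = c(142, 84, 31, 60, 833, 246, 206, 168),
  variant = rep(1:4, 2)
)

# Raw CTR for each (assignment, variant) row
mydf$ctr <- mydf$clicks / mydf$impressions

# Pooled CTR per assignment: total clicks / total impressions
pooled <- aggregate(cbind(clicks, impressions) ~ assignment,
                    data = mydf, FUN = sum)
pooled$ctr <- pooled$clicks / pooled$impressions

print(mydf)
print(pooled)
```

Printing both tables makes it easy to compare the per-variant picture with the pooled one before fitting anything.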

I would like to know if assignment 1 has a higher CTR than assignment 2.

My assumption is that some “related articles” have a higher overall CTR (because of higher-quality content), and therefore I believe I should have a prior on each variant that is common to the control group and the experiment group.

This seems like a fairly easy test. However, I can’t figure out how to model it so that there is some partial pooling between the variants. It seems to me that, because of the way the data is put together, I have very few points, each with low uncertainty, which makes the model very sensitive to the prior.

I tried different approaches that I’d like to show here.
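All of the models below declare the same `data` block, so the input can be assembled once from `mydf`. This is only a sketch: the name `stan_data` is made up for this post, and I am assuming an interface such as rstan that accepts a named list.

```r
# Same data as above, repeated so the snippet runs on its own
mydf <- data.frame(
  assignment = c(rep(1, 4), rep(2, 4)),
  impressions = c(12003, 40049, 3077, 8021, 600937, 31059, 3008, 3000),
  clicks = c(142, 84, 31, 60, 833, 246, 206, 168),
  variant = rep(1:4, 2)
)

# Named list whose entries match the Stan data block (A, V, N, OV, OA, I, C)
stan_data <- list(
  A = length(unique(mydf$assignment)),  # number of assignments
  V = length(unique(mydf$variant)),     # number of variants
  N = nrow(mydf),                       # number of observations
  OV = as.integer(mydf$variant),        # observed variant per row
  OA = as.integer(mydf$assignment),     # observed assignment per row
  I = as.integer(mydf$impressions),     # impressions per row
  C = as.integer(mydf$clicks)           # clicks per row
)

str(stan_data)
```

The explicit `as.integer()` calls are there because the Stan declarations are all `int`.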

- Following DBDA and adding another level from this article

The code looks like this:

```
mymodel_hie_shrink <-"
data {
  int<lower=0> A; // number of assignments
  int<lower=0> V; // number of variants
  int<lower=0> N; // number of observations
  int<lower=1,upper=V> OV[N]; // observed variant
  int<lower=1,upper=A> OA[N]; // observed assignment
  int<lower=0> I[N]; // number of impressions
  int<lower=0> C[N]; // number of clicks
}
parameters {
  vector<lower=0.001, upper=0.03>[V] phi_var; // mean CTR per variant
  vector<lower=1>[V] kappa_var; // concentration per variant
  vector<lower=0.001, upper=0.03>[A] phi_ass; // mean CTR per assignment
  vector<lower=1>[A] kappa_ass; // concentration per assignment
  vector<lower=0, upper=1>[N] theta; // chance of success
}
transformed parameters {
  vector[V] alpha_var;
  vector[V] beta_var;
  vector[A] alpha_ass;
  vector[A] beta_ass;
  alpha_ass = phi_ass .* kappa_ass;
  beta_ass = (1 - phi_ass) .* kappa_ass;
  alpha_var = phi_var .* kappa_var;
  beta_var = (1 - phi_var) .* kappa_var;
}
model {
  kappa_ass ~ gamma(0.01, 0.05); // hyperprior
  kappa_var ~ gamma(0.01, 0.05); // hyperprior
  for (i in 1:N) {
    target += beta_lpdf(phi_var[OV[i]] | alpha_ass[OA[i]], beta_ass[OA[i]]);
    target += beta_lpdf(theta[i] | alpha_var[OV[i]], beta_var[OV[i]]);
    target += binomial_lpmf(C[i] | I[i], theta[i]);
  }
}
generated quantities {
  real diff;
  diff = phi_ass[1] - phi_ass[2]; // experiment minus control
}
"
```

I tried different priors and the model seems to be very sensitive to them; in addition, with these priors the results do not really make sense. I also tried changing the hyperprior to the one used in Bob’s case study (Gelman’s prior).

- Using a built-in approach

The code looks like this:

```
data {
  int<lower=0> A; // number of assignments
  int<lower=0> V; // number of variants
  int<lower=0> N; // number of observations
  int<lower=1,upper=V> OV[N]; // observed variant
  int<lower=1,upper=A> OA[N]; // observed assignment
  int<lower=0> I[N]; // number of impressions
  int<lower=0> C[N]; // number of clicks
}
parameters {
  vector<lower=0, upper=1>[V] phi; // population chance of success
  vector<lower=1>[V] kappa; // population concentration
  vector<lower=0, upper=1>[N] theta; // chance of success
  vector<lower=-.03,upper=0.03>[V] delta; // improvement of experiment vs control
  real<lower=-0.3,upper=0.3> mu; // mean of the per-variant improvement
  real<lower=0,upper=0.05> sigma; // sd of the per-variant improvement
}
transformed parameters {
  vector[N] alpha;
  vector[N] beta;
  vector[N] theta_prime;
  for (i in 1:N) {
    int idx = OV[i];
    alpha[i] = phi[idx] * kappa[idx];
    beta[i] = (1 - phi[idx]) * kappa[idx];
    theta_prime[i] = theta[i];
    if (OA[i] == 1) { // experiment rows get the per-variant shift
      theta_prime[i] = theta_prime[i] + delta[idx];
    }
  }
}
model {
  kappa ~ pareto(1, 1.5); // hyperprior
  mu ~ normal(0, 0.01);
  sigma ~ exponential(200);
  delta ~ normal(mu, sigma);
  theta ~ beta(alpha, beta);
  C ~ binomial(I, theta_prime);
}
generated quantities {
  real diff;
  diff = mu;
}
```

Again, this approach has a very hard time (if you change the data set a bit, it does not converge) and is very sensitive to the prior.

Notes:

As a comparison, I am using a simple model that just pools all variants within each assignment:

```
data {
  int<lower=0> A; // number of assignments
  int<lower=0> V; // number of variants
  int<lower=0> N; // number of observations
  int<lower=1,upper=V> OV[N]; // observed variant
  int<lower=1,upper=A> OA[N]; // observed assignment
  int<lower=0> I[N]; // number of impressions
  int<lower=0> C[N]; // number of clicks
}
parameters {
  vector<lower=0, upper=1>[A] theta; // chance of success per assignment
}
model {
  theta ~ beta(1, 1);
  C ~ binomial(I, theta[OA]);
}
generated quantities {
  real diff;
  diff = theta[1] - theta[2]; // experiment minus control
}
```

And my results are completely different.

Is there anything I am missing? What kind of model could I use with fairly vague priors?