# Extend hierarchical reinforcement learning model for multiple conditions

Dear Stan community!

I’m fairly new to Baysian modelling in general and Stan language in particular. So it might be that I’m missing some very obvious issues. However, I’ve been trying quite a lot but got stuck at one point and looking for any kind of hints or ideas!
I ran a within-subject repeated measures experiment with 3 different drug conditions (Placebo+2different neuromodulators) on a rather simple bandit 2 arm reward learning paradigm.
I have a hierarchical reward learning model that is based on similar models from hBayesDM package.
I can run the model on each of the drug conditions separately using using the Single condition model below. This works perfectly well, I get reasonable parameter estimates for the model parameters and also a good parameter recovery for input data.
However, I’d like to set up a model that includes all 3 conditions but I cannot get my head around how to do this. I created a Multiple conditions model (see below) but that seems to be far from what I need. I get divergent transitions for all iterations, chains do not mix at all. I suspect this to be related to the way I set up the parameterization for the multiple conditions. Basically my approach is to have vectors of parameters and then do the very same that I do in the basic model, too. I could imagine that it makes more sense to introduce another level of hierarchy in some other way but I don’t really know how to implement that.
An comments, ideas, or suggestions are highly appreciated!
Thanks!
Best, Simon

Single condition model

``````data {
int<lower=1> N;                       // number of subjects
int<lower=1> T;                       // max number of trials
int<lower=1, upper=T> Tsubj[N];       // number of trials per subject
int<lower=-1, upper=2> choice[N, T];  // choice in given trial
real outcome[N, T];                   // outcome in given trial
real R_scale;                         // scale parameter of cauchy distribution of R parameter
real P_scale;                         // scale parameter of cauchy distribution of P parameter

}
transformed data {
vector[2] initV;  // initial values for EV
initV = rep_vector(0.0, 2);
}
parameters {
// Declare all parameters as vectors for vectorizing
// Hyper(group)-parameters
vector[5] mu_pr;
vector<lower=0>[5] sigma;

// Subject-level raw parameters (for Matt trick)
vector[N] Arew_pr;    // reward learning rate
vector[N] Apun_pr;    // punishment learning rate
vector[N] tau_pr;     // inverse temperature
vector[N] R_pr;       // reward sensitivity
vector[N] P_pr;       // punishment sensitivity

}
transformed parameters {
// subject-level parameters
vector<lower=0, upper=1>[N] Arew;
vector<lower=0, upper=1>[N] Apun;
vector<lower=0, upper=5>[N] tau;
vector[N] R; // you may not need the bounding above. It could be causing divergences
vector[N] P;

Arew   = Phi_approx(mu_pr[1] + sigma[1] * Arew_pr);
Apun   = Phi_approx(mu_pr[2] + sigma[2] * Apun_pr);
tau    = Phi_approx(mu_pr[3] + sigma[3] * tau_pr) * 5;
R      = mu_pr[4] + sigma[4] * R_pr; // the multiplier is only necessary when using the Phi_approx function
P      = mu_pr[5] + sigma[5] * P_pr; // which tansforms the parameter to be in 0-1
}
model {
// Hyperparameters
mu_pr  ~ normal(0, 1);
sigma[1:3]  ~ normal(0, 0.2);
// sigma[4:5]  ~ cauchy(0, 1);
sigma[4]  ~ cauchy(0, R_scale);
sigma[5]  ~ cauchy(0, P_scale);

// individual parameters
Arew_pr  ~ normal(0, 1.0);
Apun_pr  ~ normal(0, 1.0);
tau_pr   ~ normal(0, 1.0);
R_pr     ~ normal(0, 1.0);
P_pr     ~ normal(0, 1.0);

// subject loop and trial loop
for (i in 1:N) {
vector[2] ev;    // expected value
real PE;         // prediction error
real alpha;      // effective learning rate (Arew or Apun, respectively)
real sensitivity;// effective sensitivity (R or P, respectively)
ev = initV;

for (t in 1:(Tsubj[i])) {
// compute action probabilities
choice[i, t] ~ categorical_logit(tau[i] * ev);

// prediction error
sensitivity = (outcome[i, t] == 1 ? R[i] :  P[i]);
PE = (outcome[i, t] * sensitivity) - ev[choice[i, t]];

// value updating (learning)
alpha = (PE >= 0) ? Arew[i] : Apun[i];
ev[choice[i, t]] += alpha * PE;
}
}
}
generated quantities {
// For group level parameters
real<lower=0, upper=1> mu_Arew;
real<lower=0, upper=1> mu_Apun;
real<lower=0, upper=5> mu_tau; // I made the upper bound here 5, given it was 1 before
real mu_R;
real mu_P;

// For log likelihood calculation
real log_lik[N];

// For posterior predictive check
real y_pred[N, T];
real PE_pred[N, T];
real ev_pred[N, T, 2];

// Set all posterior predictions to 0 (avoids NULL values)
for (i in 1:N) {
for (t in 1:T) {
y_pred[i, t] = -1;
PE_pred[i, t] = -1;
ev_pred[i, t, 1] = -1;
ev_pred[i, t, 2] = -1;
}
}

mu_Arew   = Phi_approx(mu_pr[1]);
mu_Apun   = Phi_approx(mu_pr[2]);
mu_tau    = Phi_approx(mu_pr[3]) * 5;
mu_R      = mu_pr[4];
mu_P      = mu_pr[5];

{ // local section, this saves time and space
for (i in 1:N) {
vector[2] ev; // expected value
real PE;      // prediction error
int co;
real alpha;
real sensitivity;

// Initialize values
ev = initV;
ev_pred[i, 1, 1] = 0;
ev_pred[i, 1, 2] = 0;
log_lik[i] = 0;

for (t in 1:(Tsubj[i])) {
// get choice of current trial (either 1 or 2)
co = choice[i, t];

// compute log likelihood of current trial
log_lik[i] += categorical_logit_lpmf(choice[i, t] | tau[i] * ev);

// generate posterior prediction for current trial
y_pred[i, t] = categorical_rng(softmax(tau[i] * ev));

// prediction error
sensitivity = (outcome[i, t] == 1 ?  R[i] : P[i]);
PE = (outcome[i, t] * sensitivity) - ev[choice[i, t]];
PE_pred[i, t] = PE;

// value updating (learning)
alpha = (PE >= 0) ? Arew[i] : Apun[i];
ev[co] += alpha * PE;
if (t > 1) {
ev_pred[i, t] = ev_pred[i, t-1];
}
ev_pred[i, t, co] += alpha * PE;
}
}
}
}
``````

Multiple conditions model

``````data {
int<lower=1> N;                                   // number of subjects
int<lower=1> Max_tr;                              // max number of trials
int<lower=1> N_cond;                              // number of conditions
int<lower=0, upper=Max_tr> Tsubj[N, N_cond];      // subjects x n_trial
int<lower=-1, upper=2> choice[N, N_cond, Max_tr]; // subjects x conditions x choices
real outcome[N, N_cond, Max_tr];                  // subjects x conditions x outcomes

real R_scale;                                     // scale parameter of cauchy distribution used to scale sigma (range) of R parameter
real P_scale;                                     // scale parameter of cauchy distribution used to scale sigma (range) of P parameter

}
transformed data {
vector[2] initV;  // initial values for EV
initV = rep_vector(0.0, 2);
}
parameters {

// Declare all parameters as vectors for vectorizing
// Hyper(group)-parameters

// Hyperparameter means
real<lower=0, upper=1> mu_Arew[N_cond];
real<lower=0, upper=1> mu_Apun[N_cond];
real mu_R[N_cond];
real mu_P[N_cond];
real<lower=0, upper=5> mu_tau[N_cond];

// Hyperparameter sigmas
real sigma_Arew[N_cond];
real sigma_Apun[N_cond];
real sigma_R[N_cond];
real sigma_P[N_cond];
real sigma_tau[N_cond];

// Subject-level raw parameters (for Matt trick)
vector [N] Arew_pr[N_cond];    // reward learning rate
vector [N] Apun_pr[N_cond];    // punishment learning rate
vector [N] tau_pr[N_cond];     // inverse temperature
vector [N] R_pr[N_cond];       // reward sensitivity
vector [N] P_pr[N_cond];       // punishment sensitivity

}
transformed parameters {
// subject-level parameters
vector<lower=0, upper=1>[N] Arew[N_cond];
vector<lower=0, upper=1>[N] Apun[N_cond];
vector<lower=0, upper=5>[N] tau[N_cond];
vector[N] R[N_cond]; // you may not need the bounding above. It could be causing divergences
vector[N] P[N_cond];

for (j in 1:N_cond) {
Arew[j] = mu_Arew[j]  + sigma_Arew[j]  * Arew_pr[j];
Apun[j] = mu_Apun[j]  + sigma_Apun[j]  * Apun_pr[j];
R[j]    = mu_R[j]     + sigma_R[j]     * R_pr[j];
P[j]    = mu_P[j]     + sigma_P[j]     * P_pr[j];
tau[j]  = mu_tau[j]   + sigma_tau[j]   * tau_pr[j];
}
}
model {

for (j in 1:N_cond) {
// Hyperparameters means
mu_Arew[j] ~ normal(0, 1);
mu_Apun[j] ~ normal(0, 1);
mu_R[j]    ~ normal(0, 1);
mu_P[j]    ~ normal(0, 1);
mu_tau[j]  ~ normal(0, 1);

// Hyperparameters sigmas
sigma_Arew[j] ~ normal(0, 0.2);
sigma_Apun[j] ~ normal(0, 0.2);
sigma_R[j]    ~ cauchy(0, R_scale);
sigma_P[j]    ~ cauchy(0, P_scale);
sigma_tau[j]  ~ normal(0, 0.2);

}

for (j in 1:N_cond) {
// individual parameters
Arew_pr[j]  ~ normal(0, 1.0);
Apun_pr[j]  ~ normal(0, 1.0);
tau_pr[j]   ~ normal(0, 1.0);
R_pr[j]     ~ normal(0, 1.0);
P_pr[j]     ~ normal(0, 1.0);
}

// subject loop and trial loop
for (i in 1:N) {
int n_trials;

// condition loop
for (j in 1:N_cond) {
vector[2] ev;    // expected value
real PE;         // prediction error
real alpha;      // effective learning rate (Arew or Apun, respectively)
real sensitivity;// effective sensitivity (R or P, respectively)

ev = initV;

n_trials = Tsubj[i, j];
if (n_trials <= 0) continue;

// trial loop
for (t in 1:n_trials) {
// compute action probabilities
choice[i, j, t] ~ categorical_logit(tau[j, i] * ev);

// prediction error
sensitivity = (outcome[i, j, t] == 1 ? R[j, i] :  P[j, i]);
PE = (outcome[i, j, t] * sensitivity) - ev[choice[i, j, t]];

// value updating (learning)
alpha = (PE >= 0) ? Arew[j, i] : Apun[j, i];
ev[choice[i, j, t]] += alpha * PE;
}

}
}
}
generated quantities {
// For group level parameters
//real<lower=0, upper=1> mu_Arew;
//real<lower=0, upper=1> mu_Apun;
//real<lower=0, upper=5> mu_tau; // I made the upper bound here 5, given it was 1 before
//real mu_R;
//real mu_P;

// For log likelihood calculation
real log_lik[N, N_cond];

// For posterior predictive check
real y_pred[N, N_cond, Max_tr];
real PE_pred[N, N_cond, Max_tr];
real ev_pred[N, N_cond, Max_tr, 2];

// Set all posterior predictions to 0 (avoids NULL values)
for (i in 1:N) {
for (j in 1:N_cond){
log_lik[i, j] = -1;
for (t in 1:Max_tr) {
y_pred[i, j, t] = -1;
PE_pred[i, j, t] = -1;
ev_pred[i, j, t, 1] = -1;
ev_pred[i, j, t, 2] = -1;
}
}
}

{ // local section, this saves time and space

// subject loop
for (i in 1:N) {

// condition loop
for (j in 1:N_cond) {
vector[2] ev; // expected value
real PE;      // prediction error
int co;
real alpha;
real sensitivity;
int n_trials;

n_trials = Tsubj[i, j];
if (n_trials <= 0) continue;

// Initialize values
ev = initV;
ev_pred[i, j, 1, 1] = 0;
ev_pred[i, j, 1, 2] = 0;
log_lik[i, j] = 0;

// trial loop
for (t in 1:n_trials) {

// get choice of current trial (either 1 or 2)
co = choice[i, j, t];

// compute log likelihood of current trial
log_lik[i, j] += categorical_logit_lpmf(choice[i, j, t] | tau[j, i] * ev);

// generate posterior prediction for current trial
y_pred[i, j, t] = categorical_rng(softmax(tau[j, i] * ev));

// prediction error
sensitivity = (outcome[i, j, t] == 1 ?  R[j, i] : P[j, i]);
PE = (outcome[i, j, t] * sensitivity) - ev[choice[i, j, t]];
PE_pred[i, j, t] = PE;

// value updating (learning)
alpha = (PE >= 0) ? Arew[j, i] : Apun[j, i];
ev[co] += alpha * PE;
if (t > 1) {
ev_pred[i, j, t] = ev_pred[i, j, t-1];
}
ev_pred[i, j, t, co] += alpha * PE;
}

}
}
}
}

``````

Ok, I found another post in this forum by @carinaufer that is concerned with a similar approach.
@Vanessa_Brown provided some sample code that I think does model a repeated measures design of a reinforcement learning model. Although I don’t completely understand this code I get the idea that I need to model covariance of subject level effects between repeated sessions. This makes perfectly sense to me when thinking of classical statistics where the same is done in e.g. mixed models to account repeated measurements within subject.
Unfortunately, I’m still not quite sure how to apply this to my model formulation.

how familiar are you with mixed models (outside of RL modeling)? essentially, you need to set up the same kind of statistical model, except now all of your effects are effects on a parameter instead of directly on your observed variable (plus the actual RL model part, of course). So, if you model placebo as your reference group, then you would need effects for each of your drug groups, plus corresponding variances, etc. For me, it helps to sketch out the statistical model (or even to set it up via brms, which will create a stan model you can use as a skeleton), and then to add in the RL model part.

Dear @Vanessa_Brown,

thanks for the quick reply! I think that in general I understand how parameters are estimated in mixed models. I’d need to think a bit about how to set up variances with respect to the control condition.
May I ask, in the code that you posted, does it also model a reference group? I’m also not sure what data is passed in `subj_vars` and `visit_vars` in the input data…
Thanks for the hint with the `brms` package! I will definetely try to set up a logisitc regression model for my data there (I’ve set up such a model using standard mixed models (`lme4`) before) and see how that is translated to stan code!

modeling the reference group - it should, depending on how you input the visit_vars variables (using the naming from my example code; subj_vars are subject-level variables that wouldn’t change over sessions, so something like age or baseline symptom severity). So if you have one within-subject manipulation with placebo vs. treatment, that would be coded in visit_vars with 0 for the placebo as reference group and 1 for the treatment (or however you’d want to set up your contrasts), with a matrix of those values for subjects by visits ([nS,nV] in my code). Then, the *_m parameter would be the intercept (parameter value for placebo), *_visit_grp_m would be the effect of treatment (change in parameter value with treatment), and the second *_subj_s term as the standard deviation of the treatment effect over participants (the first *_subj_s term is the overall SD, e.g. for intercept). Let me know if this makes sense - as someone used to frequentist mixed models, specifying the hierarchical structure with variances + uncertainty over parameters in a Bayesian sense can get quite confusing!

Ah, ok, so using the `visit_vars` I basically specify the contrasts, right?
So with 2 different treatment conditions I could specify `kV=2` with `visit_vars[, 1, ] = 0` for the control condition, `visit_vars[, 2, 1] = 1` and `visit_vars[, 2, 2] = 0` for the first treatment and likewise `visit_vars[, 3, 1] = 0` and `visit_vars[, 3, 2] = 1` for the second treatment condition to get comparisons of each drug condition to the control condition. Do I understand this correctly?
Your’re absolutely right that I’m having hard times to set up such a model in the Bayesian way! All the more I appreciate your support!
I will look deeper into this over the weekend! I might get back to you though, if that’s ok, since I can already foresee to have further questions!

yup, you got it! and yeah, feel free to ask further.

Hi!

It took me while to really get through the code but I think I got an idea of how that works.
(For reference, this paper helped me to understand how the covariance matrix is set up using the cholesky factor matrix)
I adapted the code or my model (see code below) skipping the subject level variables for now (note that I changed `*visit*` for `*cond*` for “condition”).

However, running this model I get warnings that I’ll have to get solved:

``````# 1: There were 9 divergent transitions after warmup. See
# http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
# to find out why this is a problem and how to eliminate them.
# 2: Examine the pairs() plot to diagnose sampling problems
#
# 3: The largest R-hat is NA, indicating chains have not mixed.
# Running the chains for more iterations may help. See
# http://mc-stan.org/misc/warnings.html#r-hat
# 4: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
# Running the chains for more iterations may help. See
# http://mc-stan.org/misc/warnings.html#bulk-ess
# 5: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
# Running the chains for more iterations may help. See
# http://mc-stan.org/misc/warnings.html#tail-ess
``````

I ran the model with 4000 iterations (warmup: 1000). I could try to increase the number of iterations as suggested by the warning message but I’ll need to transfer this to our Linux server as running on my own computer took +13 hours…
Given the large number of parameters in the model I am quite confused as to what I could look at in the `pairs()` plot.

I also wonder if I still have any obvious mistakes in the model specification and/or if there are other things I could try to optimize.
I have set no initial values for the parameters. Could it help to use mean values from a `vb` model as initial values?
I’m also not sure about how to set the priors. In my simple (one session) model I set priors on the SD of the parameters to somewhat limit their range (e.g. `simga[4] ~ cauchy(0, 1)` for the `R` parameter to get values approximately in the range of `[-30, 30]`. Where would I apply such priors in this model? Would that be the `*_cond_s` parameters, i.e. the SD of the condition effect?
Did I understand correctly that the `*_subj_s ~ student_t(2,0,2)` refers to a t distribution with `nC-1` degrees of freedom (i.e. (number of conditions - 1) df) for subject specific SD?
I also have troubles interpreting the scale of the parameters. My first thought was that the `*_m` parameters would correspond to the `mu_*` parameters of my simple model in the placebo condition. However, comparing the summary statistics of these parameters show completely different values:
Mixed model:

``````> as.data.frame(summary(fit2, par = paste(params, '_m'), probs = c(0.025, 0.975))\$summary)
mean   se_mean        sd        2.5%    97.5%     n_eff     Rhat
Arew_m -1.2689189 0.5469930 1.1553174 -3.13506823 1.170741  4.461070 1.336139
Apun_m -1.3192008 0.4809529 1.1011440 -3.06025171 1.120082  5.241831 1.261846
R_m     0.1496364 0.4710745 1.0429759 -1.77837267 2.170962  4.901956 1.290238
P_m     0.2059749 0.5774399 1.1430911 -1.92587057 2.323436  3.918755 1.421119
tau_m   0.5416581 0.1075133 0.4205064  0.02273234 1.578753 15.297513 1.080292
``````

Simple model:

``````> as.data.frame(summary(rw5pI.plac\$fit, par = paste('mu_', params), probs = c(0.025, 0.975))\$summary)
mean     se_mean          sd          2.5%      97.5%     n_eff     Rhat
mu_Arew  0.29181257 0.008509647  0.23279699  3.099822e-02  0.9205625  748.3963 1.004831
mu_Apun  0.03047293 0.003672411  0.08013723  2.756524e-04  0.1235017  476.1744 1.003805
mu_tau   1.32378567 0.021963432  1.05626779  2.348340e-01  4.3081507 2312.8511 1.000700
mu_P    14.98569057 0.504753426 24.23932814 -3.370878e+01 64.3443769 2306.1237 1.000911
mu_R     3.34486372 0.591496086 27.32114081 -5.104459e+01 56.3401627 2133.5062 1.002821
``````

I have not yet tried to run parameter recovery or other posterior predicitve checks. Maybe I’m still on a completely wrong path

Mixed model code

``````data {
int<lower=1> nS;                               // number of subjects
int<lower=1> nT_max;                           // max number of trials
int<lower=1> nC;                               // number of conditions
int<lower=0, upper=nT_max> nT_subj[nS, nC];      // subjects x n_trial

int<lower=-1, upper=2> choice[nS, nT_max, nC]; // subjects x choices x conditions
real outcome[nS, nT_max, nC];                  // subjects x outcomes x conditions
real missing_cond[nS, nC];                     // subjects x conditions (1=missing, 0=data available)

int kC;                                        // number of contrasts on conditions
real cond_vars[nS, nC, kC];                    // condition level matrix (contrasts on conditions)

real R_scale;                                  // scale parameter of cauchy distribution used to scale sigma (range) of R parameter
real P_scale;                                  // scale parameter of cauchy distribution used to scale sigma (range) of P parameter

}
transformed data {
vector[2] initV;  // initial values for EV
initV = rep_vector(0.0, 2);
}
parameters {

// Declare all parameters as vectors for vectorizing
// Hyper(group)-parameters

// group level means (intercept for Placebo/control condition)
real Arew_m;             // reward learning rate
real Apun_m;             // punishment learning rate
real R_m;                // reward sensitivity
real P_m;                // reward sensitivity
real<lower=0> tau_m;     // inverse temperature

// condition level effects (change of parameter estimate with treatment/condition)
vector[kC] Arew_cond_grp_m;
vector[kC] Apun_cond_grp_m;
vector[kC] R_cond_grp_m;
vector[kC] P_cond_grp_m;
vector[kC] tau_cond_grp_m;

// condition-level (within subject) SDs (sigma2_y) (overall SD for intercept?)
real<lower=0> Arew_cond_s;
real<lower=0> Apun_cond_s;
real<lower=0> R_cond_s;
real<lower=0> P_cond_s;
real<lower=0> tau_cond_s;

// SDs of condition-level effects (SD of treatment effect over participants)
// will be transformed with subjects' raw parameter to yield subject specific covarying deviation per condition
vector<lower=0>[kC+1] Arew_subj_s;
vector<lower=0>[kC+1] Apun_subj_s;
vector<lower=0>[kC+1] R_subj_s;
vector<lower=0>[kC+1] P_subj_s;
vector<lower=0>[kC+1] tau_subj_s;

// non-centered parameterization (NCP) variance effect per cond & subject
matrix[nS,nC] Arew_cond_raw;
matrix[nS,nC] Apun_cond_raw;
matrix[nS,nC] R_cond_raw;
matrix[nS,nC] P_cond_raw;
matrix[nS,nC] tau_cond_raw;

// NCP variance effect on subj-level effects
matrix[kC+1,nS] Arew_subj_raw;
matrix[kC+1,nS] Apun_subj_raw;
matrix[kC+1,nS] R_subj_raw;
matrix[kC+1,nS] P_subj_raw;
matrix[kC+1,nS] tau_subj_raw;

// Cholesky factors of correlation matrices for subj-level variances
cholesky_factor_corr[kC+1] Arew_subj_L;
cholesky_factor_corr[kC+1] Apun_subj_L;
cholesky_factor_corr[kC+1] R_subj_L;
cholesky_factor_corr[kC+1] P_subj_L;
cholesky_factor_corr[kC+1] tau_subj_L;

}
transformed parameters {

// transformed parameters
matrix<lower=0,upper=1>[nS, nC] Arew;             // reward learning rate
matrix<lower=0,upper=1>[nS, nC] Apun;             // punishment learning rate
matrix[nS, nC] R;                                  // reward sensitivity
matrix[nS, nC] P;                                  // reward sensitivity
matrix[nS, nC] tau;              // inverse temperature

// additional inv_logit transform for learning rate to restrict to [0, 1]
matrix[nS, nC] Arew_normal;             // reward learning rate
matrix[nS, nC] Apun_normal;             // punishment learning rate

// Convert Cholesky factorized correlation matrix into SDs per visit-level effect
// creates matrix of nS rows and kC+1 correlated random variables
// (*_subj_raw defined as random variable by to_vector(*_subj_raw[,s]) ~ normal(0, 1); statements in model block)
matrix[nS,kC+1] Arew_vars = (diag_pre_multiply(Arew_subj_s, Arew_subj_L) * Arew_subj_raw)';
matrix[nS,kC+1] Apun_vars = (diag_pre_multiply(Apun_subj_s, Apun_subj_L) * Apun_subj_raw)';
matrix[nS,kC+1] R_vars    = (diag_pre_multiply(R_subj_s,    R_subj_L)    * R_subj_raw)';
matrix[nS,kC+1] P_vars    = (diag_pre_multiply(P_subj_s,    P_subj_L)    * P_subj_raw)';
matrix[nS,kC+1] tau_vars  = (diag_pre_multiply(tau_subj_s,  tau_subj_L)  * tau_subj_raw)';

//create transformed parameters using non-centered parameterization for all model parameters

// add in subject and visit-level effects as shifts in means
for (s in 1:nS) {
// transformed parameter for each subject equals
// overall mean +           // group level mean across conditions
// SD[cond]*raw[cond] +     // group level SD * condition offset of each subject (variation of subjects around group mean)
// SD[subject]*raw[subject] // single value reflecting subject specific offset across conditions (i.e. for Placebo in case of it being the control condition)
Arew_normal[s,] = Arew_m + Arew_cond_s * Arew_cond_raw[s,] + Arew_vars[s,1];
Apun_normal[s,] = Apun_m + Apun_cond_s * Apun_cond_raw[s,] + Apun_vars[s,1];
R[s,]           = R_m    + R_cond_s    * R_cond_raw[s,]    + R_vars[s,1];
P[s,]           = P_m    + P_cond_s    * P_cond_raw[s,]    + P_vars[s,1];
tau[s,]         = tau_m  + tau_cond_s  * tau_cond_raw[s,]  + tau_vars[s,1];

for (v in 1:nC) {

if (missing_cond[s,v]==0) {   // no effect if condition is missing

// loop over contrasts
for (kk in 1:kC) {

// main effects of visit-level variables
// contrast specific offset +    // effect of treatment (change in parameter value with treatment)
// subject specific offset       // that was defined to be correlated with general subject specific offset (see above)
Arew_normal[s,v] += cond_vars[s,v,kk] * (Arew_cond_grp_m[kk] + Arew_vars[s,kk+1]);
Apun_normal[s,v] += cond_vars[s,v,kk] * (Apun_cond_grp_m[kk] + Apun_vars[s,kk+1]);
R[s,v]           += cond_vars[s,v,kk] * (R_cond_grp_m[kk]    + R_vars[s,kk+1]);
P[s,v]           += cond_vars[s,v,kk] * (P_cond_grp_m[kk]    + P_vars[s,kk+1]);
tau[s,v]         += cond_vars[s,v,kk] * (tau_cond_grp_m[kk]  + tau_vars[s,kk+1]);

}    // end of contrast loop

}      // end no effect if condition is missing

}        // end of condition loop

//transform alpha to [0,1]
Arew[s,] = inv_logit(Arew_normal[s,]);
Apun[s,] = inv_logit(Apun_normal[s,]);

}          // end of subject loop

}

model {

//define priors

// group level mean across conditions
Arew_m ~ normal(0, 1);
Apun_m ~ normal(0, 1);
R_m    ~ normal(0, 1);
P_m    ~ normal(0, 1);
tau_m  ~ normal(0, 1);

// condition specific effect
Arew_cond_grp_m ~ normal(0, 1);
Apun_cond_grp_m ~ normal(0, 1);
R_cond_grp_m    ~ normal(0, 1);
P_cond_grp_m    ~ normal(0, 1);
tau_cond_grp_m  ~ normal(0, 1);

Arew_cond_s ~ cauchy(0, 2);
Apun_cond_s ~ cauchy(0, 2);
R_cond_s    ~ cauchy(0, 2);
P_cond_s    ~ cauchy(0, 2);
tau_cond_s  ~ cauchy(0, 2);

for (s in 1:nS) {
// distribution of subjects around group level mean
Arew_cond_raw[s,] ~ normal(0, 1);
Apun_cond_raw[s,] ~ normal(0, 1);
R_cond_raw[s,]    ~ normal(0, 1);
P_cond_raw[s,]    ~ normal(0, 1);
tau_cond_raw[s,]  ~ normal(0, 1);

// distribution of of subject specific/random terms around subject mean
// (before transformation to correlated subject specific SD of condition effect)
to_vector(Arew_subj_raw[,s]) ~ normal(0, 1);
to_vector(Apun_subj_raw[,s]) ~ normal(0, 1);
to_vector(R_subj_raw[,s])    ~ normal(0, 1);
to_vector(P_subj_raw[,s])    ~ normal(0, 1);
to_vector(tau_subj_raw[,s])  ~ normal(0, 1);
}

Arew_subj_s ~ student_t(2,0,2);        // nu is df, so number of conditions (nC) -1 ??? or rather (kC+1) - 1?
Apun_subj_s ~ student_t(2,0,2);
R_subj_s    ~ student_t(2,0,3);
P_subj_s    ~ student_t(2,0,3);
tau_subj_s  ~ student_t(2,0,2);

Arew_subj_L ~ lkj_corr_cholesky(1);
Apun_subj_L ~ lkj_corr_cholesky(1);
R_subj_L    ~ lkj_corr_cholesky(1);
P_subj_L    ~ lkj_corr_cholesky(1);
tau_subj_L  ~ lkj_corr_cholesky(1);

// subject loop and trial loop
for (i in 1:nS) {
int n_trials;

// condition loop
for (j in 1:nC) {
vector[2] ev;    // expected value
real PE;         // prediction error
real alpha;      // effective learning rate (Arew or Apun, respectively)
real sensitivity;// effective sensitivity (R or P, respectively)

ev = initV;

n_trials = nT_subj[i, j];
if (n_trials <= 0) continue;

// trial loop
for (t in 1:n_trials) {
// compute action probabilities
choice[i, t, j] ~ categorical_logit(tau[i, j] * ev);

// prediction error
sensitivity = (outcome[i, t, j] == 1 ? R[i, j] :  P[i, j]);
PE = (outcome[i, t, j] * sensitivity) - ev[choice[i, t, j]];

// value updating (learning)
alpha = (PE >= 0) ? Arew[i, j] : Apun[i, j];
ev[choice[i, t, j]] += alpha * PE;
}

}
}
}
generated quantities {

// For log likelihood calculation
real log_lik[nS, nC];

// correlation matrices for coefficients (transforms cholesky factor matrices back to correlation matrices of random/subject effects)
corr_matrix[kC+1] Arew_cor = multiply_lower_tri_self_transpose(Arew_subj_L);
corr_matrix[kC+1] Apun_cor = multiply_lower_tri_self_transpose(Apun_subj_L);
corr_matrix[kC+1] R_cor    = multiply_lower_tri_self_transpose(R_subj_L);
corr_matrix[kC+1] P_cor    = multiply_lower_tri_self_transpose(P_subj_L);
corr_matrix[kC+1] tau_cor  = multiply_lower_tri_self_transpose(tau_subj_L);

// For posterior predictive check
real y_pred[nS, nC, nT_max];
real PE_pred[nS, nC, nT_max];
real ev_pred[nS, nC, nT_max, 2];

// Set all posterior predictions to 0 (avoids nSULL values)
for (i in 1:nS) {
for (j in 1:nC){
log_lik[i, j] = -1;
for (t in 1:nT_max) {
y_pred[i, j, t] = -1;
PE_pred[i, j, t] = -1;
ev_pred[i, j, t, 1] = -1;
ev_pred[i, j, t, 2] = -1;
}
}
}

{ // local section, this saves time and space

// subject loop
for (i in 1:nS) {

// condition loop
for (j in 1:nC) {
vector[2] ev; // expected value
real PE;      // prediction error
int co;
real alpha;
real sensitivity;
int n_trials;

n_trials = nT_subj[i, j];
if (n_trials <= 0) continue;

// Initialize values
ev = initV;
ev_pred[i, j, 1, 1] = 0;
ev_pred[i, j, 1, 2] = 0;
log_lik[i, j] = 0;

// trial loop
for (t in 1:n_trials) {

// get choice of current trial (either 1 or 2)
co = choice[i, t, j];

// compute log likelihood of current trial
log_lik[i, j] += categorical_logit_lpmf(choice[i, t, j] | tau[i, j] * ev);

// generate posterior prediction for current trial
y_pred[i, j, t] = categorical_rng(softmax(tau[i, j] * ev));

// prediction error
sensitivity = (outcome[i, t, j] == 1 ?  R[i, j] : P[i, j]);
PE = (outcome[i, t, j] * sensitivity) - ev[choice[i, t, j]];
PE_pred[i, j, t] = PE;

// value updating (learning)
alpha = (PE >= 0) ? Arew[i, j] : Apun[i, j];
ev[co] += alpha * PE;
if (t > 1) {
ev_pred[i, j, t] = ev_pred[i, j, t-1];
}
ev_pred[i, j, t, co] += alpha * PE;
}

}
}
}
}

``````

Dear @Vanessa_Brown ,

I have changed the model to estimate `bernoulli_logit()` instead of `categorical_logit()` as you suggested to @carinaufer in the post mentioned above. This reduces the number of divergent transitions but does not eliminate them.
I have, however, managed to get a simpler model working that only models 2 parameters (learning rate `A` and inverse temperature `tau`). Thus, I think the model formulation seems to be ok, maybe I just have insufficient data (only 20 trials per condition and subject) to estimate all the parameters in the more complicated model.
I still struggle with some issues (that could hopefully help to solve the remaining issues):

1. As mentioned before I’m not sure where to implement my expectations (i.e. set priors) for range of my parameters? Would that be the `*_cond_s`parameters? And do I correctly assume that the `nu` parameter of the `student_t` distribution refers to number of contrasts?
2. Due ot the `inv_logit` transformation of the `A` parameter I’m having troubles to interpret the estimates I get for the `A*_m` and `A*_cond_m` parameters. Is there a way I can backtransform these values to a `[0, 1]` scale, too? Can that be done in the `gererated quantities` block? Or post-hoc in R?

Any further help or suggestions appreciated!
Thanks a lot!