Extend hierarchical reinforcement learning model for multiple conditions

Hi!

It took me while to really get through the code but I think I got an idea of how that works.
(For reference, this paper helped me to understand how the covariance matrix is set up using the cholesky factor matrix)
I adapted the code or my model (see code below) skipping the subject level variables for now (note that I changed *visit* for *cond* for “condition”).

However, running this model I get warnings that I’ll have to get solved:

# 1: There were 9 divergent transitions after warmup. See
# http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
# to find out why this is a problem and how to eliminate them. 
# 2: Examine the pairs() plot to diagnose sampling problems
# 
# 3: The largest R-hat is NA, indicating chains have not mixed.
# Running the chains for more iterations may help. See
# http://mc-stan.org/misc/warnings.html#r-hat 
# 4: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
# Running the chains for more iterations may help. See
# http://mc-stan.org/misc/warnings.html#bulk-ess 
# 5: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
# Running the chains for more iterations may help. See
# http://mc-stan.org/misc/warnings.html#tail-ess 

I ran the model with 4000 iterations (warmup: 1000). I could try to increase the number of iterations as suggested by the warning message but I’ll need to transfer this to our Linux server as running on my own computer took +13 hours…
Given the large number of parameters in the model I am quite confused as to what I could look at in the pairs() plot.

I also wonder if I still have any obvious mistakes in the model specification and/or if there are other things I could try to optimize.
I have set no initial values for the parameters. Could it help to use mean values from a vb model as initial values?
I’m also not sure about how to set the priors. In my simple (one session) model I set priors on the SD of the parameters to somewhat limit their range (e.g. simga[4] ~ cauchy(0, 1) for the R parameter to get values approximately in the range of [-30, 30]. Where would I apply such priors in this model? Would that be the *_cond_s parameters, i.e. the SD of the condition effect?
Did I understand correctly that the *_subj_s ~ student_t(2,0,2) refers to a t distribution with nC-1 degrees of freedom (i.e. (number of conditions - 1) df) for subject specific SD?
I also have troubles interpreting the scale of the parameters. My first thought was that the *_m parameters would correspond to the mu_* parameters of my simple model in the placebo condition. However, comparing the summary statistics of these parameters show completely different values:
Mixed model:

> as.data.frame(summary(fit2, par = paste(params, '_m'), probs = c(0.025, 0.975))$summary)
             mean   se_mean        sd        2.5%    97.5%     n_eff     Rhat
Arew_m -1.2689189 0.5469930 1.1553174 -3.13506823 1.170741  4.461070 1.336139
Apun_m -1.3192008 0.4809529 1.1011440 -3.06025171 1.120082  5.241831 1.261846
R_m     0.1496364 0.4710745 1.0429759 -1.77837267 2.170962  4.901956 1.290238
P_m     0.2059749 0.5774399 1.1430911 -1.92587057 2.323436  3.918755 1.421119
tau_m   0.5416581 0.1075133 0.4205064  0.02273234 1.578753 15.297513 1.080292

Simple model:

> as.data.frame(summary(rw5pI.plac$fit, par = paste('mu_', params), probs = c(0.025, 0.975))$summary)
               mean     se_mean          sd          2.5%      97.5%     n_eff     Rhat
mu_Arew  0.29181257 0.008509647  0.23279699  3.099822e-02  0.9205625  748.3963 1.004831
mu_Apun  0.03047293 0.003672411  0.08013723  2.756524e-04  0.1235017  476.1744 1.003805
mu_tau   1.32378567 0.021963432  1.05626779  2.348340e-01  4.3081507 2312.8511 1.000700
mu_P    14.98569057 0.504753426 24.23932814 -3.370878e+01 64.3443769 2306.1237 1.000911
mu_R     3.34486372 0.591496086 27.32114081 -5.104459e+01 56.3401627 2133.5062 1.002821

I have not yet tried to run parameter recovery or other posterior predicitve checks. Maybe I’m still on a completely wrong path

Mixed model code

data {
  int<lower=1> nS;                               // number of subjects
  int<lower=1> nT_max;                           // max number of trials
  int<lower=1> nC;                               // number of conditions
  int<lower=0, upper=nT_max> nT_subj[nS, nC];      // subjects x n_trial
  
  int<lower=-1, upper=2> choice[nS, nT_max, nC]; // subjects x choices x conditions
  real outcome[nS, nT_max, nC];                  // subjects x outcomes x conditions
  real missing_cond[nS, nC];                     // subjects x conditions (1=missing, 0=data available)
  
  int kC;                                        // number of contrasts on conditions
  real cond_vars[nS, nC, kC];                    // condition level matrix (contrasts on conditions)
  
  
  real R_scale;                                  // scale parameter of cauchy distribution used to scale sigma (range) of R parameter
  real P_scale;                                  // scale parameter of cauchy distribution used to scale sigma (range) of P parameter
  
}
transformed data {
  vector[2] initV;  // initial values for EV
  initV = rep_vector(0.0, 2);
}
parameters {
  
  // Declare all parameters as vectors for vectorizing
  // Hyper(group)-parameters

  
  // group level means (intercept for Placebo/control condition)
  real Arew_m;             // reward learning rate
  real Apun_m;             // punishment learning rate
  real R_m;                // reward sensitivity
  real P_m;                // reward sensitivity
  real<lower=0> tau_m;     // inverse temperature
  
  // condition level effects (change of parameter estimate with treatment/condition)
  vector[kC] Arew_cond_grp_m;
  vector[kC] Apun_cond_grp_m;
  vector[kC] R_cond_grp_m;
  vector[kC] P_cond_grp_m;
  vector[kC] tau_cond_grp_m;
  
  
  // condition-level (within subject) SDs (sigma2_y) (overall SD for intercept?)
  real<lower=0> Arew_cond_s;
  real<lower=0> Apun_cond_s;
  real<lower=0> R_cond_s;
  real<lower=0> P_cond_s;
  real<lower=0> tau_cond_s;
  
  // SDs of condition-level effects (SD of treatment effect over participants)
  // will be transformed with subjects' raw parameter to yield subject specific covarying deviation per condition
  vector<lower=0>[kC+1] Arew_subj_s;
  vector<lower=0>[kC+1] Apun_subj_s;
  vector<lower=0>[kC+1] R_subj_s;
  vector<lower=0>[kC+1] P_subj_s;
  vector<lower=0>[kC+1] tau_subj_s;
  
  
  
  // non-centered parameterization (NCP) variance effect per cond & subject
  matrix[nS,nC] Arew_cond_raw;
  matrix[nS,nC] Apun_cond_raw;
  matrix[nS,nC] R_cond_raw;
  matrix[nS,nC] P_cond_raw;
  matrix[nS,nC] tau_cond_raw;
  
  // NCP variance effect on subj-level effects
  matrix[kC+1,nS] Arew_subj_raw;
  matrix[kC+1,nS] Apun_subj_raw;
  matrix[kC+1,nS] R_subj_raw;
  matrix[kC+1,nS] P_subj_raw;
  matrix[kC+1,nS] tau_subj_raw;
  
  
  // Cholesky factors of correlation matrices for subj-level variances
  cholesky_factor_corr[kC+1] Arew_subj_L;
  cholesky_factor_corr[kC+1] Apun_subj_L;
  cholesky_factor_corr[kC+1] R_subj_L;
  cholesky_factor_corr[kC+1] P_subj_L;
  cholesky_factor_corr[kC+1] tau_subj_L;

}
transformed parameters {
  
  // transformed parameters 
  matrix<lower=0,upper=1>[nS, nC] Arew;             // reward learning rate
  matrix<lower=0,upper=1>[nS, nC] Apun;             // punishment learning rate
  matrix[nS, nC] R;                                  // reward sensitivity
  matrix[nS, nC] P;                                  // reward sensitivity
  matrix[nS, nC] tau;              // inverse temperature
  
  // additional inv_logit transform for learning rate to restrict to [0, 1]
  matrix[nS, nC] Arew_normal;             // reward learning rate
  matrix[nS, nC] Apun_normal;             // punishment learning rate
  
  // Convert Cholesky factorized correlation matrix into SDs per visit-level effect
  // creates matrix of nS rows and kC+1 correlated random variables 
  // (*_subj_raw defined as random variable by to_vector(*_subj_raw[,s]) ~ normal(0, 1); statements in model block)
  matrix[nS,kC+1] Arew_vars = (diag_pre_multiply(Arew_subj_s, Arew_subj_L) * Arew_subj_raw)';
  matrix[nS,kC+1] Apun_vars = (diag_pre_multiply(Apun_subj_s, Apun_subj_L) * Apun_subj_raw)';
  matrix[nS,kC+1] R_vars    = (diag_pre_multiply(R_subj_s,    R_subj_L)    * R_subj_raw)';
  matrix[nS,kC+1] P_vars    = (diag_pre_multiply(P_subj_s,    P_subj_L)    * P_subj_raw)';
  matrix[nS,kC+1] tau_vars  = (diag_pre_multiply(tau_subj_s,  tau_subj_L)  * tau_subj_raw)';
  
  //create transformed parameters using non-centered parameterization for all model parameters
  
  // add in subject and visit-level effects as shifts in means
  for (s in 1:nS) {
    // transformed parameter for each subject equals 
    // overall mean +           // group level mean across conditions 
    // SD[cond]*raw[cond] +     // group level SD * condition offset of each subject (variation of subjects around group mean)
    // SD[subject]*raw[subject] // single value reflecting subject specific offset across conditions (i.e. for Placebo in case of it being the control condition)
    Arew_normal[s,] = Arew_m + Arew_cond_s * Arew_cond_raw[s,] + Arew_vars[s,1]; 
    Apun_normal[s,] = Apun_m + Apun_cond_s * Apun_cond_raw[s,] + Apun_vars[s,1];
    R[s,]           = R_m    + R_cond_s    * R_cond_raw[s,]    + R_vars[s,1];
    P[s,]           = P_m    + P_cond_s    * P_cond_raw[s,]    + P_vars[s,1];
    tau[s,]         = tau_m  + tau_cond_s  * tau_cond_raw[s,]  + tau_vars[s,1];
    
    //add subj- and visit-level effects
    for (v in 1:nC) {
      
      if (missing_cond[s,v]==0) {   // no effect if condition is missing
      
        // loop over contrasts
        for (kk in 1:kC) { 
          
          // main effects of visit-level variables
          // for each contrast add 
          // contrast specific offset +    // effect of treatment (change in parameter value with treatment)
          // subject specific offset       // that was defined to be correlated with general subject specific offset (see above)
          Arew_normal[s,v] += cond_vars[s,v,kk] * (Arew_cond_grp_m[kk] + Arew_vars[s,kk+1]);
          Apun_normal[s,v] += cond_vars[s,v,kk] * (Apun_cond_grp_m[kk] + Apun_vars[s,kk+1]);
          R[s,v]           += cond_vars[s,v,kk] * (R_cond_grp_m[kk]    + R_vars[s,kk+1]);
          P[s,v]           += cond_vars[s,v,kk] * (P_cond_grp_m[kk]    + P_vars[s,kk+1]);
          tau[s,v]         += cond_vars[s,v,kk] * (tau_cond_grp_m[kk]  + tau_vars[s,kk+1]);

          
        }    // end of contrast loop
        
      }      // end no effect if condition is missing
    
    }        // end of condition loop
    
    //transform alpha to [0,1]
    Arew[s,] = inv_logit(Arew_normal[s,]);
    Apun[s,] = inv_logit(Apun_normal[s,]);
  
  }          // end of subject loop
  
  
  
}

model {
  
  //define priors
  
  // group level mean across conditions
  Arew_m ~ normal(0, 1);
  Apun_m ~ normal(0, 1);
  R_m    ~ normal(0, 1);
  P_m    ~ normal(0, 1);
  tau_m  ~ normal(0, 1);
  
  // condition specific effect
  Arew_cond_grp_m ~ normal(0, 1);
  Apun_cond_grp_m ~ normal(0, 1);
  R_cond_grp_m    ~ normal(0, 1);
  P_cond_grp_m    ~ normal(0, 1);
  tau_cond_grp_m  ~ normal(0, 1);

  Arew_cond_s ~ cauchy(0, 2);
  Apun_cond_s ~ cauchy(0, 2);
  R_cond_s    ~ cauchy(0, 2);
  P_cond_s    ~ cauchy(0, 2);
  tau_cond_s  ~ cauchy(0, 2);
  
  for (s in 1:nS) {
    // distribution of subjects around group level mean
    Arew_cond_raw[s,] ~ normal(0, 1);
    Apun_cond_raw[s,] ~ normal(0, 1);
    R_cond_raw[s,]    ~ normal(0, 1);
    P_cond_raw[s,]    ~ normal(0, 1);
    tau_cond_raw[s,]  ~ normal(0, 1);
    
    // distribution of of subject specific/random terms around subject mean 
    // (before transformation to correlated subject specific SD of condition effect)
    to_vector(Arew_subj_raw[,s]) ~ normal(0, 1);
    to_vector(Apun_subj_raw[,s]) ~ normal(0, 1);
    to_vector(R_subj_raw[,s])    ~ normal(0, 1);
    to_vector(P_subj_raw[,s])    ~ normal(0, 1);
    to_vector(tau_subj_raw[,s])  ~ normal(0, 1);
  }
  
  Arew_subj_s ~ student_t(2,0,2);        // nu is df, so number of conditions (nC) -1 ??? or rather (kC+1) - 1?
  Apun_subj_s ~ student_t(2,0,2);
  R_subj_s    ~ student_t(2,0,3);
  P_subj_s    ~ student_t(2,0,3);
  tau_subj_s  ~ student_t(2,0,2);
  
  Arew_subj_L ~ lkj_corr_cholesky(1);
  Apun_subj_L ~ lkj_corr_cholesky(1);
  R_subj_L    ~ lkj_corr_cholesky(1);
  P_subj_L    ~ lkj_corr_cholesky(1);
  tau_subj_L  ~ lkj_corr_cholesky(1);

  // subject loop and trial loop
  for (i in 1:nS) {
    int n_trials;
    
    // condition loop
    for (j in 1:nC) {
      vector[2] ev;    // expected value
      real PE;         // prediction error
      real alpha;      // effective learning rate (Arew or Apun, respectively)
      real sensitivity;// effective sensitivity (R or P, respectively)
      
      ev = initV;
      
      n_trials = nT_subj[i, j];
      if (n_trials <= 0) continue;
      
      // trial loop
      for (t in 1:n_trials) {
        // compute action probabilities
        choice[i, t, j] ~ categorical_logit(tau[i, j] * ev);
  
        // prediction error
        sensitivity = (outcome[i, t, j] == 1 ? R[i, j] :  P[i, j]);
        PE = (outcome[i, t, j] * sensitivity) - ev[choice[i, t, j]];
  
        // value updating (learning)
        alpha = (PE >= 0) ? Arew[i, j] : Apun[i, j];
        ev[choice[i, t, j]] += alpha * PE;
      }
      
    }
  }
}
generated quantities {


  // For log likelihood calculation
  real log_lik[nS, nC];
  
  // correlation matrices for coefficients (transforms cholesky factor matrices back to correlation matrices of random/subject effects)
  corr_matrix[kC+1] Arew_cor = multiply_lower_tri_self_transpose(Arew_subj_L);
  corr_matrix[kC+1] Apun_cor = multiply_lower_tri_self_transpose(Apun_subj_L);
  corr_matrix[kC+1] R_cor    = multiply_lower_tri_self_transpose(R_subj_L);
  corr_matrix[kC+1] P_cor    = multiply_lower_tri_self_transpose(P_subj_L);
  corr_matrix[kC+1] tau_cor  = multiply_lower_tri_self_transpose(tau_subj_L);

  // For posterior predictive check
  real y_pred[nS, nC, nT_max];
  real PE_pred[nS, nC, nT_max];
  real ev_pred[nS, nC, nT_max, 2];

  // Set all posterior predictions to 0 (avoids nSULL values)
  for (i in 1:nS) {
    for (j in 1:nC){
      log_lik[i, j] = -1;
      for (t in 1:nT_max) {
        y_pred[i, j, t] = -1;
        PE_pred[i, j, t] = -1;
        ev_pred[i, j, t, 1] = -1;
        ev_pred[i, j, t, 2] = -1;
      }
    }
  }


  { // local section, this saves time and space
  
    // subject loop
    for (i in 1:nS) {
      
      // condition loop
      for (j in 1:nC) {
        vector[2] ev; // expected value
        real PE;      // prediction error
        int co;
        real alpha;
        real sensitivity;
        int n_trials;
        
        n_trials = nT_subj[i, j];
        if (n_trials <= 0) continue;

        // Initialize values
        ev = initV;
        ev_pred[i, j, 1, 1] = 0;
        ev_pred[i, j, 1, 2] = 0;
        log_lik[i, j] = 0;
  
        // trial loop
        for (t in 1:n_trials) {
          
          // get choice of current trial (either 1 or 2)
          co = choice[i, t, j];
  
          // compute log likelihood of current trial
          log_lik[i, j] += categorical_logit_lpmf(choice[i, t, j] | tau[i, j] * ev);
  
          // generate posterior prediction for current trial
          y_pred[i, j, t] = categorical_rng(softmax(tau[i, j] * ev));
  
          // prediction error
          sensitivity = (outcome[i, t, j] == 1 ?  R[i, j] : P[i, j]);
          PE = (outcome[i, t, j] * sensitivity) - ev[choice[i, t, j]];
          PE_pred[i, j, t] = PE;
  
  
          // value updating (learning)
          alpha = (PE >= 0) ? Arew[i, j] : Apun[i, j];
          ev[co] += alpha * PE;
          if (t > 1) {
             ev_pred[i, j, t] = ev_pred[i, j, t-1];
          }
          ev_pred[i, j, t, co] += alpha * PE;
        }
      
      }
    }
  }
}