Repeated measure hierarchical reinforcement learning with groups (2 x 2 x 2 design)

Hello,

I am relatively new to Stan. Last semester I took a course on Bayesian hierarchical modeling where I was introduced to rstan. Now I am working on an empirical project where I try to model a repeated reinforcement learning task (two-armed bandit) with a 2x2x2 within-between design (-> 8 different groups).
The research question is whether the groups differ in their learning rate and inverse temperature parameters. Of special interest is also the interaction effect.

My initial plan was to fit the data with the hbayesdm package, because it promises to be easy to handle even for beginners. However, I quickly discovered that my data is not handled as repeated measures: each participant completed the task 3 times in condition 1 and 3 times in condition 2, but this is interpreted as 48 consecutive learning choices, not 3 * 16.
Thus, I took the stan code from the hbayesdm package (see below) and fitted it manually to my data (per group).
However, the model does not fit very well, resulting in Rhat > 3 for some parameters. (I read in the forum that I can ignore Rhat for predictions from the generated quantities block. Is it correct that I can ignore them? Some of them show me NAs there.)

data {
  int<lower=1> N;
  int<lower=1> T;
  int<lower=1, upper=T> Tsubj[N];
  int<lower=-1, upper=2> choice[N, T];
  real outcome[N, T];  // no lower and upper bounds
}
transformed data {
  vector[2] initV;  // initial values for EV
  initV = rep_vector(0.0, 2);
}
parameters {
// Declare all parameters as vectors for vectorizing
  // Hyper(group)-parameters
  vector[2] mu_pr;
  vector<lower=0>[2] sigma;

  // Subject-level raw parameters (for Matt trick)
  vector[N] A_pr;    // learning rate
  vector[N] tau_pr;  // inverse temperature
}
transformed parameters {
  // subject-level parameters
  vector<lower=0, upper=1>[N] A;
  vector<lower=0, upper=5>[N] tau;

  for (i in 1:N) {
    A[i]   = Phi_approx(mu_pr[1]  + sigma[1]  * A_pr[i]);
    tau[i] = Phi_approx(mu_pr[2] + sigma[2] * tau_pr[i]) * 5;
  }
}
model {
  // Hyperparameters
  mu_pr  ~ normal(0, 1);
  sigma ~ normal(0, 0.2);

  // individual parameters
  A_pr   ~ normal(0, 1);
  tau_pr ~ normal(0, 1);

  // subject loop and trial loop
  for (i in 1:N) {
    vector[2] ev; // expected value
    real PE;      // prediction error

    ev = initV;

    for (t in 1:(Tsubj[i])) {
      // compute action probabilities
      choice[i, t] ~ categorical_logit(tau[i] * ev);

      // prediction error
      PE = outcome[i, t] - ev[choice[i, t]];

      // value updating (learning)
      ev[choice[i, t]] += A[i] * PE;
    }
  }
}
generated quantities {
  // For group level parameters
  real<lower=0, upper=1> mu_A;
  real<lower=0, upper=5> mu_tau;

  // For log likelihood calculation
  real log_lik[N];

  // For posterior predictive check
  real y_pred[N, T];

  // Set all posterior predictions to 0 (avoids NULL values)
  for (i in 1:N) {
    for (t in 1:T) {
      y_pred[i, t] = -1;
    }
  }

  mu_A   = Phi_approx(mu_pr[1]);
  mu_tau = Phi_approx(mu_pr[2]) * 5;

  { // local section, this saves time and space
    for (i in 1:N) {
      vector[2] ev; // expected value
      real PE;      // prediction error

      // Initialize values
      ev = initV;

      log_lik[i] = 0;

      for (t in 1:(Tsubj[i])) {
        // compute log likelihood of current trial
        log_lik[i] += categorical_logit_lpmf(choice[i, t] | tau[i] * ev);

        // generate posterior prediction for current trial
        y_pred[i, t] = categorical_rng(softmax(tau[i] * ev));

        // prediction error
        PE = outcome[i, t] - ev[choice[i, t]];

        // value updating (learning)
        ev[choice[i, t]] += A[i] * PE;
      }
    }
  }
}

I have been reading the manual and the forum, but both at the same time cause more confusion than they clarify. Often I feel the manual's explanations are too sparse for me to really understand them. I try to structure my questions below:

  1. Since the model does not fit well, I wanted to include an additional hierarchical layer accounting for the repeated measure. So in theory, the code would loop over all participants, then over each participant's 3 blocks, and then over the 16 choices within each block. However, I read that one needs to be careful when adding hierarchical layers so that the posterior density does not grow without bounds.
    If I had now:
model {
  for (s in 1:N_subj) {
    for (b in 1:N_block) {
      alpha[s, b] ~ normal(alpha_subj[s], sigma_alpha_subj_block);
      tau[s, b]   ~ normal(tau_subj[s], sigma_tau_subj_block);
    }
  }

  alpha_subj ~ normal(alpha_group, sigma_alpha_subj);
  tau_subj   ~ normal(tau_group, sigma_tau_subj);

  alpha_group ~ normal(mu_alpha, sigma_alpha);
  tau_group ~ normal(mu_tau, sigma_tau);
}

Is this hierarchy feasible? As I understand it, it would be a problem if participants do not learn and choose randomly (alpha = 0 / tau = 0). How can I tackle this problem?
Can/should I use the same sigma for all parameters, since they follow the same distribution? Also, I would need to truncate alpha to range from 0 to 1 and tau from 0 to around 10.
Do I understand it right that this is done in the hbayesdm model by the parameter transformation with the Phi approximation? So I would have to transform the new parameters as well? A sketch of what I mean is below.
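
Written out, here is a minimal sketch of what I imagine for the learning rate (the names are made up, and I am assuming the same Phi_approx transformation as in the hbayesdm code above; tau would get the same treatment):

data {
  int<lower=1> N;                    // participants
  int<lower=1> B;                    // blocks per participant
}
parameters {
  real mu_A_pr;                      // group-level mean (probit scale)
  real<lower=0> sigma_A_subj;        // between-subject SD
  real<lower=0> sigma_A_block;       // within-subject, between-block SD
  vector[N] A_subj_pr;               // subject-level raw deviations
  matrix[N, B] A_block_pr;           // subject-by-block raw deviations
}
transformed parameters {
  matrix<lower=0, upper=1>[N, B] A;  // block-specific learning rates
  for (i in 1:N)
    for (b in 1:B)
      A[i, b] = Phi_approx(mu_A_pr + sigma_A_subj * A_subj_pr[i]
                           + sigma_A_block * A_block_pr[i, b]);
}
model {
  mu_A_pr ~ normal(0, 1);
  sigma_A_subj ~ normal(0, 0.2);
  sigma_A_block ~ normal(0, 0.2);
  A_subj_pr ~ normal(0, 1);
  to_vector(A_block_pr) ~ normal(0, 1);
  // ... the trial-by-trial likelihood would go here
}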

  2. I am not sure if and how I should limit the parameters. I read in the forum that a constraint in the declaration, such as vector<lower=0, upper=1>[x], does not really limit the sampler, since there are two different spaces: one for the estimation itself (not limited, it still moves through the unbounded space) and one for the parameter that is returned (which is limited). Did I understand that correctly?
    Does it pose a problem then to transform the parameter, for example with tau[i] = Phi_approx(mu_pr[2] + sigma[2] * tau_pr[i]) * 5;?
    Is the parameter then also limited in the internal parameter space?

  3. How can I test my hypothesis adequately? I thought I would just estimate the parameters and then run a classical statistical analysis, where my model would be something like:

lme(alpha ~ S_O*Admin*Observation + blockN, random = ~S_O|subject,
    contrasts = list(Admin = "contr.sum", Observation = "contr.sum", S_O = "contr.poly"),
    correlation = corAR1(0, form = ~1|subject), data = parameterList_con,
    method = "ML", na.action = na.exclude, control = list(opt = "optim"))

However, this would not be a Bayesian approach: it would take the point estimates and not account for their uncertainty.
So I am wondering, how can I include my hypothesis in the model? I feel overwhelmed because I think I would need to add the whole model structure with random and fixed effects, repeated measures, a covariance matrix, noise, and so on… This seems too much for me as a beginner.
I also know about the brms package, but I still have the reinforcement learning structure that I need to include. Is there an easier way to combine the methods? Can someone recommend how to move forward? How bad is it to use classical statistics afterwards?
Or can I maybe test my hypothesis with model comparison?

  4. Model comparison: I could test whether group membership even makes a difference by running the model once with all data (group 1 + 2), once only for group 1, and once only for group 2. If I now want to compare which model fits better, can I add up the log-liks of the two separate models? Or would I need to rewrite the model to include the groups within the Stan code to compare which model fits better? Can I only compare models that use exactly the same data with the loo package?
    I already tried it, but I always get the error "Error: Not all models have the same y variable." - so I guess it's about the number of data points?

Best,
Carina

PS: If there are recommendations for tutorials, especially for psychologists and reinforcement learning, I am glad to know about them.


So, there is a lot to unpack… I'll admit I don't follow all the details, but for the sake of getting you at least some hints sooner rather than later, I'll start with some surface-level observations and hope they'll help you move forward. I should also say that I've never used hbayesdm or built any models like the one you describe…

Yes, that looks bad. Rhat for generated quantities can sometimes be safely ignored - if it is NA, it often means that all posterior samples from all chains have the same value. To what extent this is to be expected and to what extent it may signal a bug is a model-specific question.

Anyway, I would take this one step at a time and first check whether there isn't some fundamental problem unrelated to your tweaks to the model - if you take the hbayesdm model as-is (ignoring for a while that it doesn't capture some aspects of the data), are you able to fit without problems (divergences, treedepth warnings, reasonable Rhat)? If so, are you then able to use the same Stan code to fit the same model yourself? (Or maybe that's what you did already?) Are you able to fit some example dataset from hbayesdm with your custom code?

Yes, that happens, been there some time ago myself… Unfortunately, once you move beyond simple stuff, there really is a lot of background knowledge one needs to build up to be able to successfully run complex models, and a lot of it is not yet captured in accessible tutorials/classes/… So yes, the first steps can be overwhelming, and Bayesian modelling can be very hard, but I believe you will be able to conquer it :-) Just don't be afraid to ask.

(I would recommend McElreath's Statistical Rethinking as a good primer that builds a lot of knowledge from the ground up - there is even an online version of course materials and talks: GitHub - rmcelreath/statrethinking_winter2019: Statistical Rethinking course at MPI-EVA from Dec 2018 through Feb 2019, but I think it still doesn't cover some of the more advanced stuff. The case studies and courses by Mike Betancourt (Writing - betanalpha.github.io) do cover some of the more advanced stuff, but can be hard to digest.)

It is very rare that you are able to get rid of fitting issues by adding more parameters/structure/… So I would first debug the simpler model. One thing to look at would be the pairs plot for the parameters (possibly subsetting to show only some of the A_pr and tau_pr to keep the plot readable). See also Visual MCMC diagnostics using the bayesplot package • bayesplot for some more discussion on using visualisations to debug.

If you have a constrained parameter - e.g. vector<lower=0, upper=1>[5] x; - then all elements of x will be constrained between 0 and 1. Stan creates an unconstrained version of the parameter behind the scenes, but for the most part, you don't need to worry about it too much.
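
To make the two spaces concrete, here is a purely conceptual sketch of what the declaration does behind the scenes - you would normally never write this yourself:

// doing by hand (roughly) what vector<lower=0, upper=1>[5] x; does for you
parameters {
  vector[5] x_raw;                   // unconstrained space the sampler moves in
}
transformed parameters {
  vector<lower=0, upper=1>[5] x = inv_logit(x_raw);  // what your model sees
}
model {
  // the log-Jacobian of the inv_logit transform, which Stan adds
  // implicitly when you declare the constraint instead
  target += sum(log_inv_logit(x_raw)) + sum(log1m_inv_logit(x_raw));
}

So the sampler always moves in an unbounded space, while the constrained version is what you see in the output.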

Hypothesis testing is a bit of an unnatural approach in Bayesian inference (though it can be done). The most natural approach is to interpret either the posterior distribution of the parameters or the predictions of the model.

If you want to think about some sort of Bayesian alternative to classical point hypotheses, I've written about this elsewhere: Bayesian Hypothesis testing - #15 by martinmodrak


Thank you so much for your answer and the content!

In the meantime, I was able to achieve reasonable values for Rhat by adjusting the upper bound for tau to 10 (also in the Phi-approximation section) in the Stan code from the hbayesdm model, still fitted manually. However, I still ran the model group by group. The largest Rhat is now 1.04, and only in one group - already much better than before and only slightly higher than the recommended < 1.01. Some Rhats are around 1.01 in the other groups.

It made me once again think about constrained parameters and how this is handled by the Phi approximation. Because as I understood it, and as you say, Stan itself creates an internal unconstrained parameter space. So the parameter constraint should not have the effect that I observed now (since tau is estimated around 1.8 - 2.3 for each group)?

Also, there is still a warning about the effective sample size, but I read in the coding club tutorial (https://ourcodingclub.github.io/tutorials/stan-intro/) that one should start to worry when the effective sample size is less than 1/1000th of the number of iterations, or 'Anything over an n_eff of 100 is usually "fine"' - Bob Carpenter. I used 2000 iterations, and the lowest n_eff is around 100 for some parameters, in the very group where the Rhat values were also higher than in the others.

Moreover, I have now inspected the cases where Rhat was NaN for the predictions. And indeed, as you say, those are trials where the predicted decision is pretty clear (e.g. mostly at the end of a trial block, where the participant has learned which option is more rewarding and is sure about the decision - so actually, it's a good sign).

I have also already watched the Statistical Rethinking lectures by McElreath, as they are recommended basically everywhere in this forum. Thanks for the hint about the case studies by Mike Betancourt, I will read into those.

Thanks for the link about hypothesis testing. I know that it's about comparing which model fits better and not about assuming an H0, but I still need a comparison model somehow.

For example, it might be that the group where the Rhats were high exhibits rather random choices, because they were administered a drug and were not observed during the task. Hence a zero-learning-rate model would maybe fit this subset of the data better, while for the other groups the reinforcement learning model is more accurate. How would that be possible within one model? Moreover, I am wondering how I can calculate a likelihood / implement a model for random choices, because there would be no parameter to estimate…?


Hi Carina,

all of @martinmodrak's advice is good. I do similar work in RL and I will post some more specific feedback tomorrow when I have more time, but a couple of questions -

  1. could you explain your experimental design a bit more? You say you have 8 groups and people complete the task 3 times at two timepoints. What are the groups and when you say people complete the task three times, is that three blocks within one session or three sessions (if the latter, is there some difference between the sessions)?
  2. how many trials do you have per session? If it's 16, that's not enough for a standard RL task and you may need to tweak the model to handle that.
  3. you will want to test your hypotheses within the model, not pull out a parameter estimate per person. Especially if you have a limited number of trials, there will be large differences in the analysis approaches (see https://link.springer.com/article/10.3758/s13428-018-1054-3 and https://www.sciencedirect.com/science/article/pii/S2451902220300161 for why this is & how much it affects estimates)

best,
Vanessa


Hi,
@martinmodrak and @Vanessa_Brown have already given useful feedback. Here are two additional points you could look at:

tau (and all parameters/priors that contribute to its calculation) should be seen/investigated in the context of the scale of your outcome values. It makes a huge difference if the outcomes are in the range 0-100 or 0-1. I would do a prior predictive check and look at the distributions of choice probabilities the model makes.

If the choice probabilities are either close to 0 or 1 throughout, that's a clear sign that priors and data do not play well together.

I would also look at the generated learning rates. Looking at these parameters in a prior predictive check is especially important, because the use of phi_approx makes it harder to have an intuition about the effect of the priors.
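
For example, a minimal prior-only sketch (a guess based on the priors in the hbayesdm code above; run it with algorithm = "Fixed_param" and plot the draws of A and tau):

// prior-predictive sketch: nothing is fit, we only draw from the priors of
// the model above to see what learning rates and temperatures they imply
generated quantities {
  real mu_pr_A = normal_rng(0, 1);
  real mu_pr_tau = normal_rng(0, 1);
  real sigma_A = fabs(normal_rng(0, 0.2));
  real sigma_tau = fabs(normal_rng(0, 0.2));
  real A = Phi_approx(mu_pr_A + sigma_A * normal_rng(0, 1));
  real tau = Phi_approx(mu_pr_tau + sigma_tau * normal_rng(0, 1)) * 5;
}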

More prosaically: I agree with @Vanessa_Brown that 16 trials per task would be relatively little data. I would therefore also check if you can identify signs of learning without modeling (e.g. the probability of choosing the better option increases over time at the group and maybe also at the individual level).

Thank you @Vanessa_Brown and @Guido_Biele!

So the RL paradigm was adopted from https://doi.org/10.1073/pnas.1603198113 and followed a 25/75 reward probability. They found that participants learned to choose the better option after around 7 choices. I already plotted the mean correct choices as you suggest, @Guido_Biele, and at the group level as well as the participant level, the mean probability of choosing the better option increases to >75% after the first 5 choices - which matches the true reward probability. But taking the mean correct choices as a dependent variable in frequentist analyses did not yield any significant results.

On the experimental design: participants completed 6 blocks of 16 choices each, all at the same time point. The aforementioned paper found a difference in learning for oneself versus for others, so we adopted this as well. It was a repeated measure, i.e. participants alternated between blocks gaining rewards for themselves and blocks gaining rewards for others. The two between-subject factors were being observed during the task or not, and drug administration or placebo.
We expected an interaction effect of gaining rewards for others x observation x drug.

Thank you for the links to the papers @Vanessa_Brown. I read through them briefly (I need to take more time later). What I currently have implemented is an option not discussed there, because I have the trial level and the group level.
I am lacking the participant level (repeated measures), and I know that I need to add that, and also the different group levels, within one model. However, I don't know how to do it. I am also clueless about how to then approach the next step of a hierarchical Bayesian test, because it's not a simple t-test.
My thought was that as a first step I could compare the full model (different group-level parameters for each group) against a model with random choices. But I don't know how to implement and run a model assuming random choice - it would be a 50% chance of choosing each option, but there would be no parameter to estimate…?

@Guido_Biele I am already working on the PPC, but I am still struggling with its extraction, partly because the prediction is discrete (either 1 or 2), so most of what I find on generating predictions does not work for the kind of data I get back.
Right now I just left the options as they were in the task (sometimes 1 being the better option, sometimes 2). But I guess I need to transform them so that one option always represents the 'more rewarding option', otherwise I am not able to trace the learning in the PPC. Is there a best practice, for example is it recommended to go with 0 and 1? Does it make any difference?

Hi Carina,

for the poor model fits (high Rhat): it sounds like you have three blocks per condition, two conditions per session, and repeated sessions per person (where people are observed/not observed and given drug/placebo), correct? Are you modeling all three blocks in a condition together? That would be the correct way to do things if I understand your design right, and would give you 48 trials per condition which is probably enough.

  1. I would recommend moving away from the phi_approx (I think hbayesdm carried that over from JAGS), especially for your inverse temperature parameter. There's no reason for a transformation or upper bound on that parameter, as long as you set a lower bound (using <lower=0>). I use inv_logit to constrain learning rates, but phi_approx may work OK for that parameter. I agree with the recommendation to use prior predictive checks to check the range of your priors. (See the sketch after this list.)
  2. in your code, the choices are coded from -1 to 2, and you use a categorical_logit to predict them. If you only have two choices per block, you can use bernoulli_logit and have choices coded as 0 or 1. What does the -1 to 2 coding for your choices mean now?
  3. it sounds like overall, participants are learning well. Is that the case for most participants? In my experience, hierarchical modeling can handle some poor learners fine, but if it gets to be many people, that may start causing issues.
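
To make points 1-2 concrete, here is a rough sketch of the bandit model rewritten with bernoulli_logit and an inverse temperature that only has a lower bound (untested - the names and priors are placeholders, not recommendations):

data {
  int<lower=1> N;
  int<lower=1> T;
  int<lower=1, upper=T> Tsubj[N];
  int<lower=0, upper=1> choice01[N, T];  // original choices recoded: 1 = option 2 chosen
  real outcome[N, T];
}
parameters {
  real mu_A_pr;
  real<lower=0> sigma_A;
  vector[N] A_pr;
  vector<lower=0>[N] tau;                // lower bound only, no upper bound
}
transformed parameters {
  vector<lower=0, upper=1>[N] A = inv_logit(mu_A_pr + sigma_A * A_pr);
}
model {
  mu_A_pr ~ normal(0, 1);
  sigma_A ~ normal(0, 0.5);
  A_pr ~ normal(0, 1);
  tau ~ normal(0, 3);                    // half-normal via the <lower=0> bound
  for (i in 1:N) {
    vector[2] ev = rep_vector(0.0, 2);
    for (t in 1:Tsubj[i]) {
      int c = choice01[i, t] + 1;        // back to 1/2 for indexing ev
      choice01[i, t] ~ bernoulli_logit(tau[i] * (ev[2] - ev[1]));
      ev[c] += A[i] * (outcome[i, t] - ev[c]);
    }
  }
}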

comparison to null model: I'm not sure what the paper you linked to was doing, but a null/chance model just assumes that the chance of each choice is 50%, right? If you want to do model comparison with BIC (though other methods can be used with Stan), you just need the number of trials (which is constant), the number of parameters, and the likelihood per trial. For a null model, the number of parameters is 0 and the likelihood per trial is 0.5. So, once you calculate that, you can compare it to the BIC you get from a model with parameters. Note that the fitted models give you a LL per post-warmup iteration, so you will want to average over those.
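
For what that could look like in practice, here is a sketch of the chance model (no parameters at all; run with algorithm = "Fixed_param"):

// null/chance model: every choice has probability 0.5 and there is nothing
// to estimate; log_lik is stored so the fit can be compared to the RL models.
// With 0 parameters, the BIC reduces to -2 * (total trials) * log(0.5).
data {
  int<lower=1> N;
  int<lower=1> T;
  int<lower=1, upper=T> Tsubj[N];
}
generated quantities {
  real log_lik[N];
  for (i in 1:N)
    log_lik[i] = Tsubj[i] * log(0.5);
}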

estimating effects within and across sessions: This is going to be tricky. I agree with the above recommendations that you get models for each condition figured out first, as those'll need to be in good shape before you can start looking at group differences. Below I've pasted code for a model I'm working with (different task - this is Daw's two-step model-based/model-free task) that incorporates repeated measures and within- and across-subject effects. You'll still need to add in the within-condition effects, but hopefully this gives you a starting point. I've checked this code pretty thoroughly, but no one else really has, so use at your own risk :)

data {
  int<lower=1> nT; //trials per visit
  int<lower=1> nS; //# of subjects
  int<lower=2> nV; //# of visits per subject
  int<lower=0,upper=1> choice[nS,nT,2,nV]; 
  int<lower=0,upper=1> reward[nS,nT,nV];
  int<lower=1,upper=2> state_2[nS,nT,nV]; 
  int missing_choice[nS,nT,2,nV];
  int missing_reward[nS,nT,nV];
  int kS; //# of subj-level variables (L)
  real subj_vars[nS,kS]; //subj-level variable matrix (centered)
  int kV; //# of visit-level variables (K)
  real visit_vars[nS,nV,kV]; //visit-level variable matrix (centered) (X)
  real missing_visit[nS,nV];
}

parameters {
  //group-level means (y00)
  real alpha_m;
  real<lower=0> beta_1_MF_m;
  real<lower=0> beta_1_MB_m;
  real<lower=0> beta_2_m;
  real pers_m;
  
  //subj-level effects (y01:yks)
  vector[kS] alpha_grp_m;
  vector[kS] beta_1_MF_grp_m;
  vector[kS] beta_1_MB_grp_m;
  vector[kS] beta_2_grp_m;
  vector[kS] pers_grp_m;
  
  //visit-level effects - mean (y10:ykv0)
  vector[kV] alpha_visit_grp_m;
  vector[kV] beta_1_MF_visit_grp_m;
  vector[kV] beta_1_MB_visit_grp_m;
  vector[kV] beta_2_visit_grp_m;
  vector[kV] pers_visit_grp_m;
  
  //cross-level interaction effects (y11:ykvks)
  matrix[kV,kS] alpha_int_m;
  matrix[kV,kS] beta_1_MF_int_m;
  matrix[kV,kS] beta_1_MB_int_m;
  matrix[kV,kS] beta_2_int_m;
  matrix[kV,kS] pers_int_m;
  
  //visit-level (within subject) SDs (sigma2_y)
  real<lower=0> alpha_visit_s;
  real<lower=0> beta_1_MF_visit_s;
  real<lower=0> beta_1_MB_visit_s;
  real<lower=0> beta_2_visit_s;
  real<lower=0> pers_visit_s;
  
  //SDs of visit-level effects
  vector<lower=0>[kV+1] alpha_subj_s;
  vector<lower=0>[kV+1] beta_1_MF_subj_s;
  vector<lower=0>[kV+1] beta_1_MB_subj_s;
  vector<lower=0>[kV+1] beta_2_subj_s;
  vector<lower=0>[kV+1] pers_subj_s;
  
  //non-centered parameterization (ncp) variance effect per visit & subject
  matrix[nS,nV] alpha_visit_raw;
  matrix[nS,nV] beta_1_MF_visit_raw;
  matrix[nS,nV] beta_1_MB_visit_raw;
  matrix[nS,nV] beta_2_visit_raw;
  matrix[nS,nV] pers_visit_raw;
  
  //NCP variance effect on subj-level effects
  matrix[kV+1,nS] alpha_subj_raw;
  matrix[kV+1,nS] beta_1_MF_subj_raw;
  matrix[kV+1,nS] beta_1_MB_subj_raw;
  matrix[kV+1,nS] beta_2_subj_raw;
  matrix[kV+1,nS] pers_subj_raw;
  
  //Cholesky factors of correlation matrices for subj-level variances
  cholesky_factor_corr[kV+1] alpha_subj_L;
  cholesky_factor_corr[kV+1] beta_1_MF_subj_L;
  cholesky_factor_corr[kV+1] beta_1_MB_subj_L;
  cholesky_factor_corr[kV+1] beta_2_subj_L;
  cholesky_factor_corr[kV+1] pers_subj_L;
}

transformed parameters {
  //define transformed parameters
  matrix<lower=0,upper=1>[nS,nV] alpha;
  matrix[nS,nV] alpha_normal;
  matrix[nS,nV] beta_1_MF;
  matrix[nS,nV] beta_1_MB;
  matrix[nS,nV] beta_2;
  matrix[nS,nV] pers;
  
  //convert Cholesky factorized correlation matrix into SDs per visit-level effect
  matrix[nS,kV+1] alpha_vars = (diag_pre_multiply(alpha_subj_s,alpha_subj_L)*alpha_subj_raw)';
  matrix[nS,kV+1] beta_1_MF_vars = (diag_pre_multiply(beta_1_MF_subj_s,
    beta_1_MF_subj_L)*beta_1_MF_subj_raw)';
  matrix[nS,kV+1] beta_1_MB_vars = (diag_pre_multiply(beta_1_MB_subj_s,
    beta_1_MB_subj_L)*beta_1_MB_subj_raw)';
  matrix[nS,kV+1] beta_2_vars = (diag_pre_multiply(beta_2_subj_s,
    beta_2_subj_L)*beta_2_subj_raw)';
  matrix[nS,kV+1] pers_vars = (diag_pre_multiply(pers_subj_s,pers_subj_L)*pers_subj_raw)';
  
  //create transformed parameters using non-centered parameterization for all
  // and logistic transformation for alpha (range: 0 to 1),
  // add in subject and visit-level effects as shifts in means
  for (s in 1:nS) {
    alpha_normal[s,] = alpha_m+alpha_visit_s*alpha_visit_raw[s,] + alpha_vars[s,1]; 
    beta_1_MF[s,] = beta_1_MF_m + beta_1_MF_visit_s*beta_1_MF_visit_raw[s,] + 
      beta_1_MF_vars[s,1];
    beta_1_MB[s,] = beta_1_MB_m + beta_1_MB_visit_s*beta_1_MB_visit_raw[s,] + 
      beta_1_MB_vars[s,1];
    beta_2[s,] = beta_2_m + beta_2_visit_s*beta_2_visit_raw[s,] + beta_2_vars[s,1];
    pers[s,] = pers_m + pers_visit_s*pers_visit_raw[s,] + pers_vars[s,1];
    
    // //add subj-level effects
    // for (k in 1:kS) { 
    //   alpha_normal[s,] += alpha_grp_m[k]*subj_vars[s,k];
    //   beta_1_MF[s,] += beta_1_MF_grp_m[k]*subj_vars[s,k];
    //   beta_1_MB[s,] += beta_1_MB_grp_m[k]*subj_vars[s,k];
    //   beta_2[s,] += beta_2_grp_m[k]*subj_vars[s,k];
    //   pers[s,] += pers_grp_m[k]*subj_vars[s,k];
    // }
  
    //add subj- and visit-level effects
    for (v in 1:nV) {
      if (missing_visit[s,v]==0) {
      for (kk in 1:kV) { 
        //main effects of visit-level variables
        alpha_normal[s,v] += visit_vars[s,v,kk]*(alpha_visit_grp_m[kk]+alpha_vars[s,kk+1]);
        beta_1_MF[s,v] += visit_vars[s,v,kk]*(beta_1_MF_visit_grp_m[kk]+beta_1_MF_vars[s,kk+1]);
        beta_1_MB[s,v] += visit_vars[s,v,kk]*(beta_1_MB_visit_grp_m[kk]+beta_1_MB_vars[s,kk+1]);
        beta_2[s,v] += visit_vars[s,v,kk]*(beta_2_visit_grp_m[kk]+beta_2_vars[s,kk+1]);
        pers[s,v] += visit_vars[s,v,kk]*(pers_visit_grp_m[kk]+pers_vars[s,kk+1]);
          for (k in 1:kS) { 
            //main effects of subject-level variables
            alpha_normal[s,v] += subj_vars[s,k]*alpha_grp_m[k];
            beta_1_MF[s,v] += beta_1_MF_grp_m[k]*subj_vars[s,k];
            beta_1_MB[s,v] += beta_1_MB_grp_m[k]*subj_vars[s,k];
            beta_2[s,v] += beta_2_grp_m[k]*subj_vars[s,k];
             pers[s,v] += pers_grp_m[k]*subj_vars[s,k];
            
            //cross-level interactions
            alpha_normal[s,v] += subj_vars[s,k]*visit_vars[s,v,kk]*alpha_int_m[kk,k];
            beta_1_MF[s,v] += subj_vars[s,k]*visit_vars[s,v,kk]*beta_1_MF_int_m[kk,k];
            beta_1_MB[s,v] += subj_vars[s,k]*visit_vars[s,v,kk]*beta_1_MB_int_m[kk,k];
            beta_2[s,v] += subj_vars[s,k]*visit_vars[s,v,kk]*beta_2_int_m[kk,k];
            pers[s,v] += subj_vars[s,k]*visit_vars[s,v,kk]*pers_int_m[kk,k];
          }
      }
      }
    }
    //transform alpha to [0,1]
    alpha[s,] = inv_logit(alpha_normal[s,]); 
  }
}

model {
  //define variables
  int prev_choice;
  int tran_count;
  int tran_type[2];
  int unc_state;
  // real delta_1;
  // real delta_2;
  real Q_TD[2];
  real Q_MB[2];
  real Q_2[2,2];
  
  //define priors
  alpha_m ~ normal(0,2.5);
  beta_1_MF_m ~ normal(0,5);
  beta_1_MB_m ~ normal(0,5);
  beta_2_m ~ normal(0,5);
  pers_m ~ normal(0,2.5);
  
  alpha_grp_m ~ normal(0,1);
  beta_1_MF_grp_m ~ normal(0,1);
  beta_1_MB_grp_m ~ normal(0,1);
  beta_2_grp_m ~ normal(0,1);
  pers_grp_m ~ normal(0,1);
  
  alpha_visit_grp_m ~ normal(0,1);
  beta_1_MF_visit_grp_m ~ normal(0,1);
  beta_1_MB_visit_grp_m ~ normal(0,1);
  beta_2_visit_grp_m ~ normal(0,1);
  pers_visit_grp_m ~ normal(0,1);
  
  for (k in 1:kS) {
    alpha_int_m[,k] ~ normal(0,1);
    beta_1_MF_int_m[,k] ~ normal(0,1);
    beta_1_MB_int_m[,k] ~ normal(0,1);
    beta_2_int_m[,k] ~ normal(0,1);
    pers_int_m[,k] ~ normal(0,1);
  }
  
  alpha_visit_s ~ cauchy(0,2);
  beta_1_MF_visit_s ~ cauchy(0,2);
  beta_1_MB_visit_s ~ cauchy(0,2);
  beta_2_visit_s ~ cauchy(0,2);
  pers_visit_s ~ cauchy(0,2);
  
  for (s in 1:nS) {
    alpha_visit_raw[s,] ~ normal(0,1);
    beta_1_MF_visit_raw[s,] ~ normal(0,1);
    beta_1_MB_visit_raw[s,] ~ normal(0,1);
    beta_2_visit_raw[s,] ~ normal(0,1);
    pers_visit_raw[s,] ~ normal(0,1);
    
    to_vector(alpha_subj_raw[,s]) ~ normal(0,1);
    to_vector(beta_1_MF_subj_raw[,s]) ~ normal(0,1);
    to_vector(beta_1_MB_subj_raw[,s]) ~ normal(0,1);
    to_vector(beta_2_subj_raw[,s]) ~ normal(0,1);
    to_vector(pers_subj_raw[,s]) ~ normal(0,1);
  }
  
  alpha_subj_s ~ student_t(3,0,2);
  beta_1_MF_subj_s ~ student_t(3,0,3);
  beta_1_MB_subj_s ~ student_t(3,0,3);
  beta_2_subj_s ~ student_t(3,0,3);
  pers_subj_s ~ student_t(3,0,2);
  
  alpha_subj_L ~ lkj_corr_cholesky(1);
  beta_1_MF_subj_L ~ lkj_corr_cholesky(1);
  beta_1_MB_subj_L ~ lkj_corr_cholesky(1);
  beta_2_subj_L ~ lkj_corr_cholesky(1);
  pers_subj_L ~ lkj_corr_cholesky(1);
  
  for (s in 1:nS) {
    for (v in 1:nV) {
      
      //set initial values
      for (i in 1:2) {
        Q_TD[i]=.5;
        Q_MB[i]=.5;
        Q_2[1,i]=.5;
        Q_2[2,i]=.5;
        tran_type[i]=0;
      }
      prev_choice=0;
      
      for (t in 1:nT) {
        //use if not missing 1st stage choice
        if (missing_choice[s,t,1,v]==0) {
          choice[s,t,1,v] ~ bernoulli_logit(beta_1_MF[s,v]*(Q_TD[2]-Q_TD[1])
          +beta_1_MB[s,v]*(Q_MB[2]-Q_MB[1])+pers[s,v]*prev_choice);
          prev_choice = 2*choice[s,t,1,v]-1; //1 if choice 2, -1 if choice 1
          
          //use if not missing 2nd stage choice
          if (missing_choice[s,t,2,v]==0) {
            choice[s,t,2,v] ~ bernoulli_logit(beta_2[s,v]*
              (Q_2[state_2[s,t,v],2]-Q_2[state_2[s,t,v],1]));
            
            //use if not missing 2nd stage reward
            if (missing_reward[s,t,v]==0) {
              // //prediction errors
              // //note: choices are 0/1, +1 to make them 1/2 for indexing
              // delta_1 = Q_2[state_2[s,t,v],choice[s,t,2,v]+1]/alpha[s,v]-
              //   Q_TD[choice[s,t,1,v]+1]; 
              // delta_2 = reward[s,t,v]/alpha[s,v] - Q_2[state_2[s,t,v],choice[s,t,2,v]+1];
              
              //update transition counts: if choice=0 & state=1, or choice=1 & state=2, 
              //update 1st expectation of transition, otherwise update 2nd expectation
              tran_count = (state_2[s,t,v]-choice[s,t,1,v]-1) ? 2 : 1;
              tran_type[tran_count] = tran_type[tran_count] + 1;
              
              //update chosen values
              //Q_TD[choice[s,t,1,v]+1] = Q_TD[choice[s,t,1,v]+1] + 
              //  alpha[s,v]*(delta_1+lambda[s,v]*delta_2);
              Q_TD[choice[s,t,1,v]+1] = Q_TD[choice[s,t,1,v]+1]*(1-(alpha[s,v])) + reward[s,t,v];
                //+ alpha[s,v]*delta_1+delta_2;
              Q_2[state_2[s,t,v],choice[s,t,2,v]+1] = Q_2[state_2[s,t,v],choice[s,t,2,v]+1]*
                (1 -(alpha[s,v])) + reward[s,t,v];
              // + alpha[s,v]*delta_2;
  
              //update unchosen TD & second stage values
              Q_TD[(choice[s,t,1,v] ? 1 : 2)] = (1-alpha[s,v])*
                Q_TD[(choice[s,t,1,v] ? 1 : 2)];
              Q_2[state_2[s,t,v],(choice[s,t,2,v] ? 1 : 2)] = (1-alpha[s,v])*
                Q_2[state_2[s,t,v],(choice[s,t,2,v] ? 1 : 2)];
              unc_state = (state_2[s,t,v]-1) ? 1 : 2;
              Q_2[unc_state,1] = (1-alpha[s,v])*Q_2[unc_state,1];
              Q_2[unc_state,2] = (1-alpha[s,v])*Q_2[unc_state,2];
              
              //update model-based values
              Q_MB[1] = (tran_type[1] > tran_type[2]) ? (.7*fmax(Q_2[1,1],Q_2[1,2]) + 
                .3*fmax(Q_2[2,1],Q_2[2,2])) : (.3*fmax(Q_2[1,1],Q_2[1,2]) + 
                  .7*fmax(Q_2[2,1],Q_2[2,2]));
              Q_MB[2] = (tran_type[1] > tran_type[2]) ? (.3*fmax(Q_2[1,1],Q_2[1,2]) + 
                .7*fmax(Q_2[2,1],Q_2[2,2])) : (.7*fmax(Q_2[1,1],Q_2[1,2]) + 
                  .3*fmax(Q_2[2,1],Q_2[2,2]));
              
            } //if missing 2nd stage reward: do nothing
            //if missing 2nd stage choice or reward: still update 1st stage TD values, 
            //decay 2nd stage values
          } else if (missing_choice[s,t,2,v]==1||missing_reward[s,t,v]==1) { 
          // delta_1 = Q_2[state_2[s,t,v],choice[s,t,2,v]+1]-Q_TD[choice[s,t,1,v]+1];
          Q_TD[choice[s,t,1,v]+1] = Q_TD[choice[s,t,1,v]+1]*(1-(alpha[s,v])) + reward[s,t,v];
          //+ alpha[s,v]*delta_1;
          Q_TD[(choice[s,t,1,v] ? 1 : 2)] = (1-alpha[s,v])*Q_TD[(choice[s,t,1,v] ? 1 : 2)];
          Q_2[1,1] = (1-alpha[s,v])*Q_2[1,1];
          Q_2[1,2] = (1-alpha[s,v])*Q_2[1,2];
          Q_2[2,1] = (1-alpha[s,v])*Q_2[2,1];
          Q_2[2,2] = (1-alpha[s,v])*Q_2[2,2];
          //MB update of first stage values based on second stage values, so don't change
          
          }
        } else { //if missing 1st stage choice: decay all TD & 2nd stage values & 
        //update previous choice
        prev_choice=0;
        Q_TD[1] = (1-alpha[s,v])*Q_TD[1];
        Q_TD[2] = (1-alpha[s,v])*Q_TD[2];
        Q_2[1,1] = (1-alpha[s,v])*Q_2[1,1];
        Q_2[1,2] = (1-alpha[s,v])*Q_2[1,2];
        Q_2[2,1] = (1-alpha[s,v])*Q_2[2,1];
        Q_2[2,2] = (1-alpha[s,v])*Q_2[2,2];
        }
      }
    }
  }
}

generated quantities {
  //same code as above, with following changes: 
  // 1) values and choices used to calculate probability, rather than fitting values to choices
  // 2) no priors, etc. - uses estimated parameter values from model block
  
  real log_lik[nS,nT,2,nV]; //log likelihood- must be named this
  int prev_choice;
  int tran_count;
  int tran_type[2];
  int unc_state;
  // real delta_1;
  // real delta_2;
  real Q_TD[2];
  real Q_MB[2];
  real Q_2[2,2];
  
  corr_matrix[kV+1] alpha_cor = multiply_lower_tri_self_transpose(alpha_subj_L);
  corr_matrix[kV+1] beta_1_MF_cor = 
    multiply_lower_tri_self_transpose(beta_1_MF_subj_L);
  corr_matrix[kV+1] beta_1_MB_cor = 
    multiply_lower_tri_self_transpose(beta_1_MB_subj_L);
  corr_matrix[kV+1] beta_2_cor = 
    multiply_lower_tri_self_transpose(beta_2_subj_L);
  corr_matrix[kV+1] pers_cor = 
    multiply_lower_tri_self_transpose(pers_subj_L);
  
  for (s in 1:nS) {
    for (v in 1:nV) {
      for (i in 1:2) {
        Q_TD[i]=.5;
        Q_MB[i]=.5;
        Q_2[1,i]=.5;
        Q_2[2,i]=.5;
        tran_type[i]=0;
      }
      prev_choice=0;
      for (t in 1:nT) {
        if (missing_choice[s,t,1,v]==0) {
          log_lik[s,t,1,v] = bernoulli_logit_lpmf(choice[s,t,1,v] | beta_1_MF[s,v]*
            (Q_TD[2]-Q_TD[1])+beta_1_MB[s,v]*(Q_MB[2]-Q_MB[1])+pers[s,v]*prev_choice);
          prev_choice = 2*choice[s,t,1,v]-1; //1 if choice 2, -1 if choice 1
          
          if (missing_choice[s,t,2,v]==0) {
            log_lik[s,t,2,v] = bernoulli_logit_lpmf(choice[s,t,2,v] | beta_2[s,v]*
              (Q_2[state_2[s,t,v],2]-Q_2[state_2[s,t,v],1]));
            
            //use if not missing 2nd stage reward
            if (missing_reward[s,t,v]==0) {
              // //prediction errors
              // //note: choices are 0/1, +1 to make them 1/2 for indexing
              // delta_1 = Q_2[state_2[s,t,v],choice[s,t,2,v]+1]/alpha[s,v]-
              //   Q_TD[choice[s,t,1,v]+1];
              // delta_2 = reward[s,t,v]/alpha[s,v] - Q_2[state_2[s,t,v],choice[s,t,2,v]+1];
              
              //update transition counts: if choice=0 & state=1, or choice=1 & state=2, 
              //update 1st expectation of transition, otherwise update 2nd expectation
              tran_count = (state_2[s,t,v]-choice[s,t,1,v]-1) ? 2 : 1;
              tran_type[tran_count] = tran_type[tran_count] + 1;
              
              //update chosen values
              //Q_TD[choice[s,t,1,v]+1] = Q_TD[choice[s,t,1,v]+1] + alpha[s,v]*
              //  (delta_1+lambda[s,v]*delta_2);
              Q_TD[choice[s,t,1,v]+1] = Q_TD[choice[s,t,1,v]+1]*(1-(alpha[s,v])) + reward[s,t,v];
                //+ alpha[s,v]*delta_1+delta_2;
              Q_2[state_2[s,t,v],choice[s,t,2,v]+1] = Q_2[state_2[s,t,v],choice[s,t,2,v]+1]*
                (1 -(alpha[s,v])) + reward[s,t,v];
              // + alpha[s,v]*delta_2;
              
              //update unchosen TD & second stage values
              Q_TD[(choice[s,t,1,v] ? 1 : 2)] = (1-alpha[s,v])*Q_TD[(choice[s,t,1,v] ? 1 : 2)];
              Q_2[state_2[s,t,v],(choice[s,t,2,v] ? 1 : 2)] = (1-alpha[s,v])*
                Q_2[state_2[s,t,v],(choice[s,t,2,v] ? 1 : 2)];
              unc_state = (state_2[s,t,v]-1) ? 1 : 2;
              Q_2[unc_state,1] = (1-alpha[s,v])*Q_2[unc_state,1];
              Q_2[unc_state,2] = (1-alpha[s,v])*Q_2[unc_state,2];
              
              //update model-based values
              Q_MB[1] = (tran_type[1] > tran_type[2]) ? (.7*fmax(Q_2[1,1],Q_2[1,2]) + 
                .3*fmax(Q_2[2,1],Q_2[2,2])) : (.3*fmax(Q_2[1,1],Q_2[1,2]) + 
                  .7*fmax(Q_2[2,1],Q_2[2,2]));
              Q_MB[2] = (tran_type[1] > tran_type[2]) ? (.3*fmax(Q_2[1,1],Q_2[1,2]) + 
                .7*fmax(Q_2[2,1],Q_2[2,2])) : (.7*fmax(Q_2[1,1],Q_2[1,2]) + 
                  .3*fmax(Q_2[2,1],Q_2[2,2]));
              
            } //if missing 2nd stage reward: do nothing
           //if missing 2nd stage choice or reward: still update 1st stage TD values, 
           //decay 2nd stage values 
          } else if (missing_choice[s,t,2,v]==1||missing_reward[s,t,v]==1) { 
          log_lik[s,t,2,v] = 0;
          // delta_1 = Q_2[state_2[s,t,v],choice[s,t,2,v]+1]-Q_TD[choice[s,t,1,v]+1]; 
          Q_TD[choice[s,t,1,v]+1] = Q_TD[choice[s,t,1,v]+1]*(1-(alpha[s,v])) + reward[s,t,v];
          //+ alpha[s,v]*delta_1;
          Q_TD[(choice[s,t,1,v] ? 1 : 2)] = (1-alpha[s,v])*Q_TD[(choice[s,t,1,v] ? 1 : 2)];
          Q_2[1,1] = (1-alpha[s,v])*Q_2[1,1];
          Q_2[1,2] = (1-alpha[s,v])*Q_2[1,2];
          Q_2[2,1] = (1-alpha[s,v])*Q_2[2,1];
          Q_2[2,2] = (1-alpha[s,v])*Q_2[2,2];
          //MB update of first stage values based on second stage values, so don't change
          
          }
        } else { //if missing 1st stage choice: decay all TD & 2nd stage values & 
         //update previous choice
        prev_choice=0;
        log_lik[s,t,1,v] = 0;
        log_lik[s,t,2,v] = 0;
        Q_TD[1] = (1-alpha[s,v])*Q_TD[1];
        Q_TD[2] = (1-alpha[s,v])*Q_TD[2];
        Q_2[1,1] = (1-alpha[s,v])*Q_2[1,1];
        Q_2[1,2] = (1-alpha[s,v])*Q_2[1,2];
        Q_2[2,1] = (1-alpha[s,v])*Q_2[2,1];
        Q_2[2,2] = (1-alpha[s,v])*Q_2[2,2];
        }
      } 
    }
  }
}

general resources: below are a bunch of things I've found useful. Most are not Stan-specific, and they are at different levels depending on where you're at in terms of learning modeling.

Ten simple rules for computational modeling of behavioral data
Video tutorial on modeling behavioral data
Computational psychiatry tutorial on modeling individual differences
Using reinforcement learning models in social neuroscience
Trial-by-trial data analysis using computational models (note that you can download this from here)
Basic modeling tutorial

best of luck,
Vanessa


First, I think, as @Vanessa_Brown's post also shows, that getting the results you want will probably not be possible by starting with an hbayesdm model and making some adjustments to it. I think the most efficient way is to start with a simple RL model and successively add complexity. (You could e.g. start with a simple model that assumes all data come from one condition and one participant, then add participant random effects for learning rate and temperature, then fixed effects for conditions, fixed effects for drug, interaction effects, and even random effects of condition/drug…)

By adding complexity I mean that you can think of adding information about task conditions and individuals as modeling these with regression models, where you can use the inv_logit link function for the learning rate and the log link function for the inverse temperature. (You can also, after setting appropriate parameter constraints, try the identity link function first. This makes having an intuition about the effect of priors much easier, but it will only work when the estimated parameters are not close to the parameter boundaries.)
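
A minimal sketch of that idea (all names are hypothetical; one row per subject-by-block cell, with centered condition and drug codes; the RL likelihood itself is omitted):

data {
  int<lower=1> M;                    // number of subject-by-block cells
  vector[M] cond;                    // e.g. self/other, centered
  vector[M] drug;                    // drug/placebo, centered
}
parameters {
  real a0;                           // learning-rate intercept (logit scale)
  real a_cond;
  real a_drug;
  real t0;                           // temperature intercept (log scale)
  real t_cond;
  real t_drug;
}
transformed parameters {
  // inv_logit link keeps alpha in (0, 1), log link keeps tau positive
  vector<lower=0, upper=1>[M] alpha = inv_logit(a0 + a_cond * cond + a_drug * drug);
  vector<lower=0>[M] tau = exp(t0 + t_cond * cond + t_drug * drug);
}
model {
  a0 ~ normal(0, 1);
  a_cond ~ normal(0, 0.5);
  a_drug ~ normal(0, 0.5);
  t0 ~ normal(0, 1);
  t_cond ~ normal(0, 0.5);
  t_drug ~ normal(0, 0.5);
  // ... the RL likelihood, using alpha[m] and tau[m], would go here
}

Random effects of subject would then be added as further terms on the link scale, as in the hierarchical logistic regression example mentioned below.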

The Stan documentation has a clear example of a hierarchical logistic regression, which should be a good template for developing a hierarchical model of learning rates.

Regarding the basic RL model: I agree with @Vanessa_Brown that a simple binomial model works well here. For this model you would have to generate trial-by-trial choice probabilities, which is what I would use for the PPC. I.e., you can recalculate those in the generated quantities block and investigate them later. That's a lot of numbers, so I'd do this only for a select number of participants.
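
A sketch of what that could look like, slotting into the generated quantities block of the bandit model above (n_ppc and ppc_idx would be new, hypothetical data entries: int<lower=0> n_ppc; int<lower=1, upper=N> ppc_idx[n_ppc];):

// trial-by-trial probability of choosing option 2 for selected participants;
// the declaration has to go before the statements in generated quantities
real p_choose2[n_ppc, T];
for (j in 1:n_ppc)
  for (t in 1:T)
    p_choose2[j, t] = -1;            // placeholder for trials beyond Tsubj
for (j in 1:n_ppc) {
  int i = ppc_idx[j];
  vector[2] ev = initV;
  for (t in 1:Tsubj[i]) {
    p_choose2[j, t] = softmax(tau[i] * ev)[2];
    ev[choice[i, t]] += A[i] * (outcome[i, t] - ev[choice[i, t]]);
  }
}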

Regarding testing: if you are using Stan, I would use loo or bridgesampling (both R packages) for model comparison. loo will do approximate leave-one-out cross-validation, and bridgesampling will calculate Bayes factors. The drawback with loo is that leave-one-out in its basic form is invalid here, because trials are not independent (you can leave out whole participants instead, which would be OK, but it is a bit harder to set up). The drawback with bridgesampling is that you'll need many post-warmup samples.

An alternative approach is to fit the most complex model and use the ROPE approach to check if the parameters of interest (e.g. learning rate on/off drug) are for practical purposes different from zero.
