Issues with Bayesian Hierarchical Reinforcement Learning Model Fitting

Hello everyone, I am a beginner in computational modeling and I am trying to fit a Bayesian hierarchical reinforcement learning model (Pearce-Hall model) to my behavioral data. However, I am encountering some fitting issues, including divergent transitions, high R-hat values, and extremely low effective sample size (ESS). I would appreciate any advice or guidance on how to resolve these problems.

1. Overview of My Research Design:
In my experiment, participants make choices in a probabilistic reversal learning task. Each participant completed 180 trials; on each trial, participants are presented with two stimuli and choose one. I am fitting a Bayesian hierarchical Pearce-Hall reinforcement learning model to these data.

2. Solutions I've Tried:
    I have increased the number of sampling iterations.
    I have adjusted adapt_delta and the step size.
    I have increased the number of warm-up iterations, but the issues remain.

Here is my Stan code:
data {
  int<lower=1> nSubjects;
  int<lower=1> nTrials;
  int<lower=1,upper=2> choice[nSubjects, nTrials];     
  real<lower=0, upper=10> reward[nSubjects, nTrials]; 
}

transformed data {
  vector[2] initV = rep_vector(0.5, 2);  // initial values for V
  real initaPE = 0.5;                    // initial prediction error
}

parameters {

 // Hyper(group)-parameters
  vector[4] mu_pr;
  vector<lower=0>[4] sigma;
  
  // Subject-level raw parameters (for Matt trick)
  vector[nSubjects] alpha_pr;    // initial learning rate
  vector[nSubjects] gamma_pr;    // decay rate of the dynamic learning rate
  vector[nSubjects] c_pr;        // scaling of |PE| in the learning-rate update
  vector[nSubjects] tau_pr;      // inverse temperature
 
}

transformed parameters {
  // subject-level parameters
  vector<lower=0, upper=1>[nSubjects] alpha;
  vector<lower=0, upper=1>[nSubjects] gamma;
  vector<lower=0, upper=1>[nSubjects] c;
  vector<lower=0, upper=20>[nSubjects] tau;


  for (i in 1:nSubjects) {
    alpha[i]   = Phi_approx(mu_pr[1]  + sigma[1]  * alpha_pr[i]);
    gamma[i]   = Phi_approx(mu_pr[2]  + sigma[2]  * gamma_pr[i]);
    c[i]   = Phi_approx(mu_pr[3]  + sigma[3]  * c_pr[i]);
    tau[i] = Phi_approx(mu_pr[4] + sigma[4] * tau_pr[i]) * 20;
  }
}

model {
  
  // Hyperparameters
  mu_pr[1]  ~ normal(0, 1);
  mu_pr[2]  ~ normal(0, 1);
  mu_pr[3]  ~ normal(0, 1);
  mu_pr[4]  ~ normal(0, 5);
  sigma[1] ~ cauchy(0,1);
  sigma[2] ~ cauchy(0,1);
  sigma[3] ~ cauchy(0,1);
  sigma[4] ~ cauchy(0,5);
  
  // individual parameters
  alpha_pr ~ normal(0, 1.0);
  gamma_pr ~ normal(0, 1.0);
  c_pr ~ normal(0, 1.0);
  tau_pr ~ normal(0, 1.0);


   for (s in 1:nSubjects) {
    vector[2] v;    // value estimates
    real pe;        // prediction error
    real k;     
    v = initV;      // initialize value estimates
    k = alpha[s];
    pe = initaPE;
    

    for (t in 1:nTrials) {        
      choice[s,t] ~ categorical_logit( tau[s] * v );  // choice made based on softmax
      
      // Compute prediction error
      pe = reward[s,t] - v[choice[s,t]];
      
      // Update learning rate using Pearce-Hall rule
      k = gamma[s] * c[s] * fabs(pe) + (1 - gamma[s]) * k;
      
      // Update value estimates
      v[choice[s,t]] = v[choice[s,t]] +  k * pe; 
      
    }
  }
}
  
generated quantities {

  real log_lik[nSubjects, nTrials];
  int y_pred[nSubjects, nTrials];
  
  y_pred = rep_array(-999, nSubjects, nTrials);  // Initialize prediction array

  { // local section, this saves time and space
    for (s in 1:nSubjects) {
      vector[2] v;    // value estimates
      real pe;        // prediction error
      real k;     // dynamic learning rate
      v = initV;
      k = alpha[s];

      for (t in 1:nTrials) {    
        log_lik[s,t] = categorical_logit_lpmf(choice[s,t] | tau[s] * v);    
        y_pred[s,t] = categorical_logit_rng( tau[s] * v ); 
        
        // Compute prediction error
        pe = reward[s,t] - v[choice[s,t]];

        // Update learning rate using Pearce-Hall rule
        k = gamma[s] * c[s] * fabs(pe) + (1 - gamma[s]) * k;
        
        // Update value estimates
        v[choice[s,t]] = v[choice[s,t]] +  k * pe; 
      }
    }    
  }
}

Are there any issues with my model? And where can I find tutorials or guidance on Stan modeling?

Hi there!

Are there any issues with my model?

One thing that caught my attention is that those Cauchy prior distributions on the group-level standard deviations sigma might be too wide, especially Cauchy(0, 5) for sigma[4]. I suspect that if you were to do a prior predictive check - that is, randomly drawing parameter values from your priors and then using those parameter values to simulate data - you’ll find that the simulated data looks very different from the sort of experimental data you’d typically encounter. So perhaps something much tighter like exponential(1) would be more suitable.
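In Stan, that change would be a one-liner (a sketch replacing the four cauchy lines in the model block; the rate of 1 is just a starting point to tune via prior predictive checks):

```stan
// Tighter, vectorized prior on all four group-level SDs (probit scale),
// replacing sigma[1] ~ cauchy(0,1); ... sigma[4] ~ cauchy(0,5);
sigma ~ exponential(1);
```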

More generally, I’d recommend building up the model’s complexity step by step to better understand where the problem is coming from. Perhaps it’s a good idea to start with a simpler Rescorla-Wagner learning model (if you haven’t already). You could also first implement a non-hierarchical model and fit that to a single participant’s data, before moving on to the hierarchical version you’ve presented above.

Lastly, you might want to consider whether you have enough data to fit the (hierarchical) model - that is, is your likelihood sufficiently informative to constrain the parameter values? If you fit a simulated dataset with many more participants and/or trials per participant, do the MCMC diagnostics begin to look better? This paper from Schad et al. (2019) discusses this and many other issues in great detail.
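Such a parameter-recovery check could start from a base-R simulator of a single subject like this (a sketch with made-up parameter values; the reward scheme here is a simple 80/20 bandit with no reversals, so adapt it to your actual reversal schedule):

```r
set.seed(1)
nTrials <- 180

# Made-up subject-level parameters (within the model's bounds)
alpha <- 0.5   # initial learning rate
gamma <- 0.3   # decay rate of the dynamic learning rate
c     <- 0.8   # scaling of |PE|
tau   <- 5     # inverse temperature

v <- c(0.5, 0.5)          # initial value estimates
k <- alpha                # dynamic learning rate
p_reward <- c(0.8, 0.2)   # reward probabilities per option
choice <- integer(nTrials)
reward <- numeric(nTrials)

for (t in 1:nTrials) {
  p <- exp(tau * v) / sum(exp(tau * v))         # softmax choice probabilities
  choice[t] <- sample(1:2, size = 1, prob = p)
  reward[t] <- rbinom(1, 1, p_reward[choice[t]])
  pe <- reward[t] - v[choice[t]]                # prediction error
  k <- gamma * c * abs(pe) + (1 - gamma) * k    # Pearce-Hall learning-rate update
  v[choice[t]] <- v[choice[t]] + k * pe         # value update
}
```

Fitting the model to many simulated subjects generated this way (with known parameters) shows both whether the diagnostics improve with more data and whether the true parameter values are recovered.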

And where can I find tutorials or guidance on Stan modeling?

The resources listed on this new Stan website might be helpful.

Hope that helps!


Thank you so much for your advice, Frank!

I have already completed the construction of a basic Rescorla-Wagner (RW) model, and there were no issues such as divergent transitions in this model. Based on your suggestions, I will gradually add more variables to the model from this foundation.

Next, I will also try modifying the prior distributions of my model. Additionally, I didn’t use my entire dataset in this run, which might be another reason for the poor model fitting results. I will make adjustments and report my results again.

Currently, one more issue I’m facing is that each run takes a very long time to complete. Are there any methods to improve the efficiency of my model fitting?

Thank you again for your reply; it has been incredibly helpful to me!


I have already completed the construction of a basic Rescorla-Wagner (RW) model, and there were no issues such as divergent transitions in this model.

Okay well that’s encouraging! And although it doesn’t feel like it, remember that Stan’s diagnostics and warnings are a feature, not a bug. When you look at more complex learning and decision-making models from previous papers, keep in mind that in some cases the authors were using simpler / outdated model fitting software that actually did not compute the correct posterior, but didn’t raise any warnings about it.

I will also try modifying the prior distributions of my model.

Yes, to this end it might be easier to explore the priors in base R using something like this:

nSubjects <- 30 # change as appropriate

mu <- rnorm(n = 4, mean = rep(0, times = 4), sd = c(1, 1, 1, 5))
sigma <- rcauchy(n = 4, location = rep(0, times = 4), scale = c(1, 1, 1, 5))

alpha_pr <- rnorm(n = nSubjects, mean = 0, sd = 1)
gamma_pr <- rnorm(n = nSubjects, mean = 0, sd = 1)
c_pr <- rnorm(n = nSubjects, mean = 0, sd = 1)
tau_pr <- rnorm(n = nSubjects, mean = 0, sd = 1)

alpha <- pnorm(mu[1] + sigma[1] * alpha_pr)
gamma <- pnorm(mu[2] + sigma[2] * gamma_pr)
c <- pnorm(mu[3] + sigma[3] * c_pr)
tau <- pnorm(mu[4] + sigma[4] * tau_pr) * 20

So you really see step by step what parameter values are implied by your priors for a single iteration of MCMC sampling.

Currently, one more issue I’m facing is that each run takes a very long time to complete. Are there any methods to improve the efficiency of my model fitting?

Using more informative priors could also help speed up the model fitting. Otherwise I don’t see anything specific in your Stan code that could help, but perhaps others here on the forum have more tips?

Another thing you could consider is to use the pathfinder algorithm to obtain a preliminary model fit, and then use that as the initialisation for a final model fit using MCMC sampling, see e.g. this thread, or Aki Vehtari’s birthdays case study for a usage example. But I believe this is only available in CmdStan, so you’d have to use CmdStanR rather than RStan.
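A minimal sketch of that workflow in CmdStanR (assuming your model lives in ph_model.stan and stan_data is your data list; check the current cmdstanr documentation for the exact arguments):

```r
library(cmdstanr)

mod <- cmdstan_model("ph_model.stan")

# Fast approximate fit with Pathfinder...
fit_pf <- mod$pathfinder(data = stan_data, seed = 1)

# ...then use its draws to initialise full MCMC sampling
fit <- mod$sample(
  data = stan_data,
  init = fit_pf,          # Pathfinder fit object supplies the inits
  chains = 4,
  parallel_chains = 4
)
```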

We get that a lot :-).

The usual reason for these kinds of warnings is highly correlated parameters in the posterior. This can come in from multiple intercepts in a hierarchical model. I don’t understand this model well enough to see where that might be.

It’s not going to help effective sample size, but you can speed most of this up by vectorizing. For example,

vector<lower=0, upper=1>[nSubjects] alpha = Phi_approx(mu_pr[1] + sigma[1] * alpha_pr);

It’s easier to understand code when you move declarations closer to their first use for variables like v, pe, etc. You can also vectorize the code in that double loop to make things more efficient.
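For the model above, the whole transformed parameters loop could be vectorized the same way (a sketch, assuming a Stan version recent enough to allow declare-and-define; Phi_approx is vectorized over the subject-level vectors):

```stan
transformed parameters {
  vector<lower=0, upper=1>[nSubjects]  alpha = Phi_approx(mu_pr[1] + sigma[1] * alpha_pr);
  vector<lower=0, upper=1>[nSubjects]  gamma = Phi_approx(mu_pr[2] + sigma[2] * gamma_pr);
  vector<lower=0, upper=1>[nSubjects]  c     = Phi_approx(mu_pr[3] + sigma[3] * c_pr);
  vector<lower=0, upper=20>[nSubjects] tau   = Phi_approx(mu_pr[4] + sigma[4] * tau_pr) * 20;
}
```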