Hi all,
I am trying to fit a Bayesian hierarchical reinforcement learning model to my behavioral data, but I have run into three model-fitting problems: divergent transitions, large R-hat values, and extremely low effective sample sizes. I have not been able to solve them with the attempts described below, so I am looking for some help. Let me walk through my study design first.
(Study design) Put simply, participants in my experiment performed a reinforcement learning go/no-go task. There are 4 within-subjects conditions (‘goToWin’, ‘goToAvoid’, ‘nogoToWin’, ‘nogoToAvoid’), with 60 trials per condition, i.e., 240 trials per subject. Each condition uses a single image, and the images differ across the four conditions. I have 72 subjects in total, so 72 × 240 = 17,280 choices overall.
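For concreteness, the data I pass to Stan are shaped as follows. This is only a minimal sketch in Python/NumPy with random placeholder arrays; the names and codings match the data block of the model below.

import numpy as np

N, T = 72, 240  # 72 subjects; 4 conditions x 60 trials = 240 trials per subject

# Random placeholders with the same shapes and codings as the real data
rng = np.random.default_rng(0)
cue = rng.integers(1, 5, size=(N, T))        # 1 = goToWin, 2 = goToAvoid, 3 = nogoToWin, 4 = nogoToAvoid
action = rng.integers(0, 2, size=(N, T))     # 0 = nogo, 1 = go
outcome = rng.choice([-1, 0, 1], size=(N, T)).astype(float)  # -1 = loss, 0 = neutral, 1 = reward

stan_data = {"N": N, "T": T, "cue": cue, "action": action, "outcome": outcome}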
(Model fitting) Below is my reinforcement learning model, with a hierarchical structure over subjects. In total I have 20 hyperparameters (10 population-level means plus 10 group-level scales), and note that I have already used both vectorization and a non-centered parameterization in the Stan code. I fitted this model using 4 chains, with 2,000 warm-up iterations and 5,000 iterations per chain.
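To make the structure explicit before the code: for each subject, on a trial with cue $c$, outcome $r$, and chosen action $a$, the model computes

$$
\begin{aligned}
W_{\text{go}}(c) &= Q_{\text{go}}(c) + b + \pi\,V(c), \qquad W_{\text{nogo}}(c) = Q_{\text{nogo}}(c),\\
P(\text{go}\mid c) &= (1-\xi)\,\operatorname{logit}^{-1}\!\left(W_{\text{go}}(c) - W_{\text{nogo}}(c)\right) + \tfrac{\xi}{2},\\
V(c) &\leftarrow V(c) + \epsilon\left(\rho\,r - V(c)\right),\\
Q_{a}(c) &\leftarrow Q_{a}(c) + \epsilon\left(\rho\,r - Q_{a}(c)\right),
\end{aligned}
$$

where $\rho = \rho_{\text{Rew}}$ if $r \ge 0$ and $\rho = \rho_{\text{Pun}}$ otherwise; $V(c)$ is updated on every trial while only the chosen action's $Q$-value is updated; the $Q$-values start at 0 and $V(c)$ is initialized at the condition-specific $\alpha_c \in (-1, 1)$. The subject-level parameters $\xi$ (lapse), $\epsilon$ (learning rate), $b$ (go bias), $\pi$ (Pavlovian bias), $\rho_{\text{Rew}}$, $\rho_{\text{Pun}}$, and the four $\alpha$'s are drawn from the hierarchical priors shown in the code.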
data {
// number of subjects
int<lower=1> N;
// number of trials (i.e., 240 trials per subject)
int<lower=1> T;
// cue per trial for each subject: 1 = goToWin; 2 = goToAvoid; 3 = nogoToWin; 4 = nogoToAvoid
int<lower=1, upper=4> cue[N, T];
// action per trial for each subject: 0 = nogo and 1 = go
int<lower=0, upper=1> action[N, T];
// outcome per trial for each subject: -1 = loss, 0 = nothing/neutral, 1 = reward
real outcome[N, T];
}
transformed data {
vector[4] initV;
initV = rep_vector(0.0, 4);
}
parameters {
// in total we have ten parameters of interest:
// 1) xi
// 2) ep
// 3) b
// 4) pi
// 5) rhoRew
// 6) rhoPun
// 7) alpha_goToWin
// 8) alpha_goToAvoid
// 9) alpha_nogoToWin
// 10) alpha_nogoToAvoid
// 'mu_pr' contains the ten aforementioned parameters at the population level.
vector[10] mu_pr;
// 'sigma' contains the ten group-level standard deviations corresponding to the population-level parameters above.
vector<lower=0>[10] sigma;
vector[N] xi_pr; // each element follows a standard normal distribution
vector[N] ep_pr;
vector[N] b_pr;
vector[N] pi_pr;
vector[N] rhoRew_pr;
vector[N] rhoPun_pr;
vector[N] alpha_goToWin_pr;
vector[N] alpha_goToAvoid_pr;
vector[N] alpha_nogoToWin_pr;
vector[N] alpha_nogoToAvoid_pr;
}
transformed parameters{
vector<lower=0, upper=1>[N] xi; // bounded between [0, 1]
vector<lower=0, upper=1>[N] ep; // bounded between [0, 1]
vector[N] b; // unbounded, namely (-Inf, Inf)
vector[N] pi; // unbounded, namely (-Inf, Inf)
vector<lower=0>[N] rhoRew; // bounded as [0, Inf)
vector<lower=0>[N] rhoPun; // bounded as [0, Inf)
vector<lower=-1, upper=1>[N] alpha_goToWin; // bounded between [-1, 1]
vector<lower=-1, upper=1>[N] alpha_goToAvoid; // bounded between [-1, 1]
vector<lower=-1, upper=1>[N] alpha_nogoToWin; // bounded between [-1, 1]
vector<lower=-1, upper=1>[N] alpha_nogoToAvoid; // bounded between [-1, 1]
// vectorization and non-centered parameterization
xi = Phi_approx(mu_pr[1] + sigma[1] * xi_pr);
ep = Phi_approx(mu_pr[2] + sigma[2] * ep_pr);
b = mu_pr[3] + sigma[3] * b_pr;
pi = mu_pr[4] + sigma[4] * pi_pr;
rhoRew = exp(mu_pr[5] + sigma[5] * rhoRew_pr);
rhoPun = exp(mu_pr[6] + sigma[6] * rhoPun_pr);
alpha_goToWin = Phi_approx(mu_pr[7] + sigma[7] * alpha_goToWin_pr) * 2 - 1;
alpha_goToAvoid = Phi_approx(mu_pr[8] + sigma[8] * alpha_goToAvoid_pr) * 2 - 1;
alpha_nogoToWin = Phi_approx(mu_pr[9] + sigma[9] * alpha_nogoToWin_pr) * 2 - 1;
alpha_nogoToAvoid = Phi_approx(mu_pr[10] + sigma[10] * alpha_nogoToAvoid_pr) * 2 - 1;
}
model {
// define the priors for those hyper parameters
mu_pr[1] ~ normal(0, 1.0); // xi
mu_pr[2] ~ normal(0, 1.0); // ep
mu_pr[3] ~ normal(0, 10.0); // b
mu_pr[4] ~ normal(0, 10.0); // pi
mu_pr[5] ~ normal(0, 1.0); // rhoRew
mu_pr[6] ~ normal(0, 1.0); // rhoPun
mu_pr[7] ~ normal(0, 1.0); // alpha_goToWin
mu_pr[8] ~ normal(0, 1.0); // alpha_goToAvoid
mu_pr[9] ~ normal(0, 1.0); // alpha_nogoToWin
mu_pr[10] ~ normal(0, 1.0); // alpha_nogoToAvoid
sigma[1] ~ normal(0, 0.2); // xi
sigma[2] ~ normal(0, 0.2); // ep
sigma[3] ~ cauchy(0, 1.0); // b
sigma[4] ~ cauchy(0, 1.0); // Pavlovian bias
sigma[5] ~ normal(0, 0.2); // rhoRew
sigma[6] ~ normal(0, 0.2); // rhoPun
sigma[7] ~ normal(0, 0.2); // alpha_goToWin
sigma[8] ~ normal(0, 0.2); // alpha_goToAvoid
sigma[9] ~ normal(0, 0.2); // alpha_nogoToWin
sigma[10] ~ normal(0, 0.2); // alpha_nogoToAvoid
// define the priors for those individual parameters (i.e., Matt trick)
xi_pr ~ normal(0, 1.0);
ep_pr ~ normal(0, 1.0);
b_pr ~ normal(0, 1.0);
pi_pr ~ normal(0, 1.0);
rhoRew_pr ~ normal(0, 1.0);
rhoPun_pr ~ normal(0, 1.0);
alpha_goToWin_pr ~ normal(0, 1.0);
alpha_goToAvoid_pr ~ normal(0, 1.0);
alpha_nogoToWin_pr ~ normal(0, 1.0);
alpha_nogoToAvoid_pr ~ normal(0, 1.0);
for (s in 1:N) {
vector[4] actionWeight_go;
vector[4] actionWeight_nogo;
vector[4] qValue_go;
vector[4] qValue_nogo;
vector[4] sv;
vector[4] pGo;
actionWeight_go = initV;
actionWeight_nogo = initV;
qValue_go = initV;
qValue_nogo = initV;
sv[1] = alpha_goToWin[s];
sv[2] = alpha_goToAvoid[s];
sv[3] = alpha_nogoToWin[s];
sv[4] = alpha_nogoToAvoid[s];
for (t in 1:T) {
actionWeight_go[cue[s, t]] = qValue_go[cue[s, t]] + b[s] + pi[s] * sv[cue[s, t]];
actionWeight_nogo[cue[s, t]] = qValue_nogo[cue[s, t]];
pGo[cue[s, t]] = inv_logit(actionWeight_go[cue[s, t]] - actionWeight_nogo[cue[s, t]]);
// irreducible noise (lapse): mix the go probability with a uniform guess
pGo[cue[s, t]] *= (1 - xi[s]);
pGo[cue[s, t]] += xi[s] / 2;
// Likelihood function
action[s, t] ~ bernoulli(pGo[cue[s, t]]);
// update the stimulus Pavlovian value (per condition)
if (outcome[s, t] >= 0) {
sv[cue[s, t]] += ep[s] * (rhoRew[s] * outcome[s, t] - sv[cue[s, t]]);
} else {
sv[cue[s, t]] += ep[s] * (rhoPun[s] * outcome[s, t] - sv[cue[s, t]]);
}
// update the action values
if (action[s, t] == 1) {
if (outcome[s, t] >= 0) {
qValue_go[cue[s, t]] += ep[s] * (rhoRew[s] * outcome[s, t] - qValue_go[cue[s, t]]);
} else {
qValue_go[cue[s, t]] += ep[s] * (rhoPun[s] * outcome[s, t] - qValue_go[cue[s, t]]);
}
} else {
if (outcome[s, t] >= 0) {
qValue_nogo[cue[s, t]] += ep[s] * (rhoRew[s] * outcome[s, t] - qValue_nogo[cue[s, t]]);
} else {
qValue_nogo[cue[s, t]] += ep[s] * (rhoPun[s] * outcome[s, t] - qValue_nogo[cue[s, t]]);
}
}
} // end of the loop for the trial-level
} // end of the loop for the subject-level
}
generated quantities {
// In this block we will calculate the log-likelihood for each subject
// re-declare the aforementioned ten population-level parameters
real<lower=0, upper=1> mu_xi;
real<lower=0, upper=1> mu_ep;
real mu_b;
real mu_pi;
real<lower=0> mu_rhoRew;
real<lower=0> mu_rhoPun;
real<lower=-1, upper=1> mu_alpha_goToWin;
real<lower=-1, upper=1> mu_alpha_goToAvoid;
real<lower=-1, upper=1> mu_alpha_nogoToWin;
real<lower=-1, upper=1> mu_alpha_nogoToAvoid;
// declare the log-likelihood
real log_lik[N];
mu_xi = Phi_approx(mu_pr[1]);
mu_ep = Phi_approx(mu_pr[2]);
mu_b = mu_pr[3];
mu_pi = mu_pr[4];
mu_rhoRew = exp(mu_pr[5]);
mu_rhoPun = exp(mu_pr[6]);
mu_alpha_goToWin = Phi_approx(mu_pr[7]) * 2 - 1;
mu_alpha_goToAvoid = Phi_approx(mu_pr[8]) * 2 - 1;
mu_alpha_nogoToWin = Phi_approx(mu_pr[9]) * 2 - 1;
mu_alpha_nogoToAvoid = Phi_approx(mu_pr[10]) * 2 - 1;
{ // local section, this saves time and space
for (s in 1:N) {
vector[4] actionWeight_go;
vector[4] actionWeight_nogo;
vector[4] qValue_go;
vector[4] qValue_nogo;
vector[4] sv;
vector[4] pGo;
actionWeight_go = initV;
actionWeight_nogo = initV;
qValue_go = initV;
qValue_nogo = initV;
sv[1] = alpha_goToWin[s];
sv[2] = alpha_goToAvoid[s];
sv[3] = alpha_nogoToWin[s];
sv[4] = alpha_nogoToAvoid[s];
log_lik[s] = 0;
for (t in 1:T) {
actionWeight_go[cue[s, t]] = qValue_go[cue[s, t]] + b[s] + pi[s] * sv[cue[s, t]];
actionWeight_nogo[cue[s, t]] = qValue_nogo[cue[s, t]];
pGo[cue[s, t]] = inv_logit(actionWeight_go[cue[s, t]] - actionWeight_nogo[cue[s, t]]);
// irreducible noise (lapse): mix the go probability with a uniform guess
pGo[cue[s, t]] *= (1 - xi[s]);
pGo[cue[s, t]] += xi[s] / 2;
// log-likelihood
log_lik[s] += bernoulli_lpmf(action[s, t] | pGo[cue[s, t]]);
// update the stimulus Pavlovian value
if (outcome[s, t] >= 0) {
sv[cue[s, t]] += ep[s] * (rhoRew[s] * outcome[s, t] - sv[cue[s, t]]);
} else {
sv[cue[s, t]] += ep[s] * (rhoPun[s] * outcome[s, t] - sv[cue[s, t]]);
}
// update the action values
if (action[s, t] == 1) {
if (outcome[s, t] >= 0) {
qValue_go[cue[s, t]] += ep[s] * (rhoRew[s] * outcome[s, t] - qValue_go[cue[s, t]]);
} else {
qValue_go[cue[s, t]] += ep[s] * (rhoPun[s] * outcome[s, t] - qValue_go[cue[s, t]]);
}
} else {
if (outcome[s, t] >= 0) {
qValue_nogo[cue[s, t]] += ep[s] * (rhoRew[s] * outcome[s, t] - qValue_nogo[cue[s, t]]);
} else {
qValue_nogo[cue[s, t]] += ep[s] * (rhoPun[s] * outcome[s, t] - qValue_nogo[cue[s, t]]);
}
}
} // end of the loop for the trial-level
} // end of the loop for the subject-level
}
}
(Warning messages) After fitting, however, I encountered the following model-fitting problems: divergent transitions, large R-hat values, and extremely low effective sample sizes.
(R-hat values and n_eff for all hyperparameters)
(traceplot for all hyperparameters)
(trankplot for all hyperparameters)
(‘pairs()’ plot to visualize all divergent transitions)
(Model re-fitting and warning messages again) I then re-fitted the same model using only 1 chain, with 1,000 warm-up iterations and 9,000 iterations, and I also increased ‘adapt_delta’ to 0.99. Once again, I ran into the same fitting problems:
(R-hat values and n_eff for all hyperparameters)
(traceplot for all hyperparameters)
(‘pairs()’ plot to visualize all divergent transitions)
(Model re-re-fitting and warning messages again) Finally, I refitted the same model once more using 8 chains, with 2,000 warm-up iterations and 12,000 iterations per chain, again with ‘adapt_delta’ increased to 0.99. But this time the situation became even worse.
(traceplot and trankplot for all hyperparameters)
(‘pairs()’ plot to visualize all divergent transitions)
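In case it helps to see the sampler settings at a glance, here is roughly what the three runs correspond to, written as cmdstanpy calls purely for concreteness: the file name is a placeholder, ‘stan_data’ is the dictionary sketched near the top of this post, and I am assuming the iteration counts quoted above are per-chain totals that include warm-up.

from cmdstanpy import CmdStanModel

model = CmdStanModel(stan_file="gng_hierarchical.stan")  # placeholder file name

# Run 1: 4 chains, 2,000 warm-up, 5,000 iterations per chain (assumed to include warm-up)
fit1 = model.sample(data=stan_data, chains=4, iter_warmup=2000, iter_sampling=3000)

# Run 2: 1 chain, 1,000 warm-up, 9,000 iterations, adapt_delta raised to 0.99
fit2 = model.sample(data=stan_data, chains=1, iter_warmup=1000, iter_sampling=8000,
                    adapt_delta=0.99)

# Run 3: 8 chains, 2,000 warm-up, 12,000 iterations per chain, adapt_delta = 0.99
fit3 = model.sample(data=stan_data, chains=8, iter_warmup=2000, iter_sampling=10000,
                    adapt_delta=0.99)

# Quick check of divergences, R-hat, and effective sample sizes
print(fit3.diagnose())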
Does anyone see any clues in the plots above that I have missed, any errors in the model, or have any thoughts about what might be going on here? I am worried that I am losing my mind, have simply set up the hierarchy wrong (i.e., model misspecification), and just can't see it. Thank you in advance for your time and patience.