Across trial drift-rate variability in the (hierarchical) drift diffusion model

pasjor · January 29, 2019, 10:38am

Hello,

I encountered convergence problems when fitting a hierarchical drift diffusion model to reaction time data (Rhat>1, see below). In order to resolve the problem i initially thought that it might help to

supply proper initial values (e.g. values provided by a corresponding MLE-fit)
use a (strong) informative prior on the parameter v_{in} (the parameter causing the problems)
change the adapt and maximum treedepth setting when calling stan().

However, none of which settled the problem. I should note, however, that the problem is solely caused by the introduction of the v_{in} parameter (across trial variability in drift rates). That is, once I fit a corresponding model with v_{in}=0, everything works fine. This seems very puzzling, because the parameter is necessary to model differences in the response times for error and correct trials. As the data show substantial differences in the corresponding response times, i would think that v_{in} should not be poorly identified.

I would be very grateful for any help and suggestions.

data {
  int<lower=1> N;  // total number of observations
  int<lower=1> n; //number of subjects
  vector[N] Y;  // response times
  int<lower=0,upper=1> dec[N];  // decisions zero/one coded
  int<lower=1> k;  // number of distinct conditions (experimental + stimulus)
  int<lower=1> ncue;  // number of distinct experimental conditions
  matrix[n,k] ob;     // upper bounds for nondecision times
  int<lower=1> J_1[N]; // to which subject does the observation belong to? 
  int<lower=1> I_1[N]; // to which condition does the observation belong to? 
  int<lower=1> g[N]; // to which experimental condition does the observation belong to?
}


parameters {
  matrix[n,k] theta;  // drift (mean) parameters; 1 per subject x condition
  vector[N] drifts; // drift rates for each trial
  matrix<lower=0>[n,ncue] a;  // boundary parameter; 1 per subject x experimental condition
  matrix<lower=0, upper=1>[n,k] ndt_raw; // non decision times as a proportion of the given upper bounds
  vector[k] v_mu;   // group level mean drift rate 
  vector<lower=0>[k] v_sig; // variation of drift rates on the group level
  vector<lower=0>[ncue] a_mu; // group level mean boundary separation
  vector<lower=0>[ncue] a_sig; // variation of boundary separation on the group level
  real<lower=0> v_in; // variation of drift rates across trials; causes Rhat>1
}

transformed parameters{
  matrix[n,k] ndt=ndt_raw .* ob;
}


model {
  v_in ~ gamma(4,3);
  
  for (l in 1:n){
    for(j in 1:k){
      ndt_raw[l,j] ~ beta(1, 1);
    }}
  
  for (l in 1:n) {
    for(j in 1:k){
      theta[l,j] ~ normal(v_mu[j],v_sig[j]);
    }}
  
  for (l in 1:n){
    for(j in 1:ncue){
      a[l,j] ~ normal(a_mu[j],a_sig[j])T[0,];
    }}
  
  for (j in 1:k) {
    v_mu[j] ~ normal(0, 1);
    v_sig[j] ~ inv_gamma(4, 3);
  }
  
  for (j in 1:ncue) {
    a_mu[j] ~ inv_gamma(4, 3);
    a_sig[j] ~ inv_gamma(4, 3);
  }
  
  for (j in 1:N) {
    drifts[j] ~ normal(theta[J_1[j],I_1[j]],v_in); // v_in enters the model 
    if(dec[j]==1)
    {Y[j] ~ wiener(a[J_1[j],g[j]],ndt[J_1[j],I_1[j]],0.5,drifts[j]);}
    else{Y[j] ~ wiener(a[J_1[j],g[j]],ndt[J_1[j],I_1[j]],0.5,-drifts[j]);}
  }
}

Output example:

Output:

1: There were 14 transitions after warmup that exceeded the maximum treedepth. Increase max_treedepth above 10. See
http://mc-stan.org/misc/warnings.html#maximum-treedepth-exceeded
2: There were 3 chains where the estimated Bayesian Fraction of Missing Information was low. See
http://mc-stan.org/misc/warnings.html#bfmi-low

$summary
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
v_in 0.5369669 0.02392339 0.07211375 0.2639275 0.5273514 0.5538154 0.5723473 0.6135418 9.086373 1.291439

bbbales2 · January 30, 2019, 3:14am

I’m not sure what’s happening, but you might try non-centering:

drifts[j] ~ normal(theta[J_1[j],I_1[j]],v_in);

Something like:

drifts[j] = theta[J_1[j],I_1[j]] + drifts_z[j] * v_in;
drifts_z[j] ~ normal(0, 1);

So drifts becomes a local variable and drifts_z is a parameter.

It really looks like the sampler is having trouble exploring v_in.

pasjor · January 30, 2019, 10:47pm

Thank you for this suggestion - I will try this modification. (Model takes some time to fit…)

Guido_Biele · January 30, 2019, 11:10pm

Have you thought about doing a prior predictive simulation to explore if drift rates are in a reasonable range?

You can just comment out the section where the wiener lpdf is called and examine the drift parameter values.

pasjor · February 1, 2019, 2:30pm

Your suggestion seems to have solved the problem. Thank you very much!
I must admit though that I don’t understand why this new syntax resolved the problem because both models are statistically the same. Could you perhaps elaborate on this further?

pasjor · February 1, 2019, 2:32pm

Thanks, I will keep this in mind for further prior checks. Didn’t know about this possibility.

bbbales2 · February 1, 2019, 6:24pm

The thing to look at is: https://mc-stan.org/users/documentation/case-studies/divergences_and_bias.html. Check out the difference in the centering (what you had originally) vs. non-centering (what you have now). In the centered approach, v_in will super strongly control the curvature in the drifts[j] directions of the posterior. In the second, not so much. The parameterization is different.

You had treedepth problems, not divergences, but that’ll happen if the stepsize adaptation makes the stepsize small enough to avoid divergences.

pasjor · February 5, 2019, 1:23pm

Thanks a lot!

Topic		Replies	Views
Hierarchical learning model fitting issues Modeling fitting-issues , specification	4	605	October 16, 2019
Diffusion drift model (ddm) fitting issues with rstan Modeling fitting-issues , cognitive-science	6	3255	November 26, 2020
Issues with running Bayesian hierarchical reinforcement learning model in Stan Modeling rstan , fitting-issues , specification , performance	12	824	June 28, 2021
Convergence within chains, but not across chains Modeling fitting-issues , stan	3	1416	March 20, 2021
Divergences in multilevel signal detection theory model Modeling fitting-issues , cmdstanr , cognitive-science	3	674	October 8, 2021

Across trial drift-rate variability in the (hierarchical) drift diffusion model

Related topics