Across trial drift-rate variability in the (hierarchical) drift diffusion model

Hello,

I encountered convergence problems when fitting a hierarchical drift diffusion model to reaction time data (Rhat>1, see below). In order to resolve the problem i initially thought that it might help to

  1. supply proper initial values (e.g. values provided by a corresponding MLE-fit)

  2. use a (strong) informative prior on the parameter v_{in} (the parameter causing the problems)

  3. change the adapt and maximum treedepth setting when calling stan().

    However, none of which settled the problem. I should note, however, that the problem is solely caused by the introduction of the v_{in} parameter (across trial variability in drift rates). That is, once I fit a corresponding model with v_{in}=0, everything works fine. This seems very puzzling, because the parameter is necessary to model differences in the response times for error and correct trials. As the data show substantial differences in the corresponding response times, i would think that v_{in} should not be poorly identified.

I would be very grateful for any help and suggestions.

data {
  int<lower=1> N;  // total number of observations
  int<lower=1> n; //number of subjects
  vector[N] Y;  // response times
  int<lower=0,upper=1> dec[N];  // decisions zero/one coded
  int<lower=1> k;  // number of distinct conditions (experimental + stimulus)
  int<lower=1> ncue;  // number of distinct experimental conditions
  matrix[n,k] ob;     // upper bounds for nondecision times
  int<lower=1> J_1[N]; // to which subject does the observation belong to? 
  int<lower=1> I_1[N]; // to which condition does the observation belong to? 
  int<lower=1> g[N]; // to which experimental condition does the observation belong to?
}


parameters {
  matrix[n,k] theta;  // drift (mean) parameters; 1 per subject x condition
  vector[N] drifts; // drift rates for each trial
  matrix<lower=0>[n,ncue] a;  // boundary parameter; 1 per subject x experimental condition
  matrix<lower=0, upper=1>[n,k] ndt_raw; // non decision times as a proportion of the given upper bounds
  vector[k] v_mu;   // group level mean drift rate 
  vector<lower=0>[k] v_sig; // variation of drift rates on the group level
  vector<lower=0>[ncue] a_mu; // group level mean boundary separation
  vector<lower=0>[ncue] a_sig; // variation of boundary separation on the group level
  real<lower=0> v_in; // variation of drift rates across trials; causes Rhat>1
}

transformed parameters{
  matrix[n,k] ndt=ndt_raw .* ob;
}


model {
  v_in ~ gamma(4,3);
  
  for (l in 1:n){
    for(j in 1:k){
      ndt_raw[l,j] ~ beta(1, 1);
    }}
  
  for (l in 1:n) {
    for(j in 1:k){
      theta[l,j] ~ normal(v_mu[j],v_sig[j]);
    }}
  
  for (l in 1:n){
    for(j in 1:ncue){
      a[l,j] ~ normal(a_mu[j],a_sig[j])T[0,];
    }}
  
  for (j in 1:k) {
    v_mu[j] ~ normal(0, 1);
    v_sig[j] ~ inv_gamma(4, 3);
  }
  
  for (j in 1:ncue) {
    a_mu[j] ~ inv_gamma(4, 3);
    a_sig[j] ~ inv_gamma(4, 3);
  }
  
  for (j in 1:N) {
    drifts[j] ~ normal(theta[J_1[j],I_1[j]],v_in); // v_in enters the model 
    if(dec[j]==1)
    {Y[j] ~ wiener(a[J_1[j],g[j]],ndt[J_1[j],I_1[j]],0.5,drifts[j]);}
    else{Y[j] ~ wiener(a[J_1[j],g[j]],ndt[J_1[j],I_1[j]],0.5,-drifts[j]);}
  }
} 

Output example:

Output:

1: There were 14 transitions after warmup that exceeded the maximum treedepth. Increase max_treedepth above 10. See
http://mc-stan.org/misc/warnings.html#maximum-treedepth-exceeded
2: There were 3 chains where the estimated Bayesian Fraction of Missing Information was low. See
http://mc-stan.org/misc/warnings.html#bfmi-low

$summary
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
v_in 0.5369669 0.02392339 0.07211375 0.2639275 0.5273514 0.5538154 0.5723473 0.6135418 9.086373 1.291439

I’m not sure what’s happening, but you might try non-centering:

drifts[j] ~ normal(theta[J_1[j],I_1[j]],v_in);

Something like:

drifts[j] = theta[J_1[j],I_1[j]] + drifts_z[j] * v_in;
drifts_z[j] ~ normal(0, 1);

So drifts becomes a local variable and drifts_z is a parameter.

It really looks like the sampler is having trouble exploring v_in.

Thank you for this suggestion - I will try this modification. (Model takes some time to fit…)

Have you thought about doing a prior predictive simulation to explore if drift rates are in a reasonable range?

You can just comment out the section where the wiener lpdf is called and examine the drift parameter values.

Your suggestion seems to have solved the problem. Thank you very much!
I must admit though that I don’t understand why this new syntax resolved the problem because both models are statistically the same. Could you perhaps elaborate on this further?

Thanks, I will keep this in mind for further prior checks. Didn’t know about this possibility.

The thing to look at is: https://mc-stan.org/users/documentation/case-studies/divergences_and_bias.html. Check out the difference in the centering (what you had originally) vs. non-centering (what you have now). In the centered approach, v_in will super strongly control the curvature in the drifts[j] directions of the posterior. In the second, not so much. The parameterization is different.

You had treedepth problems, not divergences, but that’ll happen if the stepsize adaptation makes the stepsize small enough to avoid divergences.

Thanks a lot!