Warnings when using rStan for Linear Ballistic Accumulator Modelling

Hi all,

I am very new to Stan and rstan and have been trying to model choice using linear ballistic accumulator (LBA) modelling in rstan. The Stan code was written by Annis et al. (2016) and seems to behave fine with sample data and real-world data. When I use the code with several different datasets from the lab, I receive the following warning:

I have also noticed that the Rhat values are not ideal, nor are the posterior predictive plots.

I have also produced a pairs plot for pred[1], pred[2], tau, v[1] and v[2]. I'm not sure how to plot all the parameters, as the graph gets very messy with everything included.

My mathematical knowledge of Stan and linear ballistic accumulator modelling is extremely limited. I have tried adjusting the adapt_delta value from 0.8 up to 0.99, but that had little impact. I have also tried increasing the number of chains (from 3 to 4) and iterations (from 2000 to 3000) to no avail. I have been reading the forums and guides for ideas, but most of the suggestions are above my current skill set.

I would greatly appreciate any advice on fixing these issues. I have included the stan code below. Please let me know if any more information is required.

Thanks!

functions{
     
     real lba_pdf(real t, real b, real A, real v, real s){
          //PDF of the LBA model
          
          real b_A_tv_ts;
          real b_tv_ts;
          real term_1;
          real term_2;
          real term_3;
          real term_4;
          real pdf;
          
          b_A_tv_ts <- (b - A - t*v)/(t*s);
          b_tv_ts <- (b - t*v)/(t*s);
          term_1 <- v*Phi(b_A_tv_ts);
          term_2 <- s*exp(normal_log(b_A_tv_ts,0,1)); 
          term_3 <- v*Phi(b_tv_ts);
          term_4 <- s*exp(normal_log(b_tv_ts,0,1)); 
          pdf <- (1/A)*(-term_1 + term_2 + term_3 - term_4);
          
          return pdf;
     }
     
     real lba_cdf(real t, real b, real A, real v, real s){
          //CDF of the LBA model
          
          real b_A_tv;
          real b_tv;
          real ts;
          real term_1;
          real term_2;
          real term_3;
          real term_4;
          real cdf;	
          
          b_A_tv <- b - A - t*v;
          b_tv <- b - t*v;
          ts <- t*s;
          term_1 <- b_A_tv/A * Phi(b_A_tv/ts);	
          term_2 <- b_tv/A   * Phi(b_tv/ts);
          term_3 <- ts/A     * exp(normal_log(b_A_tv/ts,0,1)); 
          term_4 <- ts/A     * exp(normal_log(b_tv/ts,0,1)); 
          cdf <- 1 + term_1 - term_2 + term_3 - term_4;
          
          return cdf;
          
     }
     
     real lba_log(matrix RT, real k, real A, vector v, real s, real tau){
          
          real t;
          real b;
          real cdf;
          real pdf;		
          vector[rows(RT)] prob;
          real out;
          real prob_neg;

          b <- A + k;
          for (i in 1:rows(RT)){	
               t <- RT[i,1] - tau;
               if(t > 0){			
                    cdf <- 1;
                    
                    for(j in 1:num_elements(v)){
                         if(RT[i,2] == j){
                              pdf <- lba_pdf(t, b, A, v[j], s);
                         }else{	
                              cdf <- (1-lba_cdf(t, b, A, v[j], s)) * cdf;
                         }
                    }
                    prob_neg <- 1;
                    for(j in 1:num_elements(v)){
                         prob_neg <- Phi(-v[j]/s) * prob_neg;    
                    }
                    prob[i] <- pdf*cdf;		
                    prob[i] <- prob[i]/(1-prob_neg);	
                    if(prob[i] < 1e-10){
                         prob[i] <- 1e-10;				
                    }
                    
               }else{
                    prob[i] <- 1e-10;			
               }		
          }
          out <- sum(log(prob));
          return out;		
     }
     
    vector lba_rng(real k, real A, vector v, real s, real tau){
          
          int get_pos_drift;	
          int no_pos_drift;
          int get_first_pos;
          vector[num_elements(v)] drift;
          int max_iter;
          int iter;
          real start[num_elements(v)];
          real ttf[num_elements(v)];
          int resp[num_elements(v)];
          real rt;
          vector[2] pred;
          real b;
          
          //try to get a positive drift rate
          get_pos_drift <- 1;
          no_pos_drift <- 0;
          max_iter <- 1000;
          iter <- 0;
          while(get_pos_drift){
               for(j in 1:num_elements(v)){
                    drift[j] <- normal_rng(v[j],s);
                    if(drift[j] > 0){
                         get_pos_drift <- 0;
                    }
               }
               iter <- iter + 1;
               if(iter > max_iter){
                    get_pos_drift <- 0;
                    no_pos_drift <- 1;
               }	
          }
          //if both drift rates are <= 0
          //return an infinite response time
          if(no_pos_drift){
               pred[1] <- -1;
               pred[2] <- -1;
          }else{
               b <- A + k;
               for(i in 1:num_elements(v)){
                    //start time of each accumulator	
                    start[i] <- uniform_rng(0,A);
                    //finish times
                    ttf[i] <- (b-start[i])/drift[i];
               }
               //rt is the fastest accumulator finish time	
               //if one is negative get the positive drift
               resp <- sort_indices_asc(ttf);
               ttf <- sort_asc(ttf);
               get_first_pos <- 1;
               iter <- 1;
               while(get_first_pos){
                    if(ttf[iter] > 0){
                         pred[1] <- ttf[iter] + tau;
                         pred[2] <- resp[iter]; 
                         get_first_pos <- 0;
                    }
                    iter <- iter + 1;
               }
          }
          return pred;	
     }
}

data{
     int LENGTH;
     matrix[LENGTH,2] RT;
     int NUM_CHOICES;
}

parameters {
     real<lower=0> k;
     real<lower=0> A;
     real<lower=0> tau;
     vector<lower=0>[NUM_CHOICES] v;
}

transformed parameters {
     real s;
     s <- 1;
}

model {
     k ~ normal(.5,1)T[0,];
     A ~ normal(.5,1)T[0,];
     tau ~ normal(.5,.5)T[0,];
     for(n in 1:NUM_CHOICES){
          v[n] ~ normal(2,1)T[0,];
     }
     RT ~ lba(k,A,v,s,tau);
}

generated quantities {
     vector[2] pred;
     pred <- lba_rng(k,A,v,s,tau);
}

These process models for human reaction time are notoriously fraught with identification traps. You should check for this using SBC (which should show failure if you have a non-identifiability in your model).

Thanks @mike-lawrence. I've had a look at the documentation for SBC but I'm still not entirely certain how to use it. I'm trying to use the rLBA function, which generates a random LBA dataset with 2 columns (response time and decision). I'm getting an error from the generate_datasets function as follows:

"Error in stop(SBC_error("SBC_datasets_error", "The generating function has to return a list with elements variables\n (that can be converted to draws_rvars) and generated")) :
The generating function has to return a list with elements variables
(that can be converted to draws_rvars) and generated"

I'm not too sure how to put the generated dataset into a list, given its two distinct columns? Would you have any ideas?
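(In case it helps later readers: below is a minimal sketch of a generator in the shape the error asks for, assuming the rtdists package for simulation and the truncnorm package for the truncated-normal priors; gen_one_lba is a hypothetical name.)

library(SBC)
library(rtdists)
library(truncnorm)

# The generating function must return list(variables = ..., generated = ...);
# `generated` matches the Stan data block (LENGTH, RT, NUM_CHOICES).
gen_one_lba <- function(n_trials) {
     # one draw from the priors used in the Stan model
     k   <- rtruncnorm(1, a = 0, mean = 0.5, sd = 1)
     A   <- rtruncnorm(1, a = 0, mean = 0.5, sd = 1)
     tau <- rtruncnorm(1, a = 0, mean = 0.5, sd = 0.5)
     v   <- rtruncnorm(2, a = 0, mean = 2, sd = 1)

     # rLBA returns a data frame with columns rt and response
     sim <- rLBA(n_trials, A = A, b = A + k, t0 = tau,
                 mean_v = v, sd_v = c(1, 1))

     list(variables = list(k = k, A = A, tau = tau, v = v),
          generated = list(LENGTH = n_trials,
                           RT = cbind(sim$rt, sim$response),
                           NUM_CHOICES = 2))
}

generator <- SBC_generator_function(gen_one_lba, n_trials = 200)
datasets  <- generate_datasets(generator, 50)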

Sorry, I know this is probably a bit of a silly question, but I am a novice at R as well!

For now, I have plotted the parameters using mcmc_parcoord and mcmc_scatter. They look strange, but I can't put my finger on how I should be interpreting the parallel coordinates graph? Thanks for any help; I am completely lost with this one.
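(For reference, the parcoord plot came from a call along these lines — a sketch, assuming the fitted stanfit object is named fit:)

library(bayesplot)
# parallel coordinates over the main parameters,
# with divergent transitions highlighted via np
mcmc_parcoord(as.array(fit),
              pars = c("k", "A", "tau", "v[1]", "v[2]"),
              np = nuts_params(fit))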

Possibly @hyunji.moon could help with the SBC?


One thing you could do is look more closely at the reaction times, because very short RTs make it very hard or even impossible to estimate the non-decision time parameter.

I would redo the histogram of RTs with a higher resolution, let's say 100 bins, and check if there are any RTs faster than 150 ms. Those I would definitely remove.

Such short RTs could also be modeled with a mixture model, but I would first try to get the model going with relatively "clean" data.
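A quick sketch of both steps, assuming the first column of the RT matrix holds response times in seconds:

# higher-resolution histogram of the observed RTs
hist(RT[, 1], breaks = 100, xlab = "RT (s)", main = "Observed RTs")
abline(v = 0.15, col = "red", lty = 2)   # the 150 ms cutoff

# drop implausibly fast trials before refitting
RT_clean <- RT[RT[, 1] > 0.15, ]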


Thanks @Guido_Biele, there were definitely some RTs faster than 150 ms, which I removed from the dataset before modelling. It looks like this may have improved the modelling a bit based on the histogram, and the number of divergences decreased from roughly 500 to roughly 300. However, the parameter values I'm getting for pred[1] still have a pretty high standard deviation, which is worrying. Is there anything else you think I could do?

Modelling including RTs < 150 ms:


Modelling cleaned of RTs < 150 ms:



The LBA and similar models were developed to model fast decisions. Modeling RTs of around 3 s is already a stretch. Decisions that take 5 s or more definitely involve processes that have nothing to do with the accumulation process modeled in the LBA and should therefore be removed from the data (it's better to also model them with a mixture, but I'd save this for later).

It also makes sense to investigate whether the priors used are too wide or too narrow.
You can do this by running your model after commenting out the RT ~ lba(k,A,v,s,tau) line, generating predicted data in the generated quantities block.
When you plot the simulated RTs, this distribution should be broader than the distribution of the observed RTs. The simulated RTs should also not be too broad, i.e. they should not have a lot of the probability mass at "impossible" RTs that are larger than, let's say, 7 s. This is called a prior predictive check.


Thanks again @Guido_Biele, each suggestion is making the modelling better and better! This is how the histograms look after filtering out RTs > 5 seconds. Unfortunately, the number of divergences remains quite stable at around 300.


Also, would you be able to provide more information on the process of checking the width of the priors? When I comment out the line:

pred <- lba_rng(k,A,v,s,tau);

I am no longer able to plot the simulated RTs on the histogram, and I'm not too sure what else I should be looking for to check this. Regardless, I think the current simulated RTs have a decent distribution?

Thanks again.


Hi,
looks like a tough problem.

As one of the authors of the SBC package, I honestly don't think SBC is likely to be of much help here :-) The problem appears to be with a specific dataset that likely does not inform all the parameters well (as the model works for other datasets). SBC is likely to just show the same problem…

Still, I’d be glad if @rstanpsychologystudent ended up resolving the issue. Here are some of my ideas to investigate further.

The mcmc_parcoord plot appears not to be super helpful here, as the parameters are on very different scales (we do, however, see that the divergences are not particularly localized).

Could we get the pairs plot for the same parameters as the parcoord (using a log transform for all variables, as they are lower-bounded at zero, to mimic how Stan represents them internally)? The k vs. log(tau) relationship you've shown definitely looks problematic. Does it look similar after cleaning the extreme reaction times from the data?
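Something along these lines should produce that plot (a sketch, assuming the stanfit object is named fit):

library(bayesplot)
# pairs plot on the log scale, mimicking Stan's unconstrained
# parameterization; divergences are marked via np
mcmc_pairs(as.array(fit),
           pars = c("k", "A", "tau", "v[1]", "v[2]"),
           transformations = "log",
           np = nuts_params(fit))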

It would also be great if you could tweak the code you seem to already have for generating simulated data and find some specific parameter values that reliably produce similarly problematic results (that might be hard, so do not spend too much time on it).

Also, just to be sure: the Annis paper you refer to is the one at Bayesian inference with Stan: A tutorial on adding custom distributions | SpringerLink?

Best of luck with your model!


I take your point that SBC can pass yet real data can be too low-information (including when the model for the data is fundamentally misspecified) to achieve stable MCMC. However, I think SBC is an "easy" first step that frankly should be undertaken with every model. And in the case of human RT models, given the history of belated discovery of nuanced identification issues with them, I have a strong prior that SBC alone will signal that more thought needs to be put into the model structure (and likely where).

So you agree then that SBC would be a quick and definitive first step to determining if the model is misspecified?

Adding a bit to what @martinmodrak said and echoing some of the issues @mike-lawrence is highlighting: I think of identification issues as (1) theoretical, i.e. different parameter values can lead to the same prediction, and (2) empirical, i.e. the data are not enough to separate different parameter values even if it would be theoretically possible (e.g. in the extreme case where all observations are the same). Martin is emphasizing (2) and Mike is stressing (1), but they both point to the same problem for the data at hand: there are different parameter values that will give the same fit.

One other way to diagnose a problem like that is to simplify the model so that there are fewer parameters to estimate. You already did that by setting s <- 1 in transformed parameters. If it is reasonable, you could try to set, for instance, A to a fixed value, or you could use a single v. If you have a simple model that works computationally (i.e. no divergent transitions), you can then add complexity and parameters back in to see what "breaks" the model.

One last thing that I noticed is that k is very small and bunching up against 0. It's almost as if the sampler wants k <= 0 and thus b <= A. For instance, if k = 0 then b = A, and in lba_pdf: b_A_tv_ts = -v/s and b_tv_ts = (A/t - v)/s. As a result, only some of the terms provide information on A and tau (through t). I have no idea how to interpret this, but it points to an inconsistency between what the model is supposed to be doing (k > 0) and what the data tell us.

edit: I had a look at the paper that Martin linked to. k <= 0 seems to imply that the threshold b can be equal to or lower than the initial evidence A. That is, for some observations no extra evidence is needed. I don't know whether that makes sense for the design, but it would result in very fast reaction times. A mundane explanation could be that participants in an experiment just always quickly pick the same option, independent of the evidence.

As a side note, lognormal or exponential priors on all the strictly positive parameters are a more natural choice to me, and a lognormal will put relatively little probability mass near 0, which might be what you want. I don't think this will solve the problem, though.


Sorry for the slow response, busy times …

I think the easiest way to do a prior predictive check is to:

  • generate 5000 (or so) random parameter values from your prior distribution
  • simulate data from those parameter values with the random function here: LBA: The Linear Ballistic Accumulator (LBA) in rtdists: Response Time Distributions
  • plot a histogram of the simulated data
  • check that the simulated data are not radically different from the observed data; the distribution of the simulated data should be wider than that of the observed data, but impossible RTs (> 15 s) should be very rare
  • if needed, adjust the priors

Be sure to choose the correct distribution underlying the drift rate for the rLBA function. The parameter names are also a bit different.
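A minimal sketch of these steps, assuming the rtdists and truncnorm packages and the priors from the Stan model above:

library(rtdists)
library(truncnorm)

# draw parameters from the priors and simulate one trial per draw
n_draws <- 5000
sim_rt <- replicate(n_draws, {
     k   <- rtruncnorm(1, a = 0, mean = 0.5, sd = 1)
     A   <- rtruncnorm(1, a = 0, mean = 0.5, sd = 1)
     tau <- rtruncnorm(1, a = 0, mean = 0.5, sd = 0.5)
     v   <- rtruncnorm(2, a = 0, mean = 2, sd = 1)
     # note the mapping to rLBA's parameter names: b = A + k, t0 = tau
     rLBA(1, A = A, b = A + k, t0 = tau, mean_v = v, sd_v = c(1, 1))$rt
})

hist(sim_rt, breaks = 100, xlab = "RT (s)", main = "Prior predictive RTs")
mean(sim_rt > 15)   # share of "impossible" RTs; should be very small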

As an alternative approach, you can also comment out the incrementing of the log prob in the Stan model (RT ~ lba(k,A,v,s,tau) in your model) and generate random data with lba_rng in the generated quantities block. This is safer because you don't need to worry about differences between the R and Stan implementations, but it might be harder to do if one is not familiar with writing Stan models.

Sorry for the very late reply; I've just started my PhD in the last month and got caught up in work relating to that instead. I've made the pairs plot but unfortunately wasn't able to generate simulated data that reliably reproduces the problematic results. The pairs plot attached is actually from a separate experiment using the same design. I probably should have noted as well that these response times are recorded from mice that are running up and down a 'runway' towards a goal. The goal contains reward (sucrose) and punishment (shock), leading to pausing and hence the response time data. The problems with estimating the LBA parameters seem to be consistent across these experiments. Also, the Annis paper you linked is the one I was referring to!


I want to tie some things together in this thread, and I hope it does not sound too repetitive. So please let me know if I can communicate this better.

  • The log(k) values in the plot are really negative, which means that k is estimated to be small: k < e^{-5} \approx 0.0067, which is similar to the first experiment.
  • In a way, the data is very informative about k, because the prior is normal(0.5, 1)T[0,]. I think the data is telling the model that k should be very small or negative. If I understood the Annis paper correctly, the k parameter is a relative threshold for how much additional evidence is required above A, where A is the maximum initial starting point. k = 0 implies that at least sometimes the initial evidence is enough for a mouse to make a decision, and no accumulation of evidence is necessary.
  • That would mean that the model with the restriction k > 0 is not appropriate for this type of data, maybe because of the type of experiment.
  • This is not a problem you would necessarily discover with SBC, because the simulated data would all come from a model where k is drawn from the normal(0.5, 1)T[0,] distribution. It would not necessarily be conclusive, but running the model on data simulated with k at or below 0 could give you some idea of whether that breaks the model in the same way (see the sketch below). Alternatively, if you have multiple data sources where sometimes the model works and sometimes it doesn't, you could check what the estimated value of k is in the models that do and do not work.
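A rough sketch of that check, assuming the rtdists package; the parameter values and the Stan file name are hypothetical, and k is set essentially to 0 by putting b just above A:

library(rtdists)
library(rstan)

# simulate data where k is essentially 0, i.e. the threshold b
# is barely above the maximum starting point A
sim <- rLBA(500, A = 0.5, b = 0.501, t0 = 0.3,
            mean_v = c(2, 1.5), sd_v = c(1, 1))
stan_data <- list(LENGTH = nrow(sim),
                  RT = cbind(sim$rt, sim$response),
                  NUM_CHOICES = 2)

# refit and check whether the divergences reappear
fit_sim <- stan("lba.stan", data = stan_data, chains = 4, iter = 2000)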

T. Wiecki mentioned in the HDDM Google group that the LBA log-likelihood function sometimes seems to be unstable, which is why they removed LBA from the HDDM package. Putting some restrictions on the parameters may help solve the divergent-iteration problem. For instance, make sure the drift rates sum to 1 (one action's drift rate is v, and the other's is 1 - v), and guarantee that the non-decision time is less than the minimum RT.