Specifying constraints for a censored hierarchical shifted Wald model

Dear Stan community,

I am using Stan code for a censored hierarchical shifted Wald model provided by Faulkenberry (Faulkenberry, T. (2023, June). A hierarchical Bayesian shifted Wald model with censoring. Paper presented at Virtual MathPsych/ICCM 2023. Via mathpsych.org/presentation/1283).

I am encountering several issues, mostly related to the estimation of non-decision time (theta; H). Here is the original code:

functions{
    // shifted Wald (log) PDF
    real shiftedWald_lpdf(real X, real gamma, real alpha, real theta){
        real tmp1;
        real tmp2;
        tmp1 = alpha / (sqrt(2 * pi() * (pow((X - theta), 3))));
        tmp2 = exp(-1 * (pow((alpha - gamma * (X-theta)),2)/(2*(X-theta))));
        return log(tmp1*tmp2) ;
    }
    // shifted Wald (log) survivor function
    real shiftedWald_lS(real X, real gamma, real alpha, real theta){
      real tmp1;
      real tmp2;
      real tmp3;
      // compute log of the survival function 1-F for swald
      tmp1 = Phi((gamma*(X-theta) - alpha)/sqrt(X-theta));
      tmp2 = exp(2*alpha*gamma);
      tmp3 = Phi(-1*(gamma*(X-theta) + alpha)/sqrt(X-theta));
      return log(1-tmp1-tmp2*tmp3);
    }
}
data{
    int ns; // number of subjects
    int nrt; // number of trials per subject 
    real rt[ns,nrt]; // matrix of response times
    real D[ns,nrt]; // censoring matrix (0 = correct RT, 1 = incorrect RT)
}
parameters{
    // noncentered parameters
    vector[ns] G_raw;
    vector[ns] A_raw;
    vector[ns] H_raw;
    // define uniform hyperpriors from Matzke & Wagenmakers (2009)
    real<lower=0.85,upper=7.43> g; 
    real<lower=0.67,upper=2.35> a;
    real<lower=0,upper=0.82> h;
    real<lower=0,upper=1.899> gS;
    real<lower=0,upper=0.485> aS;
    real<lower=0,upper=0.237> hS;
 }
transformed parameters{
    // noncentered parameterization (Matt trick) (Betancourt & Girolami, 2003)
    vector[ns] G;
    vector[ns] A;
    vector[ns] H;
    
    for(i in 1:ns){
      G[i] = g + gS*G_raw[i];
      A[i] = a + aS*A_raw[i];
      H[i] = h + hS*H_raw[i];
    }
}
model{
    for(i in 1:ns){
      // group level prior
      G_raw[i] ~ normal(0,1);
      A_raw[i] ~ normal(0,1);
      H_raw[i] ~ normal(0,1);  
    
      for(j in 1:nrt){
        target += (1-D[i,j])*shiftedWald_lpdf(rt[i,j]| G[i], A[i], H[i]) + D[i,j]*shiftedWald_lS(rt[i,j], G[i], A[i], H[i]);
      }
  }
}

I encountered many divergent transitions because the proposed non-decision time values exceeded the observed reaction times. In response, I added a vector of per-participant minimum reaction times to the data block:

data {
    // ...
    vector[ns] min_rt;
}

Additionally, to ensure the non-decision time is smaller than each person's minimum reaction time, I somewhat clumsily constrained this parameter in the following way (which I know is not ideal):

transformed parameters {
    // ...
    for (i in 1:ns) {
        // ...
        H[i] = fmin(h + hS * H_raw[i], min_rt[i] - .01);
    }
}

However, in several instances the non-decision time (theta; H) came out negative, even though it must be strictly positive.

Could you please suggest how to ensure that the non-decision time stays strictly positive while also remaining smaller than each participant's minimum reaction time?
Thank you in advance.
Tom

Welcome to the Stan forum! The issues with the non-decision time you’re describing are quite common for hierarchical estimation of evidence accumulation models.

One approach that has worked well for me is to reparameterise the non-decision time as a proportion of a pre-defined interval, instead of directly estimating it on the scale of seconds.

To elaborate, I might fix the lower bound on the non-decision time to zero for all participants, and determine the upper bound pragmatically on a participant-by-participant basis (say, 0.95 times the fastest observed RT). I’d pass those bounds as data variables into the Stan model, like you’ve done above with min_rt:

data {
  // ...
  real ndt_lower;
  vector[ns] ndt_upper;
  // ...
}
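Alternatively, you could compute those bounds inside Stan rather than passing them in. Here is a minimal sketch using a transformed data block (in which case ndt_lower and ndt_upper would not be declared in data), assuming the rt array from your original data block and the 0.95 multiplier above:

transformed data {
  real ndt_lower = 0;
  vector[ns] ndt_upper;
  for (i in 1:ns) {
    // 0.95 times the fastest observed RT of participant i
    ndt_upper[i] = 0.95 * min(rt[i]);
  }
}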

Then, the actual non-decision time is determined using a parameter ndtprop, which represents a proportion of that pre-defined range:

parameters {
  // ...
  vector[ns] ndtprop_raw;
  real mu_ndtprop;
  real<lower=0> sigma_ndtprop;
  // ...
}
transformed parameters {
  // ...
  vector[ns] ndtprop;
  vector[ns] H;
  // ...
  for (i in 1:ns) {
    ndtprop[i] = inv_logit(mu_ndtprop + sigma_ndtprop * ndtprop_raw[i]);
    H[i] = ndt_lower + ndtprop[i] * (ndt_upper[i] - ndt_lower);
  }
}

where inv_logit maps the unconstrained value mu_ndtprop + sigma_ndtprop * ndtprop_raw[i] from the real line onto the unit interval (0, 1).
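For completeness, the new group-level parameters also need priors in the model block; a minimal sketch (the specific scales are only illustrative):

model {
  // ...
  ndtprop_raw ~ std_normal();      // non-centred individual deviations
  mu_ndtprop ~ normal(0, 1);       // group mean on the logit scale
  sigma_ndtprop ~ normal(0, 0.5);  // half-normal due to the <lower=0> bound
  // ...
}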

This approach should ensure the non-decision time is strictly positive and smaller than the participant’s minimum observed RT.

If I may, some more general feedback on your code and modelling approach:

First, if you haven’t already, I’d double-check your data to ensure there aren’t any implausibly fast response times (say, faster than 200 ms; Ratcliff & Tuerlinckx, 2002), as such RTs would obviously limit the range of values your non-decision time parameter can take.

Second, I noticed you set hard lower and upper bounds on the group-level means and SDs and don’t specify any priors for them in the model block (i.e., implicitly uniform priors over those ranges). This approach is generally discouraged.
Do those specific values represent true constraints? In other words, is it really the case that, for example, g cannot be smaller than 0.85 or larger than 7.43? If not, I would encourage you to set more lenient bounds in the parameter declarations and more informative prior distributions in the model block.
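For example, something along these lines (the specific prior locations and scales below are placeholders for illustration, not recommendations):

parameters {
    // ... (G_raw, A_raw, H_raw as before)
    // positivity is a genuine constraint; everything else is left to the priors
    real<lower=0> g;
    real<lower=0> a;
    real<lower=0> h;
    real<lower=0> gS;
    real<lower=0> aS;
    real<lower=0> hS;
}
model {
    // weakly informative priors, roughly centred on the Matzke & Wagenmakers (2009) ranges;
    // with the <lower=0> bounds these act as half-/truncated normals
    g ~ normal(3, 2);
    a ~ normal(1.5, 1);
    h ~ normal(0.4, 0.3);
    gS ~ normal(0, 1);
    aS ~ normal(0, 0.5);
    hS ~ normal(0, 0.25);
    // ... rest of the model block as before
}

Soft priors like these still encode the same domain knowledge but let the sampler move past the hard edges, which tends to help with divergences.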

Third, I would say your current implementation of the Wald probability functions, although technically correct and commonly used, is not ideal from the standpoint of numerical accuracy and stability. Your log survivor function (i.e., the log complementary CDF) in particular is prone to numerical problems: the terms tmp1, tmp2, and tmp3 can under- or overflow, and the final line 1-tmp1-tmp2*tmp3 suffers from subtractive cancellation.
If you’re interested, I have a GitHub repo with Stan implementations of the Wald probability functions, based on the highly accurate implementation from the R package statmod.
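In the meantime, here is a sketch of a drop-in replacement for your shiftedWald_lS that keeps your parameterisation but stays on the log scale (this is just a first-pass improvement, not the statmod-based implementation):

// drop-in replacement for shiftedWald_lS: same survivor function,
// but computed on the log scale with log_diff_exp to avoid cancellation
real shiftedWald_lS(real X, real gamma, real alpha, real theta){
  real t = X - theta;
  real sqrt_t = sqrt(t);
  // S(X) = Phi(-(gamma*t - alpha)/sqrt(t)) - exp(2*alpha*gamma) * Phi(-(gamma*t + alpha)/sqrt(t))
  return log_diff_exp(
    normal_lcdf(-(gamma * t - alpha) / sqrt_t | 0, 1),
    2 * alpha * gamma + normal_lcdf(-(gamma * t + alpha) / sqrt_t | 0, 1)
  );
}

This avoids forming exp(2*alpha*gamma) explicitly and avoids the 1 - ... subtraction, although it will still lose accuracy far out in the tails.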

Hope that helps!

I’ve also faced this issue in the past, but with shifted-lognormal models. My solution was to rely on the fact that Stan allows you to specify an upper bound for each element of a parameter vector (check here).

Since min_rt is the minimum reaction time per subject, I would do the following:

parameters{
    // noncentered parameters
    vector[ns] G_raw;
    vector[ns] A_raw;
    // centered H
    vector<lower=0, upper=min_rt>[ns] H;
    // define uniform hyperpriors from Matzke & Wagenmakers (2009)
    // ...
 }

Finally, in the model block, you sample the non-decision parameter per person using the centered parameterization.

for(i in 1:ns){
      // group level prior
      G_raw[i] ~ normal(0,1);
      A_raw[i] ~ normal(0,1);
      H[i] ~ normal(h,hS);  
      // ...
}

In this way, the non-decision time is naturally constrained between 0 and each person’s minimum observed response time. This also has implications for the group-level mean of the non-decision time: in principle, its maximum (although highly unlikely) possible value is the largest of the per-person minimum response times, that is, max(min(RT_i)), so you can bound h accordingly:

parameters{
    // ...
    real<lower=0,upper=max(min_rt)> h;
    // ...
 }

This worked for me. I hope it can help you too!
