# Custom likelihood function based on Wiener first passage time density accounting for NoGo responses

Dear Stan community,

I’m trying to fit a multi-level drift diffusion model (DDM) based on the Wiener first passage time distribution (documented in the Stan manual and implemented in the Stan math library).

In my task, the correct response for participants to give is sometimes a NoGo response, which yields no reaction time (RT). The way to deal with this is to integrate over RTs for the NoGo boundary; an approximation is given by Blurton, Kesselmeier, & Gondan (2012, Journal of Mathematical Psychology), see especially the appendix for R and Matlab implementations. I use a simplification that has been suggested on the HDDM mailing list (HDDM is a Python wrapper package that fits DDMs in PyMC). This simplification is implemented, for example, in HDDM’s prob_ub function, which is then called in its wiener_like_multi function. Because I treat NoGo as the lower boundary, I compute 1 minus the probability of reaching the upper boundary.
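As a quick numerical sanity check of that complement (a sketch in Python, since HDDM itself is Python; the function name is illustrative, and the parameter values are the simulation settings used further below):

```python
import math

def p_upper(alpha, beta, delta):
    # Probability that the diffusion is absorbed at the upper boundary
    # (closed-form expression; Blurton, Kesselmeier, & Gondan, 2012).
    if delta == 0.0:
        return beta  # limit of the expression below as delta -> 0
    return (math.exp(-2 * alpha * beta * delta) - 1) / (math.exp(-2 * alpha * delta) - 1)

# NoGo is mapped to the lower boundary, so its probability is the complement:
alpha, beta, delta = 1.0, 0.75, 0.5
p_nogo = 1 - p_upper(alpha, beta, delta)
```

For these parameter values the model reaches the upper (Go) boundary most of the time, so the NoGo probability is small but clearly nonzero.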

I’ve tried this implementation myself on a simple data set (only 1 participant) simulated from the RWiener package and then fitted it using maximum likelihood estimation via optim(). Given the loss of information due to missing RTs, it works reasonably well imo.

I’ve now tried to implement this in Stan as a custom likelihood function. I get reasonable parameter estimates, but no log-likelihoods. Also, I see reasons why it might blow up for larger data sets and more complicated model versions. My questions are:

1. Is this definition formally correct (or am I somehow misinterpreting the way custom functions in stan should be defined)?
2. While I get sensible parameters (compared to ground truth and to parameters estimated with MLE), it somehow does not compute the log-likelihood (log_lik); I get NaNs there—even though I use the exact same command as for incrementing the target distribution… What am I doing wrong here?
3. This is a simple toy example with little data, but I’m afraid that it might blow up on bigger data sets (say 35 people, 250 data points per subject, so 8750 data points in total). I’m particularly concerned
a. whether my way of just taking the log at the end is the most efficient and suitable way;
b. what happens if, for NoGo responses, one of the parameters is 0, so I get log(0) for the log-likelihood. For now I just added machine_precision(); is there a better way to do this?

See my implementation below:

``````library(RWiener)
library(rstan)

## Define data settings:
nTrial <- 1000
par <- c(1, 0.2, 0.75, 0.5) # parameters alpha, tau, beta, delta

## Simulate data:
set.seed(0)
dat <- rwiener(n = nTrial, alpha = par[1], tau = par[2], beta = par[3], delta = par[4]) # simulate data

## Define custom deviance function to allow for NoGo responses:
wiener_deviance_gonogo <- function(x, data){
  #' Return the deviance for Go/NoGo data, with NoGo at the lower boundary.
  #' x: vector with the four parameter values: alpha, tau, beta, delta
  #' data: data frame with the data. Needs a reaction time column q with RTs in seconds
  #'  and an accuracy/response column resp with values "upper" or "lower".

  nData <- nrow(data)
  devSum <- 0

  ## Loop through data:
  for (iData in 1:nData){
    dat <- data[iData, ] # select trial
    if (dat$resp == "upper"){ # if Go:
      myDeviance <- wiener_deviance(x, dat)
    } else { # if NoGo:
      alpha <- x[1]; beta <- x[3]; delta <- x[4] # extract parameters
      lik <- 1 - (exp(-2*alpha*beta*delta) - 1) / (exp(-2*alpha*delta) - 1)
      myDeviance <- -2*log(lik) # convert to deviance
    }
    devSum <- devSum + myDeviance # sum up deviance
  }
  return(devSum)
}

## Fit with MLE via optim():
fitMLE <- optim(c(1, .1, .1, 1), wiener_deviance, dat = dat, method = "Nelder-Mead") # fit based on all data (RTs at both boundaries)
# fitMLE$par
# [1] 0.9803655 0.1984600 0.7304328 0.5774809
fitMLE <- optim(c(1, .1, .1, 1), wiener_deviance_gonogo, dat = dat, method = "Nelder-Mead") # fit treating lower-boundary responses as NoGo (no RT)
fitMLE$par
# [1] 0.9444393 0.1980153 0.7139110 0.6735721
# Reasonable fit, given that the ground-truth par was
# 1, 0.2, 0.75, 0.5

## Prepare data for stan:
standata <- list(y = dat$q,
                 resp = dat$resp,
                 N = length(dat$resp),
                 RTbound = 0.1,
                 minRT = min(dat$q))
if (is.factor(standata$resp)){standata$resp <- ifelse(standata$resp == "upper", 1, 0)}
standata$y <- ifelse(standata$resp == 1, standata$y, 0) # set RTs on NoGo trials to 0
``````

And now in stan:

``````wiener_model_gonogo <- "
functions {
  /* Returns the Wiener first passage time log density if resp == 1 (Go), i.e., when an RT is available;
   * otherwise integrates over RTs (NoGo), based on the solution in Blurton, Kesselmeier, & Gondan (2012, Journal of Mathematical Psychology)
   * and the implementation in the function prob_ub in HDDM: https://github.com/hddm-devs/hddm/blob/master/src/pdf.pxi ,
   * which is called in the function wiener_like_multi in HDDM: https://github.com/hddm-devs/hddm/blob/master/src/wfpt.pyx .
   * Args:
   *   y    : reaction time (in seconds)
   *   resp : response, either 1 (Go) or 0 (NoGo)
   *   alpha: boundary separation
   *   tau  : nondecision time
   *   beta : a-priori bias
   *   delta: drift rate
   * Returns:
   *   log probability density of RT/resp given the parameters
   */
  real wiener_gonogo_lpdf(real y, real resp, real alpha, real tau, real beta, real delta) {
    if (resp == 1) { // Go: just use wiener_lpdf
      return wiener_lpdf(y | alpha, tau, beta, delta);
    } else if (delta == 0) { // NoGo and drift == 0
      return log(1 - beta + machine_precision()); // just return starting-point bias (1 minus because lower boundary)
    } else { // NoGo and drift != 0
      return log(1 - (exp(-2 * alpha * beta * delta) - 1) / (exp(-2 * alpha * delta) - 1) + machine_precision()); // analytical solution for the integral (1 - p because lower boundary)
    }
  }
}
data {
  int<lower=0> N; // number of data points
  real resp[N];   // choices (1 = upper bound, 0 = lower bound)
  real y[N];      // RTs (in sec.)
  real minRT;     // minimum RT of the observed data
  real RTbound;   // lower bound on RT (e.g., 0.1 seconds)
}
parameters {
  real<lower=0, upper=5> alpha;          // boundary separation
  real<lower=RTbound, upper=minRT> tau;  // nondecision time
  real<lower=0, upper=1> beta;           // initial bias
  real delta;                            // drift rate
}
model {
  alpha ~ uniform(0, 5);
  tau   ~ uniform(RTbound, minRT);
  beta  ~ uniform(0, 1);
  delta ~ normal(0, 2);

  for (n in 1:N) {
    target += wiener_gonogo_lpdf(y[n] | resp[n], alpha, tau, beta, delta);
  }
}
generated quantities {
  // For log likelihood calculation
  real log_lik;

  for (n in 1:N) {
    log_lik += wiener_gonogo_lpdf(y[n] | resp[n], alpha, tau, beta, delta);
  }
}
"
``````

And now fitting:

``````fit <- stan(data = standata, model_code = wiener_model_gonogo, chains = 4, iter = 4000, warmup = 1000)
# takes about 15 min. to fit

# Gives the following warning:
# : In validityMethod(object) :
#   The following variables have undefined values:  log_lik. Many subsequent functions will not work correctly.

print(fit)
Inference for Stan model: 3eba659d8f1a432623da6b89532cbcfb.
4 chains, each with iter=4000; warmup=1000; thin=1;
post-warmup draws per chain=3000, total post-warmup draws=12000.

          mean se_mean   sd   2.5%    25%    50%    75%  97.5% n_eff Rhat
alpha     0.95    0.00 0.03   0.89   0.93   0.94   0.96   1.00  4644    1
tau       0.20    0.00 0.00   0.20   0.20   0.20   0.20   0.20  5234    1
beta      0.71    0.00 0.02   0.68   0.70   0.71   0.72   0.74  4096    1
delta     0.68    0.00 0.12   0.46   0.60   0.68   0.76   0.91  4820    1
log_lik    NaN      NA   NA     NA     NA     NA     NA     NA   NaN  NaN
lp__    396.35    0.02 1.39 392.88 395.67 396.67 397.37 398.10  4632    1

Samples were drawn using NUTS(diag_e) at Fri Apr 16 00:54:02 2021.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).

# Parameter values look the same as with MLE; but why is log_lik NaN?!?
``````

Thanks in advance for any help!


Hi,
I think the reason might be rather silly :) (But I might be wrong)
Can you try to do this?

``````  real log_lik = 0;
``````

I think log_lik is initialized with NaN, and you’re summing numbers to NaN.
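A minimal illustration of that failure mode (sketched in Python here, but the IEEE-754 floating-point behaviour is the same): once an accumulator starts out as NaN, no amount of adding finite numbers recovers it.

```python
import math

log_lik = float("nan")  # an uninitialized variable behaves like NaN
for contribution in (-1.2, -0.8, -2.3):
    log_lik += contribution  # NaN + anything is still NaN

print(math.isnan(log_lik))  # True
```

Initializing the accumulator to 0 before the loop is all that is needed.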


:facepalm: That was indeed it! Thanks @bnicenboim! At least that problem is solved!


I guess you could replace

``````log(1 - beta + machine_precision());
``````

with `log1m(beta)`.
It should be more stable.
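Stan’s `log1m(x)` computes `log(1 - x)` without first forming `1 - x` in floating point; Python’s closest analogue is `math.log1p(-x)`. A sketch of why that matters near the boundary (the extreme `beta` value is purely illustrative):

```python
import math

beta = 1e-17  # an extreme starting-point bias, for illustration only
print(math.log(1 - beta))   # 0.0 -- the subtraction rounds 1 - beta to exactly 1
print(math.log1p(-beta))    # approx. -1e-17 -- log1p keeps the small term
```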

For this

``````      return log(1 - (exp(-2 * alpha * beta * delta) - 1) / (exp(-2 * alpha * delta) - 1) + machine_precision());
``````

you could try using `log1m_exp`, or at least `log1m`

AFAIK, if you get a probability of 0, then that event should never occur. If it does occur, there must be something wrong with the estimates, and the sampler just rejects the proposal. (Please someone correct me if I’m wrong.)
So there is no point in adding machine_precision(): if the event can happen, it should have some probability associated with it, and if it’s unreasonable, then it’s fine for the sampler to reject the proposal.


I have temporarily marked the question as resolved. Please let me know if that is OK.

Thanks @bnicenboim !
Yes indeed, `log(0)` should practically never occur. I was following Blurton, Kesselmeier, & Gondan (2012), who explicitly account for such cases, I guess just for mathematical completeness, not because it would ever happen in practice…

I don’t see an easy way to use `log1m_exp` here (because of the remaining `-1` in both the numerator and denominator), but yes, `log1m` seems to work and already speeds things up a bit, so thanks!
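For what it’s worth, the leftover `exp(...) - 1` terms could themselves be written with `expm1` (which the Stan math library also provides), so the whole NoGo term becomes a single `log1m` of a ratio of `expm1`s. A hypothetical sketch in Python (`nogo_logprob` is an illustrative name, not HDDM or Stan code):

```python
import math

def nogo_logprob(alpha, beta, delta):
    # log P(lower boundary); expm1/log1p replace exp(.) - 1 and log(1 - .)
    if delta == 0.0:
        return math.log1p(-beta)  # limiting case: log(1 - beta)
    ratio = math.expm1(-2 * alpha * beta * delta) / math.expm1(-2 * alpha * delta)
    return math.log1p(-ratio)

# Agrees with the naive formula away from the problematic regions:
naive = math.log(1 - (math.exp(-2 * 1.0 * 0.75 * 0.5) - 1)
                 / (math.exp(-2 * 1.0 * 0.5) - 1))
```

In Stan this would read `log1m(expm1(-2 * alpha * beta * delta) / expm1(-2 * alpha * delta))`, with no machine_precision() fudge needed.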


@maxbiostat thanks, yes, I consider the question resolved now.
