Proper setup of priors for a drift diffusion model

First, this paper by Tran et al. (2021) is a useful resource for setting priors on DDM parameters.

Regarding your priors on the effects of x1 and x2 on the drift rate: the output of get_prior shows that for class “b” (i.e., the regression coefficients on the drift rate), you can specify priors on the coefficients x2level2, x2level3, and x1. Since you used treatment coding for x2, level1 is implicitly treated as the reference level, so its effect is absorbed into the Intercept term. I don’t know whether it makes sense for you to treat level1 as the reference level; you could also try other coding schemes, such as a sum-to-zero contrast (see the sketch below).
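If you want to try that, here is a minimal sketch of switching x2 to sum-to-zero coding; I’m assuming your data frame is called df and that x2 is a factor with three levels (both assumptions on my part):

# Switch x2 from the default treatment coding to sum-to-zero coding;
# the coefficients then represent deviations from the grand mean
df$x2 <- factor(df$x2)
contrasts(df$x2) <- contr.sum(3)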

You set priors on x2level1, x2level2, and x1, but based on the above, you should actually set priors on x2level2, x2level3, and x1. To be honest, I’m not sure what would happen to your specified prior for x2level1; I suspect it would just be ignored.

Your priors on these effects are strongly negative and very wide (mean = -5, SD = 5). I would typically have expected smaller effects on the drift rate. Assuming your RT data is on the scale of seconds, the drift rate represents the mean amount of “evidence” accumulated (in arbitrary units) per second. Anecdotally, I generally find the drift rate to be anywhere from 0.5 to 6 or so (depending widely on the study at hand, of course). So I would expect any effects on the drift rate to fall roughly in the -2 to 2 range, just as a ballpark.
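To make that concrete, here is a sketch of how you could specify weakly informative priors on those coefficients; the normal(0, 1) choice is just an illustration of the scale argument above, not a recommendation for your specific study:

library(brms)

# Weakly informative priors on the drift-rate coefficients, consistent
# with effects mostly falling in the -2 to 2 range
priors <- c(
  set_prior("normal(0, 1)", class = "b", coef = "x1"),
  set_prior("normal(0, 1)", class = "b", coef = "x2level2"),
  set_prior("normal(0, 1)", class = "b", coef = "x2level3")
)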

Next, regarding the warning message: this occurs because you used the "identity" link function for each of the boundary separation, non-decision time, and starting point parameters. This means that you are directly estimating the values of these parameters by sampling from the real line \left(-\infty, \infty\right), without any bounds or transformations applied before plugging these parameter values into the Wiener first passage time distribution function.

The problem is that the default priors on the intercepts of the starting point, boundary separation, and non-decision time are all lower-bounded distributions (and, in the case of the starting point and non-decision time, also upper-bounded). This suggests that the parameters themselves should be bounded accordingly, but as explained above, due to the use of the "identity" link, they are in fact not.

If you look at the relevant brms source code, you’ll see that the default settings are as follows:

wiener <- function(link = "identity", link_bs = "log",
                   link_ndt = "log", link_bias = "logit") {
  slink <- substitute(link)
  .brmsfamily("wiener", link = link, slink = slink,
              link_bs = link_bs, link_ndt = link_ndt,
              link_bias = link_bias)
}

Since you are not setting any specific priors on these parameters yourself, I would follow the default settings (so there is no need to tweak link_bs, link_ndt, or link_bias in your call to wiener).
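For illustration, a call with the default links could look something like the sketch below; the formula and variable names (rt, response, df) are assumptions based on your question, not your exact model:

library(brms)

# Minimal sketch of a drift diffusion model with the default links:
# identity for the drift rate; log, log, and logit for bs, ndt, and bias
fit <- brm(
  rt | dec(response) ~ x1 + x2,
  data = df,
  family = wiener()
)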

To explain further how these link functions work, note that all parameters are sampled on the unconstrained real line \left(-\infty, \infty\right). For parameters with the “log” and “logit” link functions, these unconstrained parameter values are treated as “transformed” values that still need to be “back-transformed” to an appropriate space before they can be passed to the Wiener first passage time distribution function. Specifically (see the short sketch after this list):

  • For the bs and ndt parameters, the parameter values are treated as if they are log-transformed, so the inverse of the natural logarithm (the exponential function) is used to map the value to the \left(0, \infty\right) domain.
  • For the bias parameter, the parameter value is treated as if it is logit-transformed, so the inverse of the logit function (the inverse logit, a.k.a. the standard logistic or sigmoid function) is used to map the value to the \left(0, 1\right) domain.
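Here is a quick base-R sketch of these back-transformations (illustrative only; this is not how brms implements them internally):

# Log link: an unconstrained value for bs or ndt is exponentiated,
# which maps the real line to (0, Inf)
eta_bs <- 0.3
bs <- exp(eta_bs)

# Logit link: an unconstrained value for bias is passed through the
# inverse logit (the standard logistic function), mapping to (0, 1)
eta_bias <- -0.5
bias <- plogis(eta_bias)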

Just for completeness, it might be helpful to discuss the default priors on bias, bs, and ndt:

  • The intercept on the starting point is assigned a beta(1, 1) prior by default, which is equivalent to a uniform(0, 1) prior. Thus, this prior distribution is supported on the interval \left[0, 1\right].
  • The intercept on the boundary separation is assigned a gamma(1, 1) prior, which is equivalent to an exponential(1) prior. Thus, this prior distribution is supported on the interval \left[0, \infty\right).
  • The intercept on the non-decision time is assigned a uniform(0, min_Y) prior, where I presume that min_Y is a transformed data variable that brms incorporates in the Stan code under the hood, representing the smallest observed response time. Thus, this distribution is supported on the interval \left[0, \texttt{min\_Y}\right].
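You can inspect all of these defaults yourself; something like the following should work (again, the formula and data names are assumptions based on your question):

library(brms)

# List the default priors for this model, including those on the
# bias, bs, and ndt intercepts discussed above
get_prior(
  rt | dec(response) ~ x1 + x2,
  data = df,
  family = wiener()
)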

One thing I would add is that the default prior on the ndt intercept seems quite poor. First, the lower bound of zero is problematic, because a non-decision time of zero seconds is theoretically impossible; for example, we know that the onset latency of early sensory processing alone is at least 50 ms (Schmolesky et al., 1998). Second, the upper bound at the fastest observed response time is problematic for several reasons. For starters, the non-decision time cannot, by definition, equal an observed response time, since any given response time should reflect both non-decision time and decision time (assuming the observed response time is indeed generated by an evidence accumulation process). Furthermore, in a hierarchical set-up with several participants, taking the fastest observed response time across the entire dataset can be problematic: a single participant with extremely fast responses will pull the overall minimum down, and presumably that same upper bound is then applied to all the other participants, who might be better served by a slower one.

So as a first remedy, I would carefully check your raw data and exclude any implausibly fast response times (say, faster than 200 ms; Ratcliff & Tuerlinckx, 2002). Then closely inspect the raw response time data for each participant, and set a more informed prior on the ndt intercept based on those insights. There are also other solutions; for example, Ratcliff & Tuerlinckx proposed a mixture of the Wiener distribution (representing the evidence accumulation process) and a uniform distribution defined from zero to the end of the trial (representing a “contamination” process that captures both extremely fast and extremely slow response times). But that is probably out of scope here.
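If it helps, here is one hypothetical way to set such an informed prior on the ndt intercept; the distribution and the numbers are placeholders that you should replace based on your own data:

library(brms)

# Hypothetical informed prior on the ndt intercept (identity link),
# centered on ~250 ms; adjust based on your per-participant RT data
prior_ndt <- set_prior("normal(0.25, 0.05)", class = "Intercept", dpar = "ndt")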

Good luck!
