Proper setup of priors for a drift difusion model

pbranc · January 15, 2025, 11:14pm

I am working an adrift diffusion model and am running into an issue setting up priors.I recently made a post where, with the help of another forum user, I’ve decided my model formula should look like this:

formula <- brms::bf(
  # drift rate formula
  rt | dec(response) ~ x1+ x2+ (x2|p|subID) + (1|Face),
  # boundary separation formula
  bs ~ 1 + (1|p|subID),
  # non-decision time formula
  ndt ~ 1 + (1|p|subID),
  # starting point formula
  bias ~ 1 + (1|p|subID)
)

Where x1 is a continuous predictor and x2 is a categorical (with three levels, treatment-coded) of drift rate, and all other parameters are fixed.

To set up my priors, I decided to attempt using the default priors for most, specifying priors only for x1 and x2. My code is:

#What do the default priors look like?
get_prior(formula,
         data = data,
         family = wiener(link_bs = "identity",
                         link_ndt = "identity",
                         link_bias = "identity"))

The output is the following table:

Prior	class	coef	group	dpar	lb	source
	b					default
	b	x2level2				default
	b	x2level3				default
	b	x1				default
lkj(1)	cor					default
	cor		subID			default
student_t(3, 869, 431.4)	Intercept					default
student_t(3, 0, 431.4)	sd				0	default
	sd		Stimulus			default
	sd	Intercept	Stimulus			default
	sd		subID			default
	sd	x2level2	subID			default
	sd	x2level3	subID			default
	sd	Intercept	subID			default
beta(1, 1)	Intercept			bias		default
student_t(3, 0, 431.4)	sd			bias	0	default
	sd		subID	bias		default
	sd	Intercept	subID	bias		default
gamma(1, 1)	Intercept			bs		default
student_t(3, 0, 431.4)	sd			bs	0	default
	sd		subID	bs		default
	sd	Intercept	subID	bs		default
uniform(0, min_Y)	Intercept			ndt		default
student_t(3, 0, 431.4)	sd			ndt	0	default
	sd		subID	ndt		default
	sd	Intercept	subID	ndt		default

I noticed it did not establish priors for x1 or x2, so I wrote more code to include those:

family <- wiener(link_bs = "identity",
                 link_ndt = "identity",
                 link_bias = "identity")

prior <- c(
  set_prior("normal(-5, 5)", coef = "x2level1", class="b"),
  set_prior("normal(-5, 5)", coef = "x2level2", class="b"),
  set_prior("normal(-5, 5)", coef = "x1", class = "b")
) |>  brms::validate_prior(formula,
                           family = family,
                           data = data)

make_stancode(formula,
              family = wiener(link_bs = "identity",
                              link_ndt = "identity",
                              link_bias = "identity"),
              data = data,
              prior = prior)

When I try to create the stan code, I get the following error:

Warning messages:
1: It appears as if you have specified a lower bounded prior on a parameter that has no natural lower bound.
If this is really what you want, please specify argument 'lb' of 'set_prior' appropriately.
Warning occurred for prior 
Intercept_bias ~ beta(1, 1)
Intercept_bs ~ gamma(1, 1)
Intercept_ndt ~ uniform(0, min_Y)
 
2: It appears as if you have specified an upper bounded prior on a parameter that has no natural upper bound.
If this is really what you want, please specify argument 'ub' of 'set_prior' appropriately.
Warning occurred for prior 
Intercept_bias ~ beta(1, 1)
Intercept_ndt ~ uniform(0, min_Y)

Obviously, some of my priors are not logical. I’m unfamiliar with Bayesian statistics in general, so I am still trying to get the hang of establish priors, in addition to being new to DDM and stan. I have little help outside of this forum, so any assistance is greatly appreciated!

frank_hezemans · January 16, 2025, 8:58am

First, this paper by Tran et al. (2021) is a useful resource for setting priors on DDM parameters.

Regarding your priors on the effects of x1 and x2 on the drift rate: The output of get_prior shows you that for the class “b” (i.e., regression coefficients on the drift rate), you can specify priors on the coefficients x2level2, x2level3, and x1. Since you used treatment coding for x2, the coefficient x2level1 is implicitly taken as the reference value, so that is actually the Intercept term. I don’t know whether it makes sense for you to treat level1 as the reference level; you could also try other coding schemes such as a sum-to-zero contrast.

You set priors on x2level1, x2level2, and x1. But based on the above info, you should actually set priors on x2level2, x2level3, and x1. To be honest I’m not sure what would happen to your specified prior for x2level1, I suspect it would just be ignored.

Your priors on these effects are quite strongly negative and flat (mean = -5, SD = 5). I would have typically expected smaller effects on the drift rate. Assuming your RT data is on the scale of seconds, the drift rate represents the mean amount of “evidence” accumulated (in arbitrary units) per second. Anecdotally, I generally find the drift rate is anywhere from 0.5 to 6 or so (depending widely on the study at hand of course). So I would expect any effects on the drift rate to be anywhere in the -2 to 2 range, just as a ballpark.

Next, regarding the warning message: This occurs because you used the "identity" linking function for each of the boundary separation, non-decision time, and starting point parameters. This just means that you are directly estimating the values of these parameters by sampling from the real line \left(-\infty, \infty\right), without any bounds or transformations applied before plugging these parameter values into the Wiener first passage time distribution function.

The problem is that the default priors on the intercepts of the starting point, boundary separation, and non-decision time are all lower-bounded distributions (and in the case of the starting point and non-decision time, also upper-bounded) — suggesting that the parameters themselves should also be bounded accordingly — but as I explained above, due to the use of the "identity" link, these parameters are in fact not bounded accordingly.

If you look at the relevant brms source code, you’ll see that the default settings are as follows:

wiener <- function(link = "identity", link_bs = "log",
                   link_ndt = "log", link_bias = "logit") {
  slink <- substitute(link)
  .brmsfamily("wiener", link = link, slink = slink,
              link_bs = link_bs, link_ndt = link_ndt,
              link_bias = link_bias)
}

Since you are not setting any specific priors on these parameters yourself, I would follow the default settings (so no need to bother tweaking link_bs, link_ndt, and link_bias in your call to wiener).

To explain further how these linking functions work, note that all parameters are sampled on the unconstrained real line \left(-\infty, \infty\right). For parameters with the ”log” and “logit” linking functions, these unconstrained parameter values are treated as “transformed” values that still need to be “back-transformed” to an appropriate space before they can be passed to the Wiener first passage time distribution functions. Specifically:

For the bs and ndt parameters, the parameter values are treated as if they are log-transformed, so the inverse of the natural logarithm — the exponential function — is used to transform the value to the \left[0, \infty\right) domain.
For the bias parameter, the parameter value is treated as if it is logit-transformed, so the inverse of the logit function — the inverse logit a.k.a. standard logistic a.k.a. sigmoid function — is used to transform the value to the \left[0, 1\right] domain.

Just for completeness, it might be helpful to discuss the default priors on bs, bias, and ndt:

The intercept on the starting point is assigned a beta(1, 1) prior by default, which is equivalent to a uniform(0, 1) prior. Thus, this prior distribution is supported on the interval \left[0, 1\right].
The intercept on the boundary separation is assigned a gamma(1, 1) prior, which is equivalent to an exponential(1) prior. Thus, this prior distribution is supported on the interval \left[0, \infty\right).
The intercept on the non-decision time is assigned a uniform(0, min_Y) prior, where I presume that min_Y is a transformed data variable that brms incorporates in the Stan code under the hood, representing the smallest observed response time. Thus, this distribution is supported on the interval \left[0, \texttt{min_Y}\right].

I would only mention that the prior on the ndt seems quite poor. First, the lower bound of zero is problematic, because we know that theoretically a non-decision time of zero seconds is not possible. For example, we know that the onset latency of early sensory processing is at least 50 ms (Schmolesky et al., 1998). Second, the upper bound of the fastest observed response time is problematic for several reasons. For starters, the non-decision time can, by definition, not be equal to an observed response time, since any given response time should reflect both non-decision time and decision time (assuming the observed response time is indeed generated by an evidence accumulation process). Furthermore, in a hierarchical set-up with several participants, just taking the fastest observed response time across the entire dataset could be problematic, because you might have one specific participant with extremely fast response times, thereby setting the overall minimum observed response time very fast. Presumably this same upper bound will then also be applied to all the other participants, who might be better served by a slower upper bound.

So as a first remedy, I would carefully check your raw data and exclude any implausibly fast response times (say, faster than 200 ms; Ratcliff & Tuerlinckx, 2002). And then closely inspect the raw response time data for each participant, and try to set a more informed prior on the ndt intercept based on your insights. There are also other solutions; for example, Ratcliff & Tuerlinckx proposed a mixture of the Wiener distribution (representing the evidence accumulation process) and a uniform distribution defined from zero to the end of the trial (representing a “contamination” process for capturing both extremely fast and extremely slow response times). But that’s probably out of scope.

Good luck!

pbranc · January 16, 2025, 5:36pm

Thank you! These are great resources and great advice.

(and yes, there was a typo- I set priors for levels 2 and 3 for my treatment-coded factor, not level 1, because it is the reference group as you pointed out. Thanks for catching that!)

frank_hezemans · January 16, 2025, 6:56pm

Glad it was helpful! Makes sense that the level1 / level2 thing was a typo.
FYI I’ve edited my answer to add a paragraph about the linking functions, which I previously hadn’t explained clearly.

Topic		Replies	Views
Diffusion model initialization issue using brms Modeling rstan , brms , initialization	6	688	May 6, 2025
Error: the model does not contain samples General rstan , fitting-issues , brms	4	120	January 26, 2025
Across trial drift-rate variability in the (hierarchical) drift diffusion model Modeling fitting-issues , cognitive-science	7	1147	February 5, 2019
Diffusion drift model (ddm) fitting issues with rstan Modeling fitting-issues , cognitive-science	6	3272	November 26, 2020
Priors for fixed effects on bounded parameters brms	7	2538	April 6, 2020

Proper setup of priors for a drift difusion model

Related topics