# Priors for fixed effects on bounded parameters

I’m still pretty new to brms and Stan (and Bayesian inference generally), but I keep running into the same two problems in different contexts. For background, I’ll stick to the analysis I’m currently working on.

I am trying to estimate the three parameters of an exponentially modified gaussian distribution (exgaussian) for reaction times, based on condition (three levels) and group. The model also has individual intercepts and slopes (for condition) for each of the parameters. The results so far suggest to me that this is a pretty good model for the data, in that the distribution of data simulated from the posterior matches well with the observed distributions (per condition/participant), which is essentially what I’m interested in here.

Reaction times are in seconds (I’ve tried dividing them by the standard deviation but that does not seem to make any difference for these problems) and therefore largely fall between 0 and 1 but with a significant fraction of large outliers (max ~7s).

My approach to setting priors so far is to set priors for the intercept based on guesstimates of where I’m 99% sure the value has to lie. I’m using a student_t(3, …) to allow as much as feasible for the possibility that I’m completely wrong. For the fixed effects I’m using priors centered on 0 because I don’t want to encode the direction of these effects in the priors, and then setting the scale to half of that for the intercept priors. For the ‘random effects’ I am using an exponential prior with a high rate to shrink the sd towards 0, but I’ve relaxed the prior for the beta parameter because posterior checks suggested that it was too strong.

So here’s the model

``````modex7.bf <- bf(RT ~ cond * group + (1 + cond|a|id),
sigma ~ cond * group + (1 + cond|a|id),
beta ~ cond * group + (1 + cond|a|id),

get_prior(modex7.bf, data = posRT)

modex7.priors <- set_prior('student_t(3,0.5,0.08)', class = 'Intercept') +
# 99%: 0 to 1
set_prior('student_t(3, -2.5, 0.4)', class = 'Intercept', dpar = 'sigma') +
#99%: -5 to 1
set_prior('student_t(3, 2.7, 0.4)', class = 'Intercept', dpar = 'beta') +
# 99%: 1 to 200* (log = 0 to 5.3)
# *at a rate of 200, the exponential contribution to the exgauss is negligible
set_prior('student_t(3, 0, 0.04)', class = 'b')
set_prior('student_t(3, 0, 0.2)', class = 'b', dpar = 'sigma') +
set_prior('student_t(3, 0, 0.2)', class = 'b', dpar = 'beta') +
set_prior('exponential(25)', class = 'sd',) +
set_prior('exponential(25)', class = 'sd', dpar = 'sigma') +
set_prior('exponential(10)', class = 'sd', dpar = 'beta') # relaxed this prior a bit

modex7.fit <- brm(modex7.bf, data = posRT, sample_prior = 'yes', prior = modex7.priors, iter = 4000, cores = 4, seed = mustard)
``````

So my problems:

1. I get this warning/error from Stan (a lot) while sampling:
``````Chain 4: Rejecting initial value:
Chain 4:   Log probability evaluates to log(0), i.e. negative infinity.
Chain 4:   Stan can't start sampling from this initial value.
``````

I understand that this is because the starting values that Stan is using are far from posterior mass. However, the documentation for brm suggests that you shouldn’t manually set starting values (“Alternatively, `inits` can be a list of lists containing the initial values, or a function (or function name) generating initial values. The latter options are mainly implemented for internal testing.”) and 0 doesn’t always make sense as a starting value for the parameter either (e.g., sd of a guassian with identity link).
If I understand correctly, in the absence of initial values, Stan generates them on [-2, 2], but in some cases I really have to pull some strange tricks to make all or even most values in the range at all plausible (e.g., the sigma parameter in the above example).

1. I can’t figure out how to set a prior for fixed effects that I believe could plausibly go in both directions (i.e., increase and decrease the parameter) when the parameter itself is bounded (e.g., if sigma in the model above had an identity link instead of a log link).

I strongly suspect that both of these problems can (easily?) be fixed by coding the models directly in Stan, or by using the ‘stanvars’ option in brms, i.e., through some form of reparametrization. However, that would require me to learn Stan syntax, something that is pretty high on my to-do list but not high enough that it provides a solution for this particular case. Are their any alternatives?

• Operating System: Window 10 Enterprise
• brms Version: 2.12.0

The goal of that initialization is to have chains start dispersed around the solution.

Rhat (how we judge convergence) is based on checking if a bunch of different MCMC chains get the same solution.

If all the MCMC chains start in the same place, you might think things have converged before they have.

So it’s fine to change your initialization, just keep that in mind. Nothing says [-2, 2] is even a good initialization, but you do lose diagnostic power if you start everything in one spot.

If the problem is just those warning messages at the beginning of sampling, I’d just ignore them :D. Or are chains sometimes failing to ever initialize or something? If so it’s fair to adjust the inits.

1 Like

Ok, that makes sense - but in that case it seems like it would even be preferable to supply several initial values (rather than starting all chains at 0, which is the alternative suggested by the documentation). Or am I missing something?

Not, the negative infinity thing doesn’t ever lead to complete failure, at least not in my current analysis.

However, the second problem does usually cause Stan to fail. I guess there the solution is to stick to a log (or logit) link function so the problem doesn’t occur. For obvious reasons that’s not always optimal though.

Thanks a lot for the help!

Usually we recommend just making the range smaller to start with (for convenience). Like [-1, 1] is will probably work more often than [-2, 2], all the way down to something like zero. We aren’t really checking that the initial points are actually dispersed relative to the posterior though, so it’s all more about a convenient thing that works most of the time (as far as we know!)

Do you have some example code of trying to do this? Certainly seems like we should be able to set priors here.

Well, the closest thing would be something very similar to the example above, but with an identity link for sigma, i.e.:

``````modex8.bf <- bf(RT ~ cond * group + (1 + cond|a|id),
sigma ~ cond * group + (1 + cond|a|id),
beta ~ cond * group + (1 + cond|a|id),
# note change of sigma link

get_prior(modex8.bf, data = posRT)

modex8.priors <- set_prior('student_t(3,0.5,0.08)', class = 'Intercept') +
# 99%: 0 to 1
set_prior('gamma(3,18)', class = 'Intercept', dpar = 'sigma') +
# strictly positive prior for sigma intercept
# 99%: 0.02 to 0.50
set_prior('student_t(3, 2.7, 0.4)', class = 'Intercept', dpar = 'beta') +
# 99%: 1 to 200* (log = 0 to 5.3)
# *at a rate of 200, the exponential contribution to the exgauss is negligible
set_prior('student_t(3, 0, 0.04)', class = 'b')
# narrowly ranged prior for the fixed effects - still doesn't work
set_prior('normal(0, 0.05)', class = 'b', dpar = 'sigma') +
set_prior('student_t(3, 0, 0.2)', class = 'b', dpar = 'beta') +
set_prior('exponential(25)', class = 'sd',) +
set_prior('exponential(25)', class = 'sd', dpar = 'sigma') +
set_prior('exponential(10)', class = 'sd', dpar = 'beta') # relaxed this prior a bit
# for now I'm setting strong (shrinkage) priors on the random effects - I may relax this later

modex8.fit <- brm(modex8.bf, data = posRT, sample_prior = 'yes', prior = modex8.priors,
cores = 4, seed = mustard)
``````

Running this with the actual data gives this error in R/brms:

``````some chains had errors; consider specifying chains = 1 to debughere are whatever error messages were returned
[]
Stan model 'a40695383af7f59f14575f7e0225930d' does not contain samples.

[]
Stan model 'a40695383af7f59f14575f7e0225930d' does not contain samples.

[]
Stan model 'a40695383af7f59f14575f7e0225930d' does not contain samples.

[]
Stan model 'a40695383af7f59f14575f7e0225930d' does not contain samples.
``````

I’ve since realized that in fact random effects are even more problematic than fixed effects. If I make the `class = 'b', dpar = 'sigma'` prior on the fixed effect narrow enough (e.g. `normal(0, 0.01)`) I can usually get it to run on at least some of the chains, despite many errors. But even `exponential(250)` still leads to failure to initialize for the `class = 'sd', dpar = 'sigma'` prior on the random effect.

The chains themselves show these kinds of warnings:

``````Chain 1: Rejecting initial value:
Chain 1:   Error evaluating the log probability at the initial value.
Chain 1: Exception: exp_mod_normal_lpdf: Scale parameter is -2.55873, but must be > 0!  (in 'model38f01cc86831_a40695383af7f59f14575f7e0225930d' at line 122)

Chain 1: Rejecting initial value:
Chain 1:   Error evaluating the log probability at the initial value.
Chain 1: Exception: exp_mod_normal_lpdf: Scale parameter is -1.8258, but must be > 0!  (in 'model38f01cc86831_a40695383af7f59f14575f7e0225930d' at line 122)
``````

And then in the end:

``````Chain 1:
Chain 1: Initialization between (-2, 2) failed after 100 attempts.
Chain 1:  Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model.
 "Error in sampler\$call_sampler(args_list[[i]]) : Initialization failed."
 "error occurred during calling the sampler; sampling not done"
``````

The link functions guarantee constraints, so the sigma parameter in exgaussian needs to be an exp to avoid the warnings you’re getting. This:

``````Chain 1: Rejecting initial value:
Chain 1:   Error evaluating the log probability at the initial value.
Chain 1: Exception: exp_mod_normal_lpdf: Scale parameter is -2.55873, but must be > 0!  (in 'model38f01cc86831_a40695383af7f59f14575f7e0225930d' at line 122)
``````

Means that the constraint on the sigma parameter (the scale parameter) has been violated. That distribution is here: https://mc-stan.org/docs/2_22/functions-reference/exponentially-modified-normal-distribution.html

Yeah, that’s what I was getting at. The link function does also have some other effects though, especially with continuous predictors, which is why I was looking for a way to do this without using a log link. I guess if need be I could transform the predictor to get rid of the most obvious problem with this solution.

Again, thanks a lot for the help!

Yeah interpreting the priors becomes a little more annoying, since it’s no linear (X units of predictor corresponds to an average increase of Y units of output, or whatever).