Thanks so much for your input. Really helpful.
First of all: Unfortunately, I get similar issues if sigma is fixed. I do not get them in all chains, but in this case in the fourth. I think I overlooked this months ago because brms did not always print these messages. I tried the same model with the gaussian family and the issue disappears (at least this time), but then I get divergent transitions again.
I don’t think this is a strong prior, for normalized predictors, I often use prior(normal(0,1), class = sd) and it seems to work OK (but that also depends on the scale of the outcomes).
Does brms then automatically use a half-Gaussian for the sd parameter (because that parameter is always positive, right?)? Or does brms model it as log(sd), just as it does with sigma? I need to make sure I am not misunderstanding something here. If I plot that prior distribution:
qplot(rexp(1e4, rate = 35), geom = "density")
vs. e.g. the normal you use, then the rexp(35) prior already makes values higher than 0.10 very unlikely, so I thought this is a stronger prior than the normal.
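The comparison can also be checked numerically instead of by eye. This is only a sketch, and it assumes that normal(0,1) on an sd parameter behaves as a half-normal (truncated at zero), which is exactly the open question above; the variable names are made up for illustration.

```r
# Prior probability that sd exceeds 0.10 under each candidate prior.

# exponential(35): P(sd > 0.10) = exp(-35 * 0.10)
p_exp <- pexp(0.10, rate = 35, lower.tail = FALSE)

# ASSUMPTION: normal(0, 1) on sd acts as a half-normal (truncated at zero),
# so the upper tail of the full normal is doubled.
p_halfnormal <- 2 * pnorm(0.10, mean = 0, sd = 1, lower.tail = FALSE)

print(c(exponential_35 = p_exp, half_normal_0_1 = p_halfnormal))
# exponential_35 is about 0.03, half_normal_0_1 is about 0.92
```

Under that assumption the exponential prior puts only about 3% of its mass above 0.10, while the half-normal puts about 92% there.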
About convergence: Yes, I always made sure the models converged. But so far I have only managed that with the skew_normal family and that rexp prior on the sd parameter.
If sigma is the main issue, you might consider keeping it semi-fixed, i.e. putting very narrow priors on coefficients affecting sigma (except for the intercept). Note that sigma is fit on the log scale so as a rule of thumb when you have 6 coefficients, each constrained to normal(0,1), the model considers sigmas up to exp(6 * 1.96) ~= 128027 as a priori plausible and up to exp(6) ~= 403 as reasonably likely.
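The two numbers in this rule of thumb can be reproduced directly. A minimal sketch; it ignores the intercept and simply stacks all 6 coefficients at the same extreme, as the rule of thumb does, and the variable names are illustrative only.

```r
# sigma is modelled on the log scale, so effects add up inside exp().
n_coef <- 6

# Each coefficient at the edge of its 95% prior interval (1.96 * sd, sd = 1)
upper_plausible <- exp(n_coef * 1.96)

# Each coefficient one prior sd away from zero
upper_likely <- exp(n_coef * 1)

print(c(plausible = upper_plausible, likely = upper_likely))
# plausible is about 128027, likely is about 403
```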
So, in my case, when it comes to the prior for sigma, I should pick a much stronger prior, such as maybe normal(0, 0.25). And/or I could limit it to 1-2 coefficients (actually that is the main idea of the Kruschke paper anyway, so that the assumption of homogeneity of variance between the groups of interest is relaxed)…
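To see how much each option helps, the same exp(k * 1.96 * sd) rule of thumb can be applied to the combinations being considered. This is a rough sketch, not a prior predictive check; sigma_upper and scenarios are hypothetical names, and the intercept is ignored.

```r
# Rule-of-thumb upper bound on sigma implied by n_coef coefficients on
# log(sigma), each with a normal(0, prior_sd) prior.
sigma_upper <- function(n_coef, prior_sd, z = 1.96) {
  exp(n_coef * z * prior_sd)
}

# 6 vs. 2 coefficients, normal(0, 1) vs. normal(0, 0.25)
scenarios <- expand.grid(n_coef = c(6, 2), prior_sd = c(1, 0.25))
scenarios$upper <- sigma_upper(scenarios$n_coef, scenarios$prior_sd)
print(scenarios)

# In brms the narrower prior would be written roughly as
#   prior(normal(0, 0.25), class = b, dpar = sigma)
```

With normal(0, 0.25) the bound drops from about 128027 to about 19 for 6 coefficients, and to under 3 with only 2 coefficients.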
EDIT: Indeed, if I use strong priors here and reduce the number of coefficients, the problem of unrealistic expectations disappears!
I will try a bit more to figure out why I only get the issue with the skew_normal and not the gaussian…