When attempting to vary the shape parameter in multilevel models, I seem to be getting extreme Rhats and very low effective samples. This often then transpires in extreme predictions and ones that do not appear to relate well to the data on which they’re trained.

To try to diagnose the root of the problem, I’ve created simplified versions of my problem and still run in to the same issues. There could easily be something I’m doing wrong with the way I’m constructing the model.

Here is a reproducible example, which I’ve tried to make as simple as possible, and one where I wouldn’t expect convergence issues:

n <- 100
Group1 <- data.frame(Group = "Group1", y = round(rnbinom(n, size = 10, mu = 20)))
Group2 <- data.frame(Group = "Group2", y = round(rnbinom(n, size = 1, mu = 15)))
df <- rbind(Group1, Group2)
df$x1 <- df$y + rnorm(n, 5, 3)
mod <- brm(bf(y ~ x1 + (1|Group),
shape ~ (1|Group)),
family = negbinomial(), data = df)

summary(mod) then typically displays effective sample sizes of approximately 2 and Rhats > 1000, across all parameters.

If I do not let the shape parameter vary:

mod <- brm(y ~ x1 + (1|Group))
family = negbinomial(), data = df)

I do not get the same convergence problems.

What could I be doing wrong? I’ve tried referring to this vignette but without solving my problem.

Unfortunately not, I also tried 3 groups and same problem.

The actual dataset in which I originally encountered this problem also has 100+ groups, each with 30+ samples.

I haven’t however looked into setting priors (except for the default uninformative ones), but instinctively I don’t have a lot of prior information that I could add to the model.

It just surprises me that the model doesn’t even come close to converging as it is.

I think you probably know a bit more than you think. Are parameter estimates on the order of a trillion equally plausible a priori as smaller estimates? Even encoding “weak” information like that in your prior can help. Essentially you’re softly limiting the search space to regions that aren’t a priori absurd, which can have noticeable computational benefits.

Thanks Jonah, it sounds like it might have something to do with the default priors giving too much weight to values that will generate extreme dispersion values in the negative binomial distribution. As described halfway down this page: https://github.com/stan-dev/stan/wiki/Prior-Choice-Recommendations

I know very little about the process for selecting appropriate priors, and my limited stats background makes it hard for me to understand a lot of the supporting text, but if I find a solution and/or prior that enables the above reproducible example to converge I’ll post it here.