Brms GAM intercept too low


I’m using brms to fit a GAM with 8 covariates, only one smooth term, and all seems well and good in terms of model performance. But when I plot the resulting smooth term together with my raw data the intercept seems to sometimes (not always!) be too low; see the attached image where the purple lines (model) are very low relative to the points which are binned averages of raw data.

It’s a mixed effects model where there’s a lot of dispersion, and it’s all positive data that is being modelled with a lognormal family.

brms call, simplified and in generic terms:
tmp.mod ← brm(formula = value ~ s(x1, bs = ‘tp’) + x2 + x3 + (1 | Area),
data = dat, family = lognormal(),
warmup = 1000, iter = 4000, chains = 4, cores = 4,
prior = tmp.priors,
control = list(adapt_delta = 0.99, max_treedepth = 12))

priors (not sure if they matter in this case):
tmp.priors ← c(set_prior(‘normal(0, 0.5)’, class = ‘b’),
set_prior(‘normal(-3, 2.5)’, class = ‘Intercept’),
set_prior(‘normal(0.5, 0.15)’, class = ‘sigma’))

It doesn’t matter whether calling conditional_effects or extracting the outcome using epred_draws, when holding x2 and x3 at their mean, and re_formula = NA. I’ve tried using the formula above as well as including “0 + Intercept” before the other terms.

Is there anyone knows why this is, and whether I’m simply ignorant of something that makes this behaviour explainable/logical or whether there’s something going wrong?

Let me know if I can provide additional information (I’m not allowed to share/upload the data)!

R version 4.2.2
brms 2.20.4
rstan 2.32.3

can you show us your plotting code?

Hi and thanks for the quick reply! Did not expect such a quick one so haven’t checked on the topic until today, but will check in more often from now :)

Yes, of course. In the image in the original post, each colored line is a separate model. So to plot the smooth of (for example) the purple line model, the simplest is just:
conditional_smooths(tmp.mod, effects = ‘x1’)

With the epred_draws function I’ve tried putting the co-variates at both their mean and median level and that only makes a minor difference, i.e. does not solve the problem.

tmp.dat <- bind_cols(lapply(tmp.dat[, c('x2', 'x3')], mean))
tmp.dat$Area <- dat$Area[1]
tmp.dat <- expand_grid(x1 = 1:55, tmp.dat)
tmp.dat <- epred_draws(tmp.mod, newdata = tmp.dat, re_formula = NA, scale = 'response')
tmp.dat <- bind_rows(lapply(1:55, function(x){
    quantile(as.vector(unlist(tmp.dat[tmp.dat$x1 == x, '.epred'])), probs = c(0.025, 0.05, 0.075, 0.5, 0.925, 0.95, 0.975))
colnames(tmp.dat) <- c('lwr95', 'lwr90', 'lwr85', 'avg', 'upr85', 'upr90', 'upr95')

tmp.dat %>% ggplot(aes(x = tSinceCC, y = avg)) + 
     geom_line() + 
     geom_ribbon(fill = NA, linetype = 'dashed', aes(ymin = lwr90, ymax = upr90))

Thankful for any suggestions/ideas! :)

Apparently I cannot edit once more, but I’ve noticed a typo in the code I pasted here. The first row should be:

tmp.dat <- bind_cols(lapply(dat[, c('x2', 'x3')], mean))

It was a typo in my post, not in the original code (I’ve double checked!).