Between-subject variance when plotting conditional effects

Hi all,

When plotting/extracting conditional effects (e.g., using posterior_epred / conditional_effects) credible intervals for a given effect will include all uncertainty including, for example, the between-subject differences in intercepts. Sometimes such credible intervals are not very informative because the between-subject variance in intercepts is irrelevant for conveying the slope effect that we are interested in.

I am wondering whether it is ok to remove the between-subject variability (with appropriately noting that in figure captions) in these situations? Is there a reason not to do this that I’m overlooking?

Is there a way of doing that within posterior_epred or conditional_effects? I’m sorry if this has an obvious solution, but I haven’t been able to find it looking into the documentation. Otherwise, I guess that the solution is to just manually plot the slopes and the associated group-level credible interval, or to average over the subjects.

Here’s example code in which I compare the credible intervals from conditional_effects and the ones obtained via fitted and then averaging over subjects:

library(lme4)
library(brms)
library(ggplot2)
library(gridExtra)

sleepstudy = sleepstudy

m = brm(
  data = sleepstudy,
  Reaction ~ Days + (Days | Subject),
  family = gaussian,
  iter = 2000,
  chains = 4)

fit = fitted(m)
data = cbind(sleepstudy,fit)
colnames(data) = c("Reaction","Days","Subject","Reaction_pred")

# Plot
p1 = ggplot(data,aes(x=Days,y=Reaction_pred))+
       geom_smooth(method="lm",se=.95) +
       ylim(200,400)+
       ggtitle('Average out the subject-level uncertainty')+
       theme_clean()

p = conditional_effects(m,"Days",plot=FALSE)
p2 = plot(p,plot=FALSE)[[1]]+ ylim(200,400)+
       ggtitle('Sample from the posterior')+
       theme_clean()

grid.arrange(p1, p2, ncol=2)

Thank you very much for your help and sorry if this is a repeat question. I’ve seen a bunch of discussion on conditional_effects, but wasn’t able to find something that specifically addressed this issue.

Cheers,
Ivan

By default, conditional_effects excludes variability due to group-level effects. It does this by setting re_formula=NA when calling posterior_epred, which I believe effectively sets the random effects to zero. So I think you’re already excluding between-subject variability in your conditional_effects call. If you wanted to include between-subject variability, you would do conditional_effects(m, "Days", re_formula=NULL), where re_formula=NULL includes variability due to the random effects.

In your code for p1, data is a data frame that contains fitted values (Reaction_pred) for each Subject for each Day, where there are 18 subjects and 10 days. fitted by default runs posterior_epred with re_formula=NULL, so these fitted values do include the subject-specific random intercepts and slopes (if the setting was re_formula=NA, the random effects would be excluded and the fitted values for a given Day would be exactly the same for all 18 subjects). geom_smooth is therefore running a regression of Reaction_pred ~ Day, in effect giving the average of the 18 subject-specific fitted values for each day and the 95% CI for that daily average. This is sort of an average of averages and greatly underestimates the actual uncertainty in the fitted values.

I’ve included a reproducible example below that makes explicit what geom_smooth is doing and then includes conditional_effects plots that exclude and include variability due to the group-level effects.

library(lme4)
library(brms)
library(tidyverse)
library(patchwork)
theme_set(ggthemes::theme_clean() +
            theme(plot.title=element_text(size=rel(0.82))))

# Data frame for modeling
d = sleepstudy

m = brm(
  data = d,
  Reaction ~ Days + (Days | Subject),
  family = gaussian,
  iter = 2000, cores=4, chains = 4)

# Add model predictions to data
d = cbind(d, fitted(m))

# Duplicate what geom_smooth is doing
mm = lm(Estimate ~ Days, data=d)
days = data.frame(Days=unique(sleepstudy$Days))
dlm = predict(mm, newdata=days, se.fit=TRUE, interval="confidence")
dlm = data.frame(days, dlm$fit)

p1 = ggplot(d, aes(x=Days, y=Estimate)) +
  geom_smooth(method="lm", se=.95) +
  geom_line(data=dlm %>% pivot_longer(-Days),
            aes(group=name, y=value),
            colour="orange", linetype="33", linewidth=1) +
  labs(title='Average out the\ngroup-level variability') +
  coord_cartesian(ylim=c(200,400))

p2 = conditional_effects(m, "Days", plot=FALSE)
p2 = plot(p2 ,plot=FALSE)[[1]] +
  coord_cartesian(ylim=c(200,400)) +
  labs(title='posterior_epred\nexcluding group-level\nvariability')

p3 = plot(conditional_effects(m, "Days", re_formula=NULL), plot=FALSE)[[1]] +
  coord_cartesian(ylim=c(200,400)) +
  labs(title='posterior_epred\nincluding group-level\nvariability')

p1 + p2 + p3

Created on 2023-05-16 with reprex v2.0.2

Hi Joels,

Thank you very much for the quick reply and for the code! It makes total sense that the default conditional_effects excludes random effects.

I think that I haven’t formulated the question very well in the original post. Regardless of the random effects, the uncertainty in the intercept for the group-level estimate (reaction times in the example code) will include a lot of between-subject variance, which is normal. For example, if you just look at the group-level parameter estimates and the credible intervals for the model in the example, you will see that the CrIs for the intercept approximately correspond to the CrIs on the middle plot. On the other hand, the CrIs for the slope will be much smaller and approximately correspond to the left plot.

Given that in this example we want to communicate via the plot what the slope effect is (and how uncertain we are about this effect), I am wondering how informative the uncertainty on the middle and the right plot are. For example, the posterior probability that the slope is smaller than 0 is 0 in this model. However, the plot on the right communicates that there is a lot of uncertainty. The middle one less so, but in practice, with RT data, I often see effects which are statistically robust, but these types of plots obscure that a bit via the uncertainty generated via the group-level intercept. Does that make sense?

Again, thank you so much for engaging with the question and sending your code. I really appreciate it.

Best,
Ivan

1 Like