I’m working on a longitudinal analysis of a binary outcome across 4 intervention arms using a GLMM and I am confused about the response values shown in the plots generated by conditional_effects().
This is the plot of predicted probabilities from predict():
The numbers roughly match an earlier plot I made when looking at the outcome at each follow-up visit. The percentage of the outcome event for the AMMI + PS + Coach group goes above 20% and the predicted probability in the last plot goes above 0.20.
However, when I use conditional_effects(), I get the following plot, where the magnitude of the response is much smaller. (The x-scale is different, but the result is the same if I transform back to days since baseline.)
Is conditional_effects() giving me predicted probabilities or something else? I’m also thinking it might be due to differences in handling the random effect or the smoothing I apply in the first plot.
Here is my code:
# code for plotting raw data at each visit
d %>%
  group_by(VISIT, arm, PREP_RE) %>%
  summarise(n = n(), .groups = "drop_last") %>%
  mutate(`%` = n / sum(n) * 100) %>%
  filter(PREP_RE == 1) %>%
  ggplot(aes(x = VISIT, y = `%`, group = arm, color = arm)) +
  geom_line() +
  geom_point()
mod_quad = brm(PREP_RE ~ arm*scaled_time + (1|ENROLLMENT_ID) + I(scaled_time^2),
data = d, cores = 4, iter = 2000,
family = bernoulli(link = "logit"))
# conditional effects plot
# get predicted probabilities
d2 = d %>%
  mutate(predicted = predict(mod_quad, type = "response")[, 1])
# plot predicted probability
ggplot(d2, aes(x = time, y = predicted, color = arm)) +
  geom_smooth(se = FALSE) +
  labs(x = "Days since baseline", y = "predicted probability")
- Operating System: Windows 10
- brms Version: 2.15.0
Thanks in advance
Welcome. Thanks for the detailed posting, and sorry about the delay.
These are two different things. One gives you predictions of new observations based on the fitted model (predict), while the other (conditional_effects) shows the effects of categorical or numeric predictors. I tend to think of them as "what could be, given some future data" (predict) versus "what is happening right now, given the current data and model" (conditional_effects).
predict (which is a wrapper around posterior_predict) returns the posterior mean and 95% credible interval for each data point, including uncertainty due to residual error. (Also, I don't think the predict method for brmsfit objects has a type argument; it returns predictions on the response scale by default. See the help for predict.brmsfit.)
conditional_effects, on the other hand, by default uses posterior_epred (this can be changed with the method argument), which returns the expected (fitted) values from the posterior distribution (that is, excluding uncertainty due to residual error). (FYI, just as you can use predict as a wrapper around posterior_predict, you can also use fitted as a wrapper around posterior_epred.) The uncertainty bands from conditional_effects should therefore be narrower than those returned by predict on the same data values.
For a given predictor variable, conditional_effects sets the other fixed effects to their mean (for numeric predictors) or reference level (for categorical predictors). conditional_effects actually returns a list containing a data frame for each predictor and interaction; these data frames are then used to generate the conditional_effects plots. You can see the structure of the returned object if you do:
ce = conditional_effects(mod_quad)
str(ce, max.level = 1)
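From there you can rebuild a plot yourself. A rough sketch (untested here, since it needs your fitted model; check names(ce) on your end, as the element name for the interaction depends on your formula, and the estimate__/lower__/upper__ columns are how brms names the summaries in these data frames):

```r
# Sketch: pull one effect's data frame out of the conditional_effects object
# and plot it manually. For this model the interaction element is probably
# named something like "arm:scaled_time"; check names(ce) to be sure.
library(ggplot2)
ce <- conditional_effects(mod_quad)
df <- ce[[grep(":", names(ce), value = TRUE)[1]]]  # the interaction's data frame
ggplot(df, aes(x = scaled_time, y = estimate__, color = arm)) +
  geom_line() +
  geom_ribbon(aes(ymin = lower__, ymax = upper__, fill = arm),
              alpha = 0.2, color = NA) +
  labs(y = "expected probability")
```

This can be handy when you want the conditional_effects numbers but your own ggplot styling.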
One thing to be aware of is that by default conditional_effects excludes random effects when calculating predictions. You can include random effects by setting re_formula=NULL (the default is re_formula=NA). On the other hand, fitted/posterior_epred include random effects by default, but you can exclude them by setting re_formula=NA.
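This distinction matters more on the probability scale than you might expect. A small self-contained illustration with made-up numbers (not your model): because the inverse logit is nonlinear, the probability for a "typical" subject with random intercept 0 (the re_formula=NA behavior) is not the same as the probability averaged over subjects' random intercepts.

```r
# Hypothetical numbers, purely to show the effect of the nonlinear inverse link:
set.seed(1)
b0 <- -2.5                          # population-level intercept on the log-odds scale
u  <- rnorm(1e5, 0, 1.5)            # simulated subject-level intercepts, sd = 1.5
p_typical  <- plogis(b0)            # probability at random intercept = 0
p_averaged <- mean(plogis(b0 + u))  # probability averaged over random intercepts
c(p_typical, p_averaged)            # the averaged probability is noticeably larger
```

So predictions that include the random effects can sit much closer to the raw event percentages than predictions for the "average" subject, which would fit the discrepancy you are describing.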
I'm not sure why the response values are so different in your various plots. As far as I know, for brmsfit objects, the predict and fitted functions return predictions on the response scale by default. It's hard to tell what's going on without knowing more about d2. Also, taking the predicted value for each data point (from predict) and then using geom_smooth to average over those averages doesn't seem kosher. I think you need to work with the raw posterior draws and then calculate some sort of "average marginal effect", but that one is above my pay grade. Hopefully one of the experts will chime in.
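To make the "work with the raw draws" idea concrete, here is a sketch of one simple average-marginal-style summary. I'm using a fake draws matrix so it runs on its own; with your model you would instead use epred <- posterior_epred(mod_quad), which returns a draws-by-observations matrix on the probability scale.

```r
set.seed(2)
n_draws <- 200; n_obs <- 30
arm   <- rep(c("control", "treatment"), length.out = n_obs)  # stand-in for d$arm
epred <- matrix(runif(n_draws * n_obs), n_draws, n_obs)      # stand-in for posterior_epred(mod_quad)
# Per posterior draw, average the expected probability over the observations
# in each arm, then summarise those per-draw averages across draws.
by_arm <- sapply(split(seq_len(n_obs), arm),
                 function(idx) rowMeans(epred[, idx, drop = FALSE]))
apply(by_arm, 2, function(x) c(mean = mean(x), quantile(x, c(0.025, 0.975))))
```

The point is that the averaging happens within each posterior draw, so the draw-to-draw spread of those averages gives you a proper credible interval for the arm-level quantity, rather than smoothing over per-observation point predictions.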
Thanks for both of your responses!
I think the issue is due to whether I take the random effects into account. If I set re_formula=NA in my predict call, then I get results similar to what I am seeing with conditional_effects().
This is a good suggestion, and I think I really should have been working with the raw samples in the first place. I was avoiding this approach initially because we had several different models that would have been a headache to code this for. Now that we've pared down our list of models, I think this is the right way to go.