# Difference between conditional_effects plot and predicted probability plot

Hi everyone,

I’m working on a longitudinal analysis of a binary outcome across 4 intervention arms using a GLMM and I am confused about the response values shown in the plots generated by conditional_effects().

This is the plot of predicted probabilities from predict():

The numbers roughly match an earlier plot I made when looking at the outcome at each follow-up visit. The percentage of the outcome event for the AMMI + PS + Coach group goes above 20% and the predicted probability in the last plot goes above 0.20.

However, when I use conditional_effects(), I get the following plot where the magnitude of the response is much smaller. (the x-scale is different, but the result is the same if transform back to days since baseline)

Is conditional_effects() giving me predicted probabilities or something else? I’m also thinking it might be due to differences in handling the random effect or the smoothing I apply in the first plot.

Here is my code:

``````# code for plotting raw data at each visit
d %>% group_by(VISIT, arm, PREP_RE) %>% summarise(n = n(), .groups="drop_last") %>%
mutate(`%` = n/sum(n) * 100) %>%
filter(PREP_RE == 1) %>%
ggplot(aes(x = VISIT, y = `%`, group = arm, color = arm)) +
geom_line() + geom_point()

# model
mod_quad = brm(PREP_RE ~ arm*scaled_time + (1|ENROLLMENT_ID) + I(scaled_time^2),
data = d, cores = 4, iter = 2000,
family = bernoulli(link = "logit"))

# conditional effects plot

# get predicted probabilities
d2 = d %>%
filter(!is.na(PREP_RE)) %>%
mutate(predicted = predict(mod_quad, type = "response")[ , 1])

# plot predicted probability
ggplot(d2, aes(x = time, y = predicted, color = arm)) +
geom_smooth(se = F) +
labs(x = "Days since baseline", y = "predicted probability")
``````
• Operating System: Windows 10
• brms Version: 2.15.0

Welcome. Thanks for the detailed posting and sorry about they delay.

These are two different things. One is giving you predictions based on the fitted model (predict) and the other (conditional_effects) shows the effects of categorical or numeric predictors. I tend to think of these as what could be given some future data (predict) and what is happening right now (conditional_effects) given the current data and model.

`predict` (which is a wrapper around `posterior_predict`) returns the posterior mean and 95% credible interval for each data point, including uncertainty due to residual error. (Also, I don’t think the predict “method” for brmsfit objects has a `type` argument, but returns predictions on the response scale by default; see the help for `predict.brmsfit`).

`conditional_effects`, on the other hand, by default uses `posterior_epred` (this can be changed with the `method` argument), which returns the expected (fitted) values from the posterior distribution (that is, excluding uncertainty due to residual error). (FYI, just as you can use `predict` as a wrapper around `posterior_predict`, you can also use `fitted` as a wrapper around `posterior_epred`.) The uncertainty bands with `conditional_effects` should therefore be narrower than those returned by `predict` on the same data values.

For a given predictor variable, `conditional_effects` sets other fixed effects to their mean (for numeric) or reference level (for categorical). `conditional_effects` actually returns a list containing a data frame for each predictor and interaction. These data frames are then used to generate the `conditional_effects` plots. You can see the structure of the returned object if you do:

``````ce = conditional_effects(mod_quad)
str(ce)
``````

One thing to be aware of is that by default `conditional_effects` excludes random effects when calculating predictions. You can include random effects by setting `re_formula=NULL` (the default is `re_formula=NA`). On the other hand, `predict/posterior_predict` and `fitted/posterior_epred` include random effects by default but you can exclude them by setting `re_formula=NA`.

I’m not sure why the response values are so different in your various plots. As far as I know, for brmsfit objects, the `predict` and `fitted` functions return predictions on the response scale by default. It’s hard to tell what’s going on without knowing more about `d` and `d2`. Also, using the predicted value for each data point (from the `predict` function) and then `geom_smooth` to average over those averages doesn’t seem kosher. I think you need to work with the raw posterior draws and then calculate some sort of “average marginal effect” but that one is above my pay grade. Hopefully one of the experts will chime in.

2 Likes

Thanks for both of your responses!

I think the issue is due to whether I take into account the random effects. If I set `re_formula=NA` in my `predict` call, then I get results similar to what I am seeing with `conditional_effects()`.

This is a good suggestion and I think I really should have been trying to work with the raw samples in the first place. I was avoiding this approach initially because we had several different models that would have been a headache to code this for. Now that we’ve paired down our list of models, I think this is the right way to go.

2 Likes