Plotting custom predictor values for a multilevel model with a categorical variable

Hi,
First of all, thanks a lot for the implementation of this package (@paul.buerkner) and all incredible books and posts (@Solomon ) that are available for newbies like me in the field of probabilistic modelling, this is amazing!

I would like to reproduce Figure 4.10 from Solomon Kurz’s book: Statistical rethinking with brms, ggplot2, and the tidyverse (original Fig 4.8 of Statistical rethinking). The idea is to plot the joint consequence of μ AND σ with predict(), but in this case, for a multilevel model in which time is a categorical variable for seven time points. I found very useful the examples of the book with the quadratic and cubic model version and apply them to my multilevel data:

Example data:

t ← (rep(0:6,10))
ab ← ((runif(1, 10, 20) * t) / (runif(1, 0, 10) + t)) + rnorm(10, 0, 1)
plot(t, ab,pch=19,las = 1)
X ← as.data.frame(cbind(t,ab));
X$t_fact ← as.factor(t)
X$id ← paste(“id”,rep(0:6,each=10),sep="_")

Cubic model

fitCub ←
brm(data = X,
family = gaussian,
ab ~ 0 + Intercept + t + I(t^2) + I(t^3) + (1 | id),
chains = 4, cores = 4, iter = 2000, warmup = 1000,
seed = 9,
file = “Ex_cubic”)

Data to evaluate predictions:

time_seq ←
tibble(t = seq(from = 0, to = 6, length.out = 40)) %>%
mutate(ItE2 = t^2) %>%
mutate(ItE3 = t^3)
id<-unique(X$id)
time_seq ← time_seq %>%
expand(time_seq,id)

Fitted values

fitd_cub ← fitted(fitCub, newdata = time_seq) %>%
as_tibble() %>%
bind_cols(time_seq)

Predicted responses

pred_cub ← predict(fitCub, newdata = time_seq) %>%
as_tibble() %>%
bind_cols(time_seq)

Plot the data

ggplot(data = X, aes(x = t)) +
geom_ribbon(data = pred_cub, aes(ymin = Q2.5, ymax = Q97.5),fill = “#cfd8e6”) +
geom_smooth(data = fitd_cub, aes(y = Estimate, ymin = Q2.5, ymax = Q97.5),
stat = “identity”, fill = “#a6a3ff”, color = “darkblue”, alpha = .5) +
geom_point(aes(y = ab),color = “navyblue”, shape = 19, size = 1.5, alpha = 1/3) +
coord_cartesian(xlim = range(X$t),ylim = range(X$ab)) +
theme_bw()

However, the final aim is to obtain the predicted values for the next model, in which the time is treated as a categorical variable, with a coefficient per time point:

Model

fitGT ←
brm(data =X,
family = gaussian,
formula = ab ~ 0 + Intercept + t_fact + (1| id),
iter = 2000, warmup = 1000, chains = 4, cores = 4,
control = list(adapt_delta = 0.9),
seed = 9,
file = “Ex_GT”)

fitted

fitd_GT ← fitted(fitGT) %>%
as_tibble() %>%
bind_cols(X)

predict

pred_GT ← predict(fitGT) %>%
as_tibble() %>%
bind_cols(X)

Plot the data

ggplot(data = X, aes(x = t)) +
geom_ribbon(data = pred_GT, aes(ymin = Q2.5, ymax = Q97.5),fill = “#cfd8e6”) +
geom_smooth(data = fitd_GT, aes(y = Estimate, ymin = Q2.5, ymax = Q97.5),
stat = “identity”, fill = “#a6a3ff”, color = “darkblue”, alpha = .5) +
geom_point(aes(y = ab),color = “navyblue”, shape = 19, size = 1.5, alpha = 1/3) +
coord_cartesian(xlim = range(X$t),ylim = c(-10,10)) +
theme_bw()

In this case, I wonder if, given the nature of time as a factor, it is only possible to get the values for the time points already defined in the model, as is done in the example above. I really would like to obtain “smoother” intervals. Is there any way to get that? Is the code properly implemented?

Any help/guidance much appreciated!

  • Operating System: ubuntu
  • brms Version: brms_2.14.4

Hi @aalvarezf

Welcome to the Stan forums!

I’ve worked through your example and I think I understand what you’re asking. If you’re modeling time as a factor, I’m not aware of a good way to plot the resutls with smoother intervals. I’d even go further to suggest that it might come off as misleading to use smoother intervals, in such a case. IMO, the chunkiness of the intervals in your second plot is a faithful representation of the categorical handling of time in your fitGT model.

1 Like

Ok, understood, it is what it is and the smoothing may not be representative for this model.
Thanks for the quick answer =)

2 Likes