Plotting pp_check interactions

simon.dp · June 30, 2023, 1:27pm

I am trying to model ordinal data with an interaction of two predictors. However, I have issues with plotting interactions and individual data with the BRMS functions.

My model is of the type

rating ~ category*distortion+ (1|id)

I can use this type of code:

pp_check(mcatxdistRDpCexp, type = “bars_grouped”, group = “category”, ,
ndraws = 500, prob = 0.95)

pp_check(mcatxdistRDpCexp, type = “bars_grouped”, group = “distortion”, ,
ndraws = 500, prob = 0.95)

which returns these two plots:

however something like

pp_check(mcatxdistRDpCexp, type = “bars_grouped”, group = “category:distortion”, ,
ndraws = 500, prob = 0.95)

or

pp_check(mcatxdistRDpCexp, type = “bars_grouped”, group = “c(category,distortion)”, ,
ndraws = 500, prob = 0.95)

… does not give me what I want.

It could be very elegant to plot a 4x4 plot with the observed and predicted data for each possible interaction of category and distortion. Essentially, I would like to have something like what conditional effects returns but with the observed data underneath.

conditions ← brms::make_conditions(mcatxdistRDpCexp, vars = c(“category”))
plot(brms::conditional_effects(mcatxdistRDpCexp, ask = FALSE, categorical = T, conditions = conditions))

image1248×746 14.3 KB

I know I can add datapoints using code as the following but I have not been able to get anything similar to work for ordinal data:

plot(testa, points = TRUE) # Add data points to the plot

Thank you for reading. I hope my problem is clearly stated and someone knows a straightforward solution to it.

simon.dp · July 20, 2023, 1:10pm

I am writing to follow up on my previous request for assistance with modeling ordinal data and plotting interactions using BRMS (Bayesian Regression Models using Stan). I have made progress in some areas but still don’t have a perfect solution.

On the positive side, I was able to create a model in the form
df$catDist <- interaction(df$distortion, df$category)
which allowed me to visualize all interactions effectively using:
pp_check(brm3, type = "bars_grouped", group = "catDist ", ndraws = 500, prob = 0.95)

Nonetheless, this model is different from the desired distortion* category model.

Another way, I have tried has been to create predictions for the different interactions myself. However, I have not had too much luck with this compared to the above.

# Step 1: Create the new data frame with all combinations of `imAge`, `category`, and `distortion`
new_data <- expand.grid(
  imAge = levels(df$imAge),
  category = levels(df$category),
  distortion = levels(df$distortion)
)

# Step 2: Get the expected predicted values and uncertainty intervals for each combination
predictions <- posterior_epred(brm2, newdata = new_data, ndraws = 500)

# Step 3: Normalize the probabilities for each combination of `imAge`, `category`, and `distortion`
normalized_probs <- apply(predictions, c(1, 2), function(x) x / sum(x))

# Step 4: Calculate the `predicted_rating`, `Q2.5`, and `Q97.5` within each `imAge`, `category`, and `distortion` combination
new_data <- new_data %>%
  group_by(imAge, category, distortion) %>%
  mutate(
    rating_probs = list(colMeans(normalized_probs[imAge, category, distortion, , drop = FALSE])),
    predicted_rating = sum(seq_len(5) * rating_probs[[1]]), # Calculate the weighted average
    Q2.5 = sum(ifelse(seq_len(5) <= quantile(predictions[imAge, category, distortion, , drop = FALSE], prob = 0.025), rating_probs[[1]], 0)),
    Q97.5 = sum(ifelse(seq_len(5) >= quantile(predictions[imAge, category, distortion, , drop = FALSE], prob = 0.975), rating_probs[[1]], 0))
  ) %>%
  ungroup() # Remove the grouping for the resulting data frame

# Reshape the data for plotting
plot_data <- new_data %>%
  mutate(rating = 1:5) %>%
  pivot_longer(cols = c("Q2.5", "predicted_rating", "Q97.5"), names_to = "interval", values_to = "value") %>%
  mutate(interval = factor(interval, levels = c("Q2.5", "predicted_rating", "Q97.5")),
         interval_label = ifelse(interval == "predicted_rating", "Predicted Rating", "Uncertainty Interval"))

# Step 5: Create the plot with error bars
ggplot(plot_data, aes(x = factor(rating), y = value, fill = imAge, group = interaction(imAge, category, distortion))) +
  geom_bar(stat = "identity", position = position_dodge(width = 0.9), alpha = 0.7, width = 0.6) +
  geom_errorbar(aes(ymin = Q2.5, ymax = Q97.5), width = 0.2, position = position_dodge(width = 0.9), alpha = 0.5) +
  facet_wrap(~ category + distortion, ncol = 1) +
  theme_classic() +
  labs(x = "Rating", y = "Probability", title = "Predicted Probabilities with Uncertainty Intervals") +
  scale_y_continuous(expand = c(0, 0), limits = c(0, max(plot_data$value) + 0.05)) +
  theme(axis.ticks.x = element_blank(),
        legend.position = "right",
        axis.text = element_text(size = 12),
        axis.title = element_text(size = 14))

I am thus still facing challenges in visualising the interactions for the desired distortion* category model. Any insights or alternative approaches would be highly appreciated. Please feel free to share any ideas or suggestions to help me move forward.

Best regards,
Simon

jsocolar · July 20, 2023, 4:48pm

@Solomon

Solomon · July 20, 2023, 7:18pm

I’m having a difficult time following what’s going on, here. Perhaps it would help if we used an open data set and you uploaded a picture of the current version of your visualization. How about we use the VerbAgg data set, like Paul did in his IRT paper? However, that data set is a bit on the long side, which makes for long model computation times. So I recommend we subset to quicken things up. Here’s my proposed data set:

# load packages
library(tidyverse)
library(brms)

# load the data
data("VerbAgg", package = "lme4")

# make a random sample of `id` numbers
set.seed(1)
rand_id <- sample(1:316, size = 150) %>% sort()

# subset the data 
d <- VerbAgg %>% 
  # subset to just N = 150 cases
  filter(id %in% rand_id)

With this data set, the focal variable would be resp, and the two categorical predictors could be btype and situ. Here’s a plot of the sample counts.

d %>% 
  ggplot(aes(x = resp)) +
  geom_bar() +
  facet_grid(situ ~ btype)

I believe the model that would correspond to yours would be this:

fit <- brm(
  data = d,
  family = cumulative,
  resp ~ btype * situ + (1 | id),
  seed = 1
)

If this all seems reasonable, where are you at with your visualization?

simon.dp · July 20, 2023, 7:42pm

Thank you for your answer. This looks reasonable, indeed. The next step I would like to add is the model prediction on top of that exact plot.

So far I have only had success with plotting the prediction for one factor or the other using code like:

pp_check(fit , type = “bars_grouped”, group = “btype ”, ndraws = 500, prob = 0.95)

or

pp_check(fit , type = “bars_grouped”, group = “situ”, ndraws = 500, prob = 0.95)

Best, Simon

Solomon · July 20, 2023, 7:56pm

So far, this is the best I have been able to come up with. It still seems a little off, but perhaps that’s just a consequence of the partial pooling you get from a multilevel model:

library(tidybayes)

d %>% 
  add_epred_draws(fit) %>% 
  group_by(btype, situ, .category, .draw) %>% 
  summarise(count = sum(.epred)) %>% 
  group_by(btype, situ, .category) %>% 
  mean_qi(count) %>%  
  rename(resp = .category)  %>% 
  
  ggplot(aes(x = resp)) +
  geom_bar(data = d,
           fill = "grey70") +
  geom_pointrange(aes(y = count, ymin = .lower, ymax = .upper),
                  size = 1/6) +
  facet_grid(situ ~ btype)

Solomon · July 20, 2023, 8:03pm

Oh, and it might also be a consequence of modeling the latent means by the predictors instead of either modeling the latent means AND discrimination parameters, or just modeling the thresholds themselves.

Solomon · July 20, 2023, 8:44pm

Yep, that was it. This basic pp-check method works, but to reproduce the data, you want a model with conditional thresholds instead. Here’s what that looks like:

# new conditional threshold model
fit2 <- brm(
  data = d,
  family = cumulative,
  resp | thres(gr = interaction(btype, situ)) ~ 1 + (1 | id),
  cores = 4, seed = 1
)

# new pp-check
fit2$data %>% 
  add_epred_draws(fit2) %>% 
  group_by(btype, situ, .category, .draw) %>% 
  summarise(count = sum(.epred)) %>% 
  group_by(btype, situ, .category) %>% 
  mean_qi(count) %>%  
  rename(resp = .category) %>% 
  
  ggplot(aes(x = resp)) +
  geom_bar(data = d,
           fill = "grey70") +
  geom_pointrange(aes(y = count, ymin = .lower, ymax = .upper),
                  size = 1/6) +
  facet_grid(situ ~ btype)

simon.dp · July 28, 2023, 12:42pm

Thank you very much for your feedback. It was super helpful. It took a bit of work, but I managed to employ it for my specific case. One thing I will highlight for others that may want to employ this method is that it requires a ton of RAM for a larger dataset (35K rows in my case).

I “solved” this issue by using “thin=15” when specifying the model which reduced the number of post-warmup draws significantly. It still took > 50 Gb of RAM so it may not work for everyone.

Solomon · July 28, 2023, 1:01pm

Another option to reduce the post-warmup draws on the post-processing side is to set the ndraws argument within the add_epred_draws() function to something like 100 or 1000.

PritchardAJP · April 22, 2024, 11:04pm

Thank you for this post all, this was a great starting place for me for similar requirements. Reviving this as I found that similar plots can be generated using bayesplot, which is somewhat more concise.

library(tidyverse)
library(brms)
library(bayesplot

#This is Solomon's code replicated from above
# load the data
data("VerbAgg", package = "lme4")

# make a random sample of `id` numbers
set.seed(1)
rand_id <- sample(1:316, size = 150) %>% sort()

# subset the data 
d <- VerbAgg %>% 
    # subset to just N = 150 cases
    filter(id %in% rand_id)

fit <- brm(
    data = d,
    family = cumulative,
    resp ~ btype * situ + (1 | id),
    seed = 1, chains=2, cores=2
)

#Here is the bayesplot code, basically verbatim from the relevant help page
#https://mc-stan.org/bayesplot/reference/PPC-discrete.html
yrep_char <- posterior_predict(fit)
yrep_int <- sapply(data.frame(yrep_char, stringsAsFactors = TRUE), as.integer)
y_int <- as.integer(d$resp)

ppc_bars_grouped(
    y = y_int,
    yrep = yrep_int,
    group = interaction(d$btype, d$situ),
    freq=FALSE,
    prob = 0.5,
    fatten = 1,
    size = 1.5
)

Yielding:

This plot is replicating Solomon’s above example without conditional thresholds.

Solomon · June 28, 2024, 2:36pm

Brilliant. It never occurred to me to use the interaction() function that way, but it’s clearly a smart choice. I look forward to stealing this soon.

Topic		Replies	Views
Brms: Plotting three-way interactions with ordinal regression brms conditional-effects	5	2694	June 6, 2022
Plot interactions for family="categorical" brms	1	1060	February 12, 2020
Plots for Adjacent-category model analysis for ordinal data General brms	0	396	February 12, 2023
I cannot get the conditional_effects I am aiming for when plotting ordinal data brms interpret-results , visualisation	1	771	March 5, 2021
Question re brms: conditional_effects General brms	4	1219	July 4, 2022

Plotting pp_check interactions

Related topics