Discrepancy in Manually- vs Automatically-Generated pp_check() Figures

I am working with a multivariate ordered probit model fit with brms with the following form:

Modality_ord_mv_complete_fmla <- bf(mvbind(Valence, Arousal) | resp_thres(9, gr = Category) ~ Category +
                                   Condition +
                                   Category:Condition +
                                   Intensity.c +
                                   (1|o|Token) +
                                   (1|p|Condition) +
                                   (1|q|Condition:Subject)) +
                lf(disc ~ 0 + Category, resp = "Valence", cmc = FALSE) +
                lf(disc ~ 0 + Category, resp = "Arousal", cmc = FALSE)

Using the built in pp_check functions, I get reasonable looking posterior predictions:

pp_check(Modality_Ord_Complete_MV_Mod_Const_Thresh, type = "bars_grouped", group = "Token", resp = "Valence", nsamples = 10)

But, when I try to manually created this plot, or one similar I get very different results, so I’m wondering where I’m going wrong:

# Create a data frame of original data and add predictions
Modality_Preds <- cbind(Modality_Ord_Full %>%
        select(Subject, Condition, Category, Token, Pitch.c, Intensity.c, Valence) %>%
        na.omit(),
        as.data.frame(apply(posterior_predict(Modality_Ord_Complete_MV_Mod_Const_Thresh, newdata = Modality_Ord_Full %>%
                                                      select(Subject, Condition, Category, Token, Pitch.c, Intensity.c, Valence) %>%
                                                      na.omit(),
                                              re_formula = NULL,
                                              nsamples = 10),
                            c(2,3),
                            mean)) %>% # Take the mean across the sample dimension
                rename("Pred_Valence" = Valence,
                       "Pred_Arousal" = Arousal) %>%
                mutate(Pred_Valence = as.integer(floor(Pred_Valence)),
                       Pred_Arousal = as.integer(floor(Pred_Arousal)))) %>%
        select(-Pred_Arousal) %>%
        mutate(Pred_Valence = factor(Pred_Valence, levels = c(1:9), ordered = TRUE))

## Plot combined original data with predicted responses
Modality_Preds %>%
        select(Condition, Category, Token, Valence, Pred_Valence) %>% 
        ggplot() +
        facet_wrap( ~ Token, ncol = 8) +
        geom_bar(aes(x = Valence)) +
        geom_point(aes(x = Pred_Valence), stat = "count") +
        theme_bw() +
        scale_x_discrete("Rating (Valence)") +
        scale_y_continuous("Count") +
        theme(  axis.text.x = element_text(angle = 0, hjust = 0.5),
                plot.title = element_text(hjust = 0.5),
                panel.border = element_rect(fill = NA, colour = "black"),
                panel.grid.major = element_blank(),
                panel.grid.minor = element_blank(),
                axis.line = element_line(colour = "black"),
                strip.text.x = element_text(size = 12),
                strip.background = element_blank(),
                legend.position = c(1, 0), legend.justification = c(1, 0),
                legend.background = element_rect(fill = "transparent"),
                legend.title = element_blank()
        )

Could anyone help me understand how to reproduce the pp_check() plots and/or where I’m going wrong?

2 Likes

I think I’ve found at least part of the difference, if I manually work through creating counts with the expected predicted probabilities, I get much closer to the automatic plot:

med_epred <- as.data.frame(apply(posterior_epred(Modality_Ord_Complete_MV_Mod_Const_Thresh, newdata = Modality_Ord_Full %>%
                                                         select(Subject, Condition, Category, Token, Pitch.c, Intensity.c, Valence) %>%
                                                         na.omit(),
                                                 re_formula = NULL,
                                                 resp = "Valence",
                                                nsamples = 10),
                              c(2,3),
                              median)) %>%
        mutate(Subject = Modality_Ord_Full %>%
                       pull(Subject),
               Token = Modality_Ord_Full %>%
                       pull(Token)) %>%
        pivot_longer(cols = 1:9,names_to = "Rating", values_to = "Pred_Prob")

Rating_Counts <- Modality_Ord_Full %>%
        #filter(Condition == "AO") %>%
        group_by(Token) %>%
        summarize(Count = n())
        

Epred_Counts <- left_join(med_epred,
                          Rating_Counts,
                          by = "Token") %>%
        mutate(Pred_Count = Pred_Prob*Count) %>%
        group_by(Token, Rating) %>%
        summarize(Pred_Count = median(Pred_Count))

1 Like

Sorry that we didn’t react sooner - your new plot looks to me identical to the “official” pp_check, so if I understand it correctly you did resolve the issue. Or is there something you still need help with?