Uncertainty on cross-validated prediction accuracy in a brms model

Hi all,

I’m having difficulty figuring out how to get uncertainty on classification accuracy from a brms model. I have binary data with balanced classes, and I fit a Bernoulli (logistic) regression model, which I then cross-validate with k-fold CV, like so:

# Packages assumed throughout the snippets below
library(brms)
library(tidyverse)
library(tidybayes)  # for mean_hdi()

b_model <- brm(One_if_high ~ scale(uni_prob) * scale(bi_prob_smoothed),
               data = infant_2a_gold,
               family = bernoulli(),
               cores = 4,
               chains = 4,
               warmup = 3000,
               iter = 30000,
               thin = 4,
               seed = 1,
               save_model = "./CognitionModelFits/gold_2a.stan",
               file = "./CognitionModelFits/gold_2a_fit",
               file_refit = "on_change",
               control = list(max_treedepth = 15),
               backend = "cmdstanr",
               silent = 0)

and I then use the built-in k-fold tools from brms to get cross-validated posterior predictions, like so:

kf_unigram_prob <- kfold(b_model, chains = 4, cores = 4, K = 10, save_fits = TRUE, seed = 1, folds = "stratified", group = "One_if_high")

kfp <- kfold_predict(kf_unigram_prob)
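
(For orientation: kfold_predict() returns a list with y, the held-out observed responses, and yrep, a matrix of posterior predictive draws with one row per draw and one column per observation. A quick shape check, just to make the reshaping below easier to follow:)

dim(kfp$yrep)   # n_draws x n_observations, entries are 0/1 predictive draws
length(kfp$y)   # n_observations of observed 0/1 outcomes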

and process the binary draws like so:

preds <- kfp$yrep %>%
  as.data.frame() %>%
  rownames_to_column() %>%
  rename(DrawNum = rowname) %>%
  pivot_longer(cols = -DrawNum,
               names_to = "ItemClassID",
               values_to = "Predicted") %>%
  # column names come out as V1, V2, ..., so strip the "V" to recover row indices
  mutate(ItemClassID = str_remove(ItemClassID, "V"))

actuals <- infant_2a_gold %>%
  rownames_to_column() %>% 
  select(List,rowname,Arpabet,One_if_high) %>% 
  rename(ItemClassID = rowname,
         GroundTruth = One_if_high)
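
(One thing worth flagging in case it matters: the join key here is just the original row index, so this assumes the columns of kfp$yrep are in the same order as the rows of infant_2a_gold. A minimal check that the two ID sets at least match up:)

stopifnot(setequal(preds$ItemClassID, actuals$ItemClassID))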

acc <- left_join(preds, actuals, by = "ItemClassID") %>%
  mutate(Correct = if_else(Predicted == GroundTruth, 1, 0)) %>%
  # proportion of correct predictions per item, pooling over posterior draws
  group_by(Arpabet) %>%
  summarise(p_correct = mean(Correct)) %>%
  # majority vote: the item counts as correctly classified if most draws get it right
  mutate(BinaryPred = if_else(p_correct > 0.5, 1, 0)) %>%
  ungroup() %>%
  summarise(mean_of_binary_pred = mean(BinaryPred))

This gives me a point estimate of 0.62 for classification accuracy, which is near-identical to the accuracy of a frequentist model I fit as a sanity check while working through this.
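
(For reference, the frequentist sanity check was roughly along these lines; this is just a sketch, with the pre-scaled uni_z / bi_z columns and the random fold assignment made up for illustration rather than copied from my actual script:)

d <- infant_2a_gold %>%
  mutate(uni_z = scale(uni_prob)[, 1],
         bi_z  = scale(bi_prob_smoothed)[, 1])

set.seed(1)
fold_id <- sample(rep(1:10, length.out = nrow(d)))

cv_acc <- sapply(1:10, function(f) {
  m <- glm(One_if_high ~ uni_z * bi_z,
           data = d[fold_id != f, ], family = binomial())
  p <- predict(m, newdata = d[fold_id == f, ], type = "response")
  mean((p > 0.5) == d$One_if_high[fold_id == f])  # accuracy in the held-out fold
})
mean(cv_acc)  # cross-validated classification accuracy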

If I instead try to leverage the draws to get uncertainty in classification accuracy, using the code below, by grouping by draw and computing overall accuracy within each draw to form a distribution, I get a different summary statistic: 0.59. As far as I can tell, that discrepancy is not just run-to-run simulation error. The code looks like this:

acc <- left_join(preds, actuals, by = "ItemClassID") %>%
  mutate(Correct = if_else(Predicted == GroundTruth, 1, 0)) %>%
  # overall classification accuracy within each posterior draw
  group_by(DrawNum) %>%
  summarise(p_correct = mean(Correct)) %>%
  mean_hdi(p_correct)

What can I do to get uncertainty around the point estimate that matches the frequentist model? (I have reasons, from similar models on similar data elsewhere in this project, to think 0.62 is actually the right estimate.) It may also be relevant that Arpabet is an item-level variable, so I can’t simply group by both draw and Arpabet.

Thanks so much in advance!