Difference between add_fitted_draws() and add_predicted_draws() when applied to brms binomial (Bernoulli) models

Dear Experts,

I am a Bayesian newbie, and I have a question about the difference between add_fitted_draws() and add_predicted_draws() when applied to brms binomial (Bernoulli) models. Specifically, using a cut-off of 50%, I compared predictions from add_fitted_draws() with those from add_predicted_draws(). I expected the two sets of values to be similar, but they differed. A minimal example is below.

Thank you in advance for your kind guidance and advice.

rm(list = ls())
library(rms)        ## also attaches Hmisc, which provides smean.sd()
library(tidyverse)
library(brms)
library(tidybayes)
library(janitor)    ## clean_names()
 
mtcars <- mtcars %>%
  mutate(vs = factor(vs))  ## vs as a factor, so the coefficient is named "vs1"

pr_flrm <- c(
  prior(normal(0, 0.857), class = "b", coef = "vs1"),
  prior(normal(0, 0.00857), class = "b", coef = "hp"))

flrm <- brm(am ~ hp + vs,
            data = mtcars,
            family = "bernoulli",
            prior = pr_flrm,
            sample_prior = TRUE,  ## so that we can visualize the prior distribution
            save_pars = save_pars(all = TRUE),
            seed = 1202)

## predictions from add_fitted_draws()
mtcars %>%
  dplyr::select(hp, vs) %>%
  add_fitted_draws(flrm, seed = 1234, n = 1000) %>%  ## draws of the fitted probability Pr(am = 1)
  clean_names() %>%                                  ## .value -> value, .row -> row
  mutate(prediction = (value > 0.50) * 1) %>%        ## using a cut-off of 0.50
  group_by(row) %>%                                  ## each subject
  summarise(across(c(prediction), ~ list(smean.sd(.x)))) %>%
  unnest_wider(prediction)

# A tibble: 32 x 3
     row  Mean    SD
   <int> <dbl> <dbl>
 1     1 0.342 0.475
 2     2 0.342 0.475
 3     3 0.479 0.500
 4     4 0.382 0.486
 5     5 0.073 0.260
 6     6 0.414 0.493
 7     7 0.042 0.201
 8     8 0.631 0.483
 9     9 0.468 0.499
10    10 0.328 0.470
# ... with 22 more rows


## predictions from add_predicted_draws()
mtcars %>%
  dplyr::select(hp, vs) %>%
  add_predicted_draws(flrm, seed = 1234, n = 1000) %>%  ## draws of the 0/1 outcome itself
  clean_names() %>%
  group_by(row) %>%                                     ## each subject
  summarise(across(c(prediction), ~ list(smean.sd(.x)))) %>%  ## smean.sd() from Hmisc
  unnest_wider(prediction)

# A tibble: 32 x 3
     row  Mean    SD
   <int> <dbl> <dbl>
 1     1 0.461 0.499
 2     2 0.45  0.498
 3     3 0.491 0.500
 4     4 0.467 0.499
 5     5 0.322 0.467
 6     6 0.495 0.500
 7     7 0.295 0.456
 8     8 0.528 0.499
 9     9 0.481 0.500
10    10 0.416 0.493
# ... with 22 more rows



I think I have answered my own question: if I use rbinom() with size = 1 to simulate a Bernoulli outcome from each fitted probability, the two sets of predictions are similar.

## use rbinom() with size = 1 to simulate a Bernoulli outcome from the fitted probabilities
mtcars %>%
  dplyr::select(hp, vs) %>%
  add_fitted_draws(flrm, seed = 1234, n = 1000) %>%
  clean_names() %>%
  group_by(row) %>%
  mutate(prediction = rbinom(n(), size = 1, prob = value)) %>%  ## one Bernoulli draw per fitted draw
  summarise(across(c(prediction), ~ list(smean.sd(.x)))) %>%
  unnest_wider(prediction)

# A tibble: 32 x 3
     row  Mean    SD
   <int> <dbl> <dbl>
 1     1 0.429 0.495
 2     2 0.438 0.496
 3     3 0.502 0.500
 4     4 0.461 0.499
 5     5 0.359 0.480
 6     6 0.5   0.500
 7     7 0.256 0.437
 8     8 0.56  0.497
 9     9 0.493 0.500
10    10 0.455 0.498
# ... with 22 more rows




Great that you were able to move forward. Yes, the main distinction is that add_fitted_draws() returns draws of the expected probability Pr(am = 1) on the response scale, while add_predicted_draws() starts from those “fitted” draws and then adds the Bernoulli measurement noise (as you did via rbinom()). Thresholding the fitted probabilities at 0.50 estimates a different quantity: how often the posterior draws of the probability itself exceed 50%, rather than how often a simulated outcome equals 1.
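If it helps to see this outside the tidybayes pipeline, here is a minimal sketch reusing the flrm model from above (the object names p and y_rep are just for illustration). For a Bernoulli model, adding rbinom() noise to the fitted probabilities reproduces the distribution of the posterior predictive draws:

## fitted(..., summary = FALSE) returns a draws-by-observations matrix of
## posterior probabilities Pr(am = 1)
p <- fitted(flrm, summary = FALSE)

## one Bernoulli draw per fitted probability; this has the same distribution
## as predict(flrm, summary = FALSE), i.e., the posterior predictive draws
y_rep <- matrix(rbinom(length(p), size = 1, prob = p), nrow = nrow(p))

## per-observation predictive means, comparable to the add_predicted_draws() summary
round(colMeans(y_rep), 3)

As a side note, newer versions of tidybayes rename add_fitted_draws() to add_epred_draws(), which makes the distinction more explicit: it returns draws of the expectation of the posterior predictive distribution.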

Best of luck in further adventures with Stan!


Thank you for taking time to read and respond to my question. Kudos to the Stan team for the software and support.