This is a good question. I don’t have the dataset from Monica’s example so I’ll show an example using a different model I happen to have available, the m_cyl
model from the tidybayes+brms vignette.
First we’ll transform the results of add_predicted_draws()
into a data frame where .prediction
is always 1
(more on why in the comment below) and .category
is the predicted category:
pred_df = mtcars %>%
select(mpg) %>%
# for real post-stratification these proportions would come
# from some other dataset, as in Monica Alexander's example
mutate(prop = 1/n()) %>%
add_predicted_draws(m_cyl) %>%
# add_predicted_draws will give us one row per {input row} x {draw from the model}
# containing a .prediction column that is a factor indicating the predicted category.
# We might imagine we instead want k rows per {input row} x {draw} where k is the
# number of factors, .category is the factor level, and .prediction is 1 if that
# category was predicted and 0 otherwise. We *could* do this by creating the 1-rows
# then "filling in" the missing 0-rows with tidyr::complete(), but we don't have to
# do that second step: because the 0 rows are going to contribute nothing to the final
# sum, we can just create the 1-rows:
mutate(
.category = .prediction,
.prediction = 1
)
# if you don't believe me, add the following line to the pipe to create the
# 0-rows that we don't need:
# %>% complete(.row, .draw, .category, fill = list(.prediction = 0))
pred_df
# A tibble: 128,000 x 8
# Groups: mpg, prop, .row [32]
mpg prop .row .chain .iteration .draw .prediction .category
<dbl> <dbl> <int> <int> <int> <int> <dbl> <fct>
1 21 0.0312 1 NA NA 1 1 6
2 21 0.0312 1 NA NA 2 1 6
3 21 0.0312 1 NA NA 3 1 6
4 21 0.0312 1 NA NA 4 1 6
5 21 0.0312 1 NA NA 5 1 4
6 21 0.0312 1 NA NA 6 1 6
7 21 0.0312 1 NA NA 7 1 6
8 21 0.0312 1 NA NA 8 1 4
9 21 0.0312 1 NA NA 9 1 6
10 21 0.0312 1 NA NA 10 1 4
# ... with 127,990 more rows
Then we can use the same summing approach from Monica’s example, except we have to
add the .category
column to the grouping factors when summing:
pred_df %>%
mutate(.prediction_prop = .prediction*prop) %>%
# add whatever other predictors you don't want to marginalize over to this group_by
# (age_group in Monica Alexander's example)
group_by(.draw, .category) %>%
summarise(.prediction = sum(.prediction_prop)) %>%
group_by(.category) %>%
mean_qi(.prediction)
# A tibble: 3 x 7
.category .prediction .lower .upper .width .point .interval
<fct> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 4 0.354 0.25 0.438 0.95 mean qi
2 6 0.199 0.0625 0.344 0.95 mean qi
3 8 0.447 0.344 0.531 0.95 mean qi
To double-check, since we used the proportions from the original data (instead of from another dataset, as in proper post-stratification) these should be basically the same as the proportions in the input data, modulo sampling error:
prop.table(xtabs(~ cyl, data = mtcars))
cyl
4 6 8
0.34375 0.21875 0.43750
Seems fine to me.