MRP with ordinal outcome

I’m attempting to follow this tutorial from Monica Alexander on using brms for multilevel regression and post-stratification. The twist is that in my use case the outcome is an ordered factor with 4 levels. The tutorial deals with a proportion.

Therefore, rather than run:

mod <- brm(kept_name ~ (1|age_group) + (1|educ_group) + 
                       (1|state_name) + (1|decade_married), 
           data = d_mncs, 
           family=bernoulli())

I followed @paul.buerkner’s lead in his ordinal regression tutorial and fit the model with family = cumulative("probit").

The next step in Monica’s tutorial is to use the results of the regression to predict the outcome and multiply the .prediction by the proportion (prop). This is the post-stratification step.

res_age <- mod %>%
  add_predicted_draws(newdata=age_prop %>% 
                      filter(age_group>20, 
                             age_group<80, 
                             decade_married>1969),
                      allow_new_levels=TRUE) %>%
  rename(kept_name_predict = .prediction) %>% 
  mutate(kept_name_predict_prop = kept_name_predict*prop) %>% 
  group_by(age_group, .draw) %>% 
  summarise(kept_name_predict = sum(kept_name_predict_prop)) %>% 
  group_by(age_group) %>% 
  summarise(mean = mean(kept_name_predict), 
            lower = quantile(kept_name_predict, 0.025), 
            upper = quantile(kept_name_predict, 0.975))

This works in her example with a binary outcome (1/0) because the .prediction is numeric and can be multiplied by the population proportion. In my use case with an ordinal outcome, .prediction is a factor.

Any thoughts about how to apply this MRP example when your outcome is ordinal, not binary?

4 Likes

This is a good question. I don’t have the dataset from Monica’s example so I’ll show an example using a different model I happen to have available, the m_cyl model from the tidybayes+brms vignette.

First we’ll transform the results of add_predicted_draws() into a data frame where .prediction is always 1 (more on why in the comment below) and .category is the predicted category:

pred_df = mtcars %>% 
  select(mpg) %>% 
  # for real post-stratification these proportions would come 
  # from some other dataset, as in Monica Alexander's example
  mutate(prop = 1/n()) %>% 
  add_predicted_draws(m_cyl) %>% 
  # add_predicted_draws will give us one row per {input row} x {draw from the model}
  # containing a .prediction column that is a factor indicating the predicted category.
  # We might imagine we instead want k rows per {input row} x {draw} where k is the 
  # number of factors, .category is the factor level, and .prediction is 1 if that 
  # category was predicted and 0 otherwise. We *could* do this by creating the 1-rows
  # then "filling in" the missing 0-rows with tidyr::complete(), but we don't have to
  # do that second step: because the 0 rows are going to contribute nothing to the final 
  # sum, we can just create the 1-rows:
  mutate(
    .category = .prediction,
    .prediction = 1
  ) 
  # if you don't believe me, add the following line to the pipe to create the 
  # 0-rows that we don't need:
  # %>% complete(.row, .draw, .category, fill = list(.prediction = 0))

pred_df
# A tibble: 128,000 x 8
# Groups:   mpg, prop, .row [32]
     mpg   prop  .row .chain .iteration .draw .prediction .category
   <dbl>  <dbl> <int>  <int>      <int> <int>       <dbl> <fct>    
 1    21 0.0312     1     NA         NA     1           1 6        
 2    21 0.0312     1     NA         NA     2           1 6        
 3    21 0.0312     1     NA         NA     3           1 6        
 4    21 0.0312     1     NA         NA     4           1 6        
 5    21 0.0312     1     NA         NA     5           1 4        
 6    21 0.0312     1     NA         NA     6           1 6        
 7    21 0.0312     1     NA         NA     7           1 6        
 8    21 0.0312     1     NA         NA     8           1 4        
 9    21 0.0312     1     NA         NA     9           1 6        
10    21 0.0312     1     NA         NA    10           1 4        
# ... with 127,990 more rows

Then we can use the same summing approach from Monica’s example, except we have to
add the .category column to the grouping factors when summing:

pred_df %>%
  mutate(.prediction_prop = .prediction*prop) %>% 
  # add whatever other predictors you don't want to marginalize over to this group_by
  # (age_group in Monica Alexander's example)
  group_by(.draw, .category) %>% 
  summarise(.prediction = sum(.prediction_prop)) %>%
  group_by(.category) %>%
  mean_qi(.prediction)
# A tibble: 3 x 7
  .category .prediction .lower .upper .width .point .interval
  <fct>           <dbl>  <dbl>  <dbl>  <dbl> <chr>  <chr>    
1 4               0.354 0.25    0.438   0.95 mean   qi       
2 6               0.199 0.0625  0.344   0.95 mean   qi       
3 8               0.447 0.344   0.531   0.95 mean   qi  

To double-check, since we used the proportions from the original data (instead of from another dataset, as in proper post-stratification) these should be basically the same as the proportions in the input data, modulo sampling error:

prop.table(xtabs(~ cyl, data = mtcars))
cyl
      4       6       8 
0.34375 0.21875 0.43750 

Seems fine to me.

4 Likes

Thanks for taking the extra time to explain the approach, @mjskay. Great stuff.

3 Likes

Of course!

2 Likes

For an example where we combined post-stratification with an ordinal model, you can also check out https://osf.io/preprints/socarxiv/3v5g7/

5 Likes

I’m late to the thread (buried under marking), but just wanted to add a plug for our lovely MRP reading group. Contact me or Rohan Alexander if interested!

4 Likes

Thanks again for all of the help. Here’s what I came up with. I’m a novice when it comes to MRP and brms, so if anyone has tips for improving it please shoot me a note. Hopefully there are no fatal flaws!!🤞

1 Like

Very good example!

You could add a footnote about that it might be useful to model the interaction of age group and gender as well.

1 Like

Thanks for taking the time to review and share this suggestion. I updated the post.