Predictions from a cumulative proportional odds model (ordinal logistic regression)

I want to fit a cumulative proportional odds model (ordinal logistic regression) with brms. In the example below, my dependent variable is a 7-level ordinal score (y) from 0 to 6. Y represents the observed outcome, and X the exposure group (x = placebo or epinephrine).

I am interested in a few outputs, such as:

  • P(Y \geq y \mid X = x)
  • P(Y \leq y \mid X= x)
  • P(Y = y \mid X = x)

I believe that, in the code below, I am able to generate P(Y = y \mid X = x) for both exposure groups and all Y (0 to 6).

Thus, I would like to confirm that my code indeed yields the P(Y = y \mid X = x) and ask for help to generate P(Y \geq y \mid X = x) and P(Y \leq y \mid X= x).

Thanks!

library(brms)
library(tidybayes)
library(dplyr)

a <- c(rep(0,15), rep(1,10), rep(2,29), rep(3,20), rep(4,8), rep(5,8), rep(6,3903))
b <- c(rep(0,12), rep(1,17), rep(2,23), rep(3,35), rep(4,12), rep(5,27), rep(6,3881))
x <- c(rep('placebo', length(a)), rep('epinephrine', length(b)))
y <- c(a, b)

d = data.frame(x, y)

d$y <-factor(d$y, ordered = T)
d$x<-factor(d$x, levels = c("placebo", "epinephrine"))

# Reduce 90% of the sample size to aid in model fitting
set.seed(123)
dd = sample_frac(d, size = 0.1, weight = x)

m_crude <- brm(
  family = cumulative("logit"),
  formula = y ~ 1 + x,
  data = dd,
  backend = "cmdstanr",
  cores = parallel::detectCores()
)

$ P(Y = y | X = x)?
m_crude |>
  epred_draws(newdata = data.frame(x = c("placebo", "epinephrine"))) |>
  group_by(x, .category) |> 
  median_hdi(.epred)


A tibble: 14 × 8
   x           .category  .epred    .lower  .upper .width .point .interval
   <chr>       <fct>       <dbl>     <dbl>   <dbl>  <dbl> <chr>  <chr>    
 1 epinephrine 0         0.00416 0.000564  0.00924   0.95 median hdi      
 2 epinephrine 1         0.00165 0.0000280 0.00504   0.95 median hdi      
 3 epinephrine 2         0.00400 0.000765  0.00910   0.95 median hdi      
 4 epinephrine 3         0.00656 0.00186   0.0130    0.95 median hdi      
 5 epinephrine 4         0.00438 0.000614  0.00982   0.95 median hdi      
 6 epinephrine 5         0.00330 0.000408  0.00811   0.95 median hdi      
 7 epinephrine 6         0.974   0.960     0.987     0.95 median hdi      
 8 placebo     0         0.00371 0.000493  0.00938   0.95 median hdi      
 9 placebo     1         0.00145 0.0000417 0.00490   0.95 median hdi      
10 placebo     2         0.00356 0.000540  0.00930   0.95 median hdi      
11 placebo     3         0.00587 0.00138   0.0134    0.95 median hdi      
12 placebo     4         0.00390 0.000357  0.00988   0.95 median hdi      
13 placebo     5         0.00294 0.000226  0.00793   0.95 median hdi      
14 placebo     6         0.976   0.957     0.991     0.95 median hdi
1 Like

You can do this using epred_draws() by grouping by the right combination of predictors and the .draw column, but I would probably do this with epred_rvars() instead because it is easier to make mistakes by grouping by the wrong thing in the long-format data table that epred_draws() gives you. epred_rvars() will give you basically the same output as epred_draws() but with a posterior::rvar column containing the epred instead of a long format data frame:

m_crude |>
  epred_rvars(newdata = data.frame(x = c("placebo", "epinephrine")), columns_to = ".category")
#> # A tibble: 14 × 4
#>    x            .row .category           .epred
#>    <chr>       <int> <chr>               <rvar>
#>  1 placebo         1 0          0.0029 ± 0.0021
#>  2 epinephrine     2 0          0.0022 ± 0.0015
#>  3 placebo         1 1          0.0035 ± 0.0023
#>  4 epinephrine     2 1          0.0027 ± 0.0018
#>  5 placebo         1 2          0.0078 ± 0.0038
#>  6 epinephrine     2 2          0.0060 ± 0.0027
#>  7 placebo         1 3          0.0123 ± 0.0050
#>  8 epinephrine     2 3          0.0096 ± 0.0036
#>  9 placebo         1 4          0.0028 ± 0.0021
#> 10 epinephrine     2 4          0.0022 ± 0.0015
#> 11 placebo         1 5          0.0115 ± 0.0049
#> 12 epinephrine     2 5          0.0090 ± 0.0034
#> 13 placebo         1 6          0.9593 ± 0.0123
#> 14 epinephrine     2 6          0.9682 ± 0.0077

This makes it easier to see the data frame structure, and thus harder to make mistakes (also rvar will enforce having draws actually line up).

Then if you want P(Y \le y | X = x) you can group by .row (which is just an id for input rows from newdata) and use cumsum() to get the cumulative probabilities, as rvar() implements cumsum():

cumprobs = m_crude |>
  epred_rvars(newdata = data.frame(x = c("placebo", "epinephrine")), columns_to = ".category") |>
  group_by(.row) |>
  mutate(.epred = cumsum(.epred))

cumprobs
#> # A tibble: 14 × 4
#> # Groups:   .row [2]
#>    x            .row .category            .epred
#>    <chr>       <int> <chr>                <rvar>
#>  1 placebo         1 0          0.0029 ± 2.1e-03
#>  2 epinephrine     2 0          0.0022 ± 1.5e-03
#>  3 placebo         1 1          0.0064 ± 3.4e-03
#>  4 epinephrine     2 1          0.0049 ± 2.5e-03
#>  5 placebo         1 2          0.0141 ± 5.7e-03
#>  6 epinephrine     2 2          0.0109 ± 3.9e-03
#>  7 placebo         1 3          0.0264 ± 8.8e-03
#>  8 epinephrine     2 3          0.0206 ± 5.7e-03
#>  9 placebo         1 4          0.0292 ± 9.5e-03
#> 10 epinephrine     2 4          0.0228 ± 6.1e-03
#> 11 placebo         1 5          0.0407 ± 1.2e-02
#> 12 epinephrine     2 5          0.0318 ± 7.7e-03
#> 13 placebo         1 6          1.0000 ± 9.6e-18
#> 14 epinephrine     2 6          1.0000 ± 8.6e-18

median_hdi() and whatnot work on rvars, so you can do this to get the output you had before:

cumprobs |>
  median_qi(.epred)
#> # A tibble: 14 × 9
#>    x            .row .category  .epred   .lower  .upper .width .point .interval
#>    <chr>       <int> <chr>       <dbl>    <dbl>   <dbl>  <dbl> <chr>  <chr>    
#>  1 placebo         1 0         0.00232 0.000354 0.00843   0.95 median qi       
#>  2 epinephrine     2 0         0.00182 0.000305 0.00599   0.95 median qi       
#>  3 placebo         1 1         0.00568 0.00162  0.0148    0.95 median qi       
#>  4 epinephrine     2 1         0.00451 0.00135  0.0109    0.95 median qi       
#>  5 placebo         1 2         0.0132  0.00546  0.0272    0.95 median qi       
#>  6 epinephrine     2 2         0.0105  0.00483  0.0197    0.95 median qi       
#>  7 placebo         1 3         0.0254  0.0123   0.0463    0.95 median qi       
#>  8 epinephrine     2 3         0.0200  0.0111   0.0335    0.95 median qi       
#>  9 placebo         1 4         0.0282  0.0140   0.0508    0.95 median qi       
#> 10 epinephrine     2 4         0.0221  0.0125   0.0364    0.95 median qi       
#> 11 placebo         1 5         0.0395  0.0203   0.0671    0.95 median qi       
#> 12 epinephrine     2 5         0.0311  0.0187   0.0487    0.95 median qi       
#> 13 placebo         1 6         1       1        1         0.95 median qi       
#> 14 epinephrine     2 6         1       1        1         0.95 median qi
4 Likes

After reading the question, my thought was to use posterior_predict to get the raw posterior predictions and then summarize those. However, there are a number of different packages/functions for working with the posterior and I often get confused about what each one is doing. So I guess my follow-up question for @msjkay is whether the code below is correctly calculating the probabilities and cumulative probabilities from the posterior? If not, where am I going wrong? On the other hand, if I can do the same thing with posterior_predict as with the tidybayes::epred_*** functions, is the use of tidybayes functions a matter of convenience (less coding, many details taken care of, etc.)?

library(tidyverse)

# Get posterior predictions
post.pred = posterior_predict(m_crude, 
                              newdata=data.frame(x=c("placebo", "epinephrine")))

# Calculate probabilities from posterior
post.pred %>% 
  as_tibble() %>% 
  set_names(c("placebo", "epinephrine")) %>% 
  pivot_longer(everything(), names_to="x", values_to=".category") %>% 
  mutate(.category = recode(.category, !!!attr(post.pred, "levels"))) %>% 
  count(x, .category) %>% 
  group_by(x) %>% 
  mutate(prob = n/sum(n),
         prob.x.lte = cumsum(prob),
         prob.x.gte = 1 - prob.x.lte)
   x           .category     n    prob prob.x.lte prob.x.gte
   <chr>       <chr>     <int>   <dbl>      <dbl>      <dbl>
 1 epinephrine 0            16 0.004      0.004        0.996
 2 epinephrine 1             7 0.00175    0.00575      0.994
 3 epinephrine 2            29 0.00725    0.013        0.987
 4 epinephrine 3            36 0.009      0.022        0.978
 5 epinephrine 4            11 0.00275    0.0248       0.975
 6 epinephrine 5            23 0.00575    0.0305       0.970
 7 epinephrine 6          3878 0.970      1            0    
 8 placebo     0            13 0.00325    0.00325      0.997
 9 placebo     1            18 0.0045     0.00775      0.992
10 placebo     2            27 0.00675    0.0145       0.986
11 placebo     3            28 0.007      0.0215       0.978
12 placebo     4             5 0.00125    0.0228       0.977
13 placebo     5            12 0.003      0.0258       0.974
14 placebo     6          3897 0.974      1            0

@arthur-albuquerque I just wanted to mention that rep is vectorized, so you can, for example, do

a <- rep(0:6, c(15,10,29,20,8,8,3903))

instead of a bunch of separate calls to rep.

1 Like

Yes, the tidybayes functions are mostly for convenience - epred_draws/rvars calls down to posterior_epred, predicted_draws down to posterior_predict, linpred_draws down to posterior_linpred. Then they do some reshaping - e.g. the _draws functions return long format data frames similar to what you’ve constructed manually (though they can also do some useful things like pull together outputs from multiple linear predictors into one data frame which can be a pain manually). The _rvars functions are similar but give data frames with posterior::rvar columns, which can be easier to use (and take up less memory than long format dfs). In the end it mostly comes down to your preferences and comfort; I wrote those functions because I found myself doing the same pivoting operations over and over.

As for whether using posterior_predict here is what you want, that depends on if you want to quantify the uncertainty in the proportion, for which I would use posterior_epred, not posterior_predict. Your final means there should be approximately the same (modulo some Monte Carlo error) but won’t be exactly the same, and you don’t capture the uncertainty in the proportion with your method.

2 Likes

Great solution, thanks!

What about P(Y \geq y \mid X = x)? Would that be equal to 1 - cumsum(.epred)?

Close - that would give you > not \ge

1 Like

That makes sense. Can you think of any way to generate P(Y \geq y \mid X = x) from the same model?

One solution would be to invert the order of my dependent variable, re-fit the model and get the P(Y \leq y \mid X = x) with cumsum. However I would like to be able to generate P(Y = y \mid X = x), P(Y \leq y \mid X = x) and P(Y \geq y \mid X = x) with the exact same model formulation.

Once you have P(Y = y) and P(Y > y) you can sum them to get P(Y \ge y)

Of course; awesome! Here is the code generating all these estimands. Would you mind double-checking it?

Thank you very much, Matthew. Your answers are always helpful.

P.S. About my dependent variable, lower scores mean better outcomes.

m_crude |>
  epred_rvars(newdata = data.frame(x = c("placebo", "epinephrine")), columns_to = ".category") |>
  group_by(.row) |>
  mutate(equal = .epred,                        # P(Y = y  | X = x)
         equal_or_better =  cumsum(.epred),     # P(Y <= y | X = x)
         worse =  1 - cumsum(.epred),        # P(Y > y  | X = x) 
         equal_or_worse = equal + worse      # P(Y >= y | X = x) 
  ) |>
  select(-c(.epred, worse))
A tibble: 14 × 6
# Groups:   .row [2]
   x            .row .category            equal   equal_or_better equal_or_worse
   <chr>       <int> <chr>               <rvar>            <rvar>         <rvar>
 1 placebo         1 0          0.0042 ± 0.0026  0.0042 ± 2.6e-03  1.00 ± 0.0000
 2 epinephrine     2 0          0.0046 ± 0.0024  0.0046 ± 2.4e-03  1.00 ± 0.0000
 3 placebo         1 1          0.0019 ± 0.0015  0.0061 ± 3.3e-03  1.00 ± 0.0026
 4 epinephrine     2 1          0.0020 ± 0.0015  0.0066 ± 3.1e-03  1.00 ± 0.0024
 5 placebo         1 2          0.0041 ± 0.0025  0.0102 ± 4.7e-03  0.99 ± 0.0033
 6 epinephrine     2 2          0.0044 ± 0.0024  0.0111 ± 4.1e-03  0.99 ± 0.0031
 7 placebo         1 3          0.0065 ± 0.0034  0.0167 ± 6.8e-03  0.99 ± 0.0047
 8 epinephrine     2 3          0.0070 ± 0.0031  0.0181 ± 5.6e-03  0.99 ± 0.0041
 9 placebo         1 4          0.0045 ± 0.0028  0.0212 ± 8.2e-03  0.98 ± 0.0068
10 epinephrine     2 4          0.0048 ± 0.0026  0.0229 ± 6.4e-03  0.98 ± 0.0056
11 placebo         1 5          0.0034 ± 0.0023  0.0246 ± 9.3e-03  0.98 ± 0.0082
12 epinephrine     2 5          0.0037 ± 0.0022  0.0267 ± 7.1e-03  0.98 ± 0.0064
13 placebo         1 6          0.9754 ± 0.0093  1.0000 ± 3.5e-18  0.98 ± 0.0093
14 epinephrine     2 6          0.9733 ± 0.0071  1.0000 ± 3.9e-18  0.97 ± 0.0071
2 Likes

Looks good to me, glad to help!

1 Like

I’ve largely relied on the epred_draws() method before, and I’m pretty happy with it. As @mjskay remarked, you have to be careful how you group the results when summarizing the results from epred_draws(), though. However, it’s not a bad exercise to work through how one should group the summaries for a given question.

1 Like

Dear Solomon and @mjskay,

How would you perform these calculations with epred_draws instead of epred_rvars?

1 Like

The equivalent would be to group by .row and .draw. e.g.:

long_df = m_crude |>
  epred_draws(newdata = data.frame(x = c("placebo", "epinephrine"))) |>
  group_by(.row, .draw) |>
  mutate(
    equal = .epred,                     # P(Y = y  | X = x)
    equal_or_better =  cumsum(.epred),  # P(Y <= y | X = x)
    worse =  1 - cumsum(.epred),        # P(Y > y  | X = x) 
    equal_or_worse = equal + worse      # P(Y >= y | X = x) 
  ) |>
  group_by(.row, x, .category) 

long_df
#> # A tibble: 56,000 × 11
#> # Groups:   .row, x, .category [14]
#>    x        .row .chain .iteration .draw .category  .epred   equal equal_or_better worse equal_o…¹
#>    <chr>   <int>  <int>      <int> <int> <fct>       <dbl>   <dbl>           <dbl> <dbl>     <dbl>
#>  1 placebo     1     NA         NA     1 0         0.00418 0.00418         0.00418 0.996         1
#>  2 placebo     1     NA         NA     2 0         0.00492 0.00492         0.00492 0.995         1
#>  3 placebo     1     NA         NA     3 0         0.00184 0.00184         0.00184 0.998         1
#>  4 placebo     1     NA         NA     4 0         0.00113 0.00113         0.00113 0.999         1
#>  5 placebo     1     NA         NA     5 0         0.00185 0.00185         0.00185 0.998         1
#>  6 placebo     1     NA         NA     6 0         0.00337 0.00337         0.00337 0.997         1
#>  7 placebo     1     NA         NA     7 0         0.00368 0.00368         0.00368 0.996         1
#>  8 placebo     1     NA         NA     8 0         0.00105 0.00105         0.00105 0.999         1
#>  9 placebo     1     NA         NA     9 0         0.00207 0.00207         0.00207 0.998         1
#> 10 placebo     1     NA         NA    10 0         0.00500 0.00500         0.00500 0.995         1
#> # … with 55,990 more rows, and abbreviated variable name ¹​equal_or_worse
#> # ℹ Use `print(n = ...)` to see more rows

long_df |>
  median_qi(equal_or_better)
#> # A tibble: 14 × 9
#>     .row x           .category equal_or_better   .lower  .upper .width .point .interval
#>    <int> <chr>       <fct>               <dbl>    <dbl>   <dbl>  <dbl> <chr>  <chr>    
#>  1     1 placebo     0                 0.00232 0.000354 0.00843   0.95 median qi       
#>  2     1 placebo     1                 0.00568 0.00162  0.0148    0.95 median qi       
#>  3     1 placebo     2                 0.0132  0.00546  0.0272    0.95 median qi       
#>  4     1 placebo     3                 0.0254  0.0123   0.0463    0.95 median qi       
#>  5     1 placebo     4                 0.0282  0.0140   0.0508    0.95 median qi       
#>  6     1 placebo     5                 0.0395  0.0203   0.0671    0.95 median qi       
#>  7     1 placebo     6                 1       1        1         0.95 median qi       
#>  8     2 epinephrine 0                 0.00182 0.000305 0.00599   0.95 median qi       
#>  9     2 epinephrine 1                 0.00451 0.00135  0.0109    0.95 median qi       
#> 10     2 epinephrine 2                 0.0105  0.00483  0.0197    0.95 median qi       
#> 11     2 epinephrine 3                 0.0200  0.0111   0.0335    0.95 median qi       
#> 12     2 epinephrine 4                 0.0221  0.0125   0.0364    0.95 median qi       
#> 13     2 epinephrine 5                 0.0311  0.0187   0.0487    0.95 median qi       
#> 14     2 epinephrine 6                 1       1        1         0.95 median qi

If you don’t include .draw in the first grouping you’ll accumulate values incorrectly, and if you don’t re-group by the full set of predictors afterwards you’ll summarise incorrectly with median_qi()/etc. I prefer the rvar approach because it prevents both potential mistakes. If in the end you do want a long-format data frame, you could also use the rvar approach then pass the result to tidybayes::unnest_rvars(), which will turn it into the long-format data frame.

1 Like

What is the exact role of the “.row” column? It looks redundant to the “x” column.

> m_crude |>
+   epred_draws(newdata = data.frame(x = c("placebo", "epinephrine"))) |>
+   group_by(x) |>
+   count(.row)
# A tibble: 2 × 3
# Groups:   x [2]
  x            .row     n
  <chr>       <int> <int>
1 epinephrine     2 28000
2 placebo         1 28000

Couldn’t I ignore it and just group by “x” instead of “.row”?

> m_crude |>
+   epred_draws(newdata = data.frame(x = c("placebo", "epinephrine"))) |>
+   group_by(.row, .draw) |>
+   mutate(
+     equal = .epred,                     # P(Y = y  | X = x)
+     equal_or_better =  cumsum(.epred),  # P(Y <= y | X = x)
+     worse =  1 - cumsum(.epred),        # P(Y > y  | X = x) 
+     equal_or_worse = equal + worse      # P(Y >= y | X = x) 
+   ) |>
+   group_by(.row, x, .category)  |> 
+   median_qi(equal_or_better)
# A tibble: 14 × 9
    .row x           .category equal_or_better   .lower  .upper .width .point .interval
   <int> <chr>       <fct>               <dbl>    <dbl>   <dbl>  <dbl> <chr>  <chr>    
 1     1 placebo     0                 0.00241 0.000398 0.00817   0.95 median qi       
 2     1 placebo     1                 0.00577 0.00176  0.0145    0.95 median qi       
 3     1 placebo     2                 0.0134  0.00555  0.0274    0.95 median qi       
 4     1 placebo     3                 0.0254  0.0124   0.0471    0.95 median qi       
 5     1 placebo     4                 0.0282  0.0140   0.0516    0.95 median qi       
 6     1 placebo     5                 0.0394  0.0203   0.0680    0.95 median qi       
 7     1 placebo     6                 1       1        1         0.95 median qi       
 8     2 epinephrine 0                 0.00189 0.000325 0.00593   0.95 median qi       
 9     2 epinephrine 1                 0.00454 0.00143  0.0105    0.95 median qi       
10     2 epinephrine 2                 0.0105  0.00475  0.0195    0.95 median qi       
11     2 epinephrine 3                 0.0199  0.0109   0.0333    0.95 median qi       
12     2 epinephrine 4                 0.0220  0.0122   0.0362    0.95 median qi       
13     2 epinephrine 5                 0.0309  0.0182   0.0480    0.95 median qi       
14     2 epinephrine 6                 1       1        1         0.95 median qi 

VS.

> m_crude |>
+     epred_draws(newdata = data.frame(x = c("placebo", "epinephrine"))) |>
+     group_by(x, .draw) |> # it used to be ".row, .draw"
+     mutate(
+       equal = .epred,                     # P(Y = y  | X = x)
+       equal_or_better =  cumsum(.epred),  # P(Y <= y | X = x)
+       worse =  1 - cumsum(.epred),        # P(Y > y  | X = x) 
+       equal_or_worse = equal + worse      # P(Y >= y | X = x) 
+     ) |>
+     group_by(x, .category) |> # it used to be ".row, x, .category"
+     median_qi(equal_or_better)
# A tibble: 14 × 8
   x           .category equal_or_better   .lower  .upper .width .point .interval
   <chr>       <fct>               <dbl>    <dbl>   <dbl>  <dbl> <chr>  <chr>    
 1 epinephrine 0                 0.00189 0.000325 0.00593   0.95 median qi       
 2 epinephrine 1                 0.00454 0.00143  0.0105    0.95 median qi       
 3 epinephrine 2                 0.0105  0.00475  0.0195    0.95 median qi       
 4 epinephrine 3                 0.0199  0.0109   0.0333    0.95 median qi       
 5 epinephrine 4                 0.0220  0.0122   0.0362    0.95 median qi       
 6 epinephrine 5                 0.0309  0.0182   0.0480    0.95 median qi       
 7 epinephrine 6                 1       1        1         0.95 median qi       
 8 placebo     0                 0.00241 0.000398 0.00817   0.95 median qi       
 9 placebo     1                 0.00577 0.00176  0.0145    0.95 median qi       
10 placebo     2                 0.0134  0.00555  0.0274    0.95 median qi       
11 placebo     3                 0.0254  0.0124   0.0471    0.95 median qi       
12 placebo     4                 0.0282  0.0140   0.0516    0.95 median qi       
13 placebo     5                 0.0394  0.0203   0.0680    0.95 median qi       
14 placebo     6                 1       1        1         0.95 median qi 
1 Like

.row is added by epred_draws() as an index of rows given to it in newdata. It’s intended e.g. to make it easier if you have multiple predictors to ensure you are grouping by all combinations of them, and to make it so that if you have duplicate combinations of predictors in newdata you can distinguish between those duplicates in the long-format df.

1 Like

Hi @mjskay,

I am trying to extend the framework above to other interesting estimands. In the example below, the dependent variable, y, is an ordinal score from 1 to 13, and there is one categorical independent variable, trt. I then fitted a proportional odds model.

library(brms)
library(dplyr)
library(tidyr)
library(tidybayes)

set.seed(123)
N = 100

y = rpois(N, 7)
trt = rbinom(N, 1, 0.5)

dd = data.frame(trt = as.factor(trt), y)

m = brms::brm(
  family = cumulative("logit"),
  formula = y ~ trt,
  data = dd,
  backend = "cmdstanr",
  cores = parallel::detectCores()
)

As in previous posts, I calculated the estimand P(Y = y | X = x) , where y ranges from 1 to 13.

> m |> 
+     tidybayes::epred_rvars(newdata = data.frame(trt = c("0", "1")),
+                            columns_to = "y")
# A tibble: 26 × 4
   trt    .row y               .epred
   <chr> <int> <chr>           <rvar>
 1 0         1 1      0.0093 ± 0.0086
 2 1         2 1      0.0130 ± 0.0117
 3 0         1 2      0.0123 ± 0.0095
 4 1         2 2      0.0171 ± 0.0131
 5 0         1 3      0.0744 ± 0.0269
 6 1         2 3      0.0999 ± 0.0329
 7 0         1 4      0.0877 ± 0.0278
 8 1         2 4      0.1115 ± 0.0349
 9 0         1 5      0.1176 ± 0.0310
10 1         2 5      0.1376 ± 0.0349
# … with 16 more rows

Now, I want to calculate the estimand P(\text{Y %in% c(1:5)} | X = x) . I tried a couple of different approaches, such as the one below, but had no success.

Is there any handy way of doing it with tidybayes/tidyverse?

# P(Y %in% c(1:5) | X = x)?
m |> 
  tidybayes::epred_rvars(newdata = data.frame(trt = c("0", "1")),
                         columns_to = "y") |> 
  
  # it doesn't work!
  dplyr::mutate(interval = dplyr::case_when(
    y %in% c(1:5) ~ "category",
    TRUE ~ "rest")
  ) |> 
  dplyr::group_by(trt, interval) |> 
  dplyr::summarise(sum = sum(.epred))
1 Like

Ah, you want sum = rvar_sum(.epred). See the table of summary functions in the rvar vignette: sum() summarizes over draws, returning an array, rvar_sum() summarizes within draws, returning an rvar.

1 Like

That is awesome. Thanks

1 Like