Product of two response vars: avg_predictions(), generated quantities, or post-processing of draws()?

dholstius · May 2, 2024, 5:52pm

I have a multivariate model with two outcomes, Y_1 and Y_2:

E[Y_1] is an average rate (proportion)
E[Y_2] is an average amount

The goal requires E[Y_1] and E[Y_2] conditional on certain combinations of covariates, and averaged across others, but also E[Y_1 Y_2] conditional on those same combinations.

My understanding is that marginaleffects::avg_predictions() can compute E[Y_1|\dots] and E[Y_2|\dots], but I don’t see how it can deliver E[Y_1Y_2|\dots]. (If it can, great!) Y_1 and Y_2 are both positive, and positively correlated, so the product E[Y_1|\dots] E[Y_2|\dots] will be biased low compared to E[Y_1 Y_2|\dots].

Wondering which of these approaches might seem best, or if there is a fourth option:

Get avg_predictions() to yield the desired product
Modify the generated quantities block using brm(..., stanvars = ...) (see below)
Sample from the paired draws() and calculate, then summarize, the products from the paired draws

Here is an example using a toy dataset and a bit of pseudo-ish code inside stanvars() that doesn’t work, but hopefully gives the idea for approach #2:

require(brms)
require(cmdstanr)

bf_mpg <- bf(mpg ~ cyl)
bf_am <- bf(am ~ disp) + bernoulli()

mod <- brm(
  bf_mpg + bf_am, 
  stanvars = stanvar(
    scode = "real foo = mpg .* am;", # What should this be?
    block = "genquant")
  backend = "cmdstanr", 
  data = mtcars)

Solutions or ideas? Thank you in advance for your time and attention!

desislava · May 19, 2024, 5:54pm

Another option (a variation on #3) is to use brms::posterior_epred or tidybayes::add_epred_draws, with a manual step to compute the product Y_1Y_2 of the two outcomes Y_1 and Y_2.

First create a data frame with the covariate combinations of interest.

newdata <- data.frame(
  cyl = c(4, 4, 6, 6),
  disp = c(200, 300, 200, 300)
)

3a. Using brms::posterior_epred

epreds <- posterior_epred(mod, newdata = newdata)
dim(epreds)
# (num draws) x (num newdata rows) x (num responses)
#> [1] 4000    4    2

Y1 <- epreds[, , 1] # corresponds to the `mpg` outcome
Y2 <- epreds[, , 2] # corresponds to the `am` outcome

# Compute the product outcome.
Y1xY2 <- Y1 * Y2

3b. Using tidybayes::add_epred_draws

epreds <- add_epred_draws(newdata, mod)
epreds
# The `.category` column indicates the response.
#> # Groups:   cyl, disp, .row, .category [8]
#>      cyl  disp  .row .chain .iteration .draw .category .epred
#>    <dbl> <dbl> <int>  <int>      <int> <int> <fct>      <dbl>
#>  1     4   200     1     NA         NA     1 mpg         27.1
#>  2     4   200     1     NA         NA     2 mpg         26.9
#>  3     4   200     1     NA         NA     3 mpg         26.1
#>  4     4   200     1     NA         NA     4 mpg         25.9
#>  5     4   200     1     NA         NA     5 mpg         26.6
#>  6     4   200     1     NA         NA     6 mpg         26.9
#>  7     4   200     1     NA         NA     7 mpg         26.7
#>  8     4   200     1     NA         NA     8 mpg         26.3
#>  9     4   200     1     NA         NA     9 mpg         26.9
#> 10     4   200     1     NA         NA    10 mpg         27.1
#> # ℹ 31,990 more rows

# Pivot the `.category` column into separate `mpg` and `am` columns
# and calculate their product.
epreds <- epreds |>
  pivot_wider(
    names_from = .category,
    values_from = .epred
  ) |>
  mutate(
    mpg_am = mpg * am
  )

Now we’ve got two data structures (an array Y1xY2 and a tibble epreds) with the posterior draws of the expected product Y_1Y_2. We can use any of the functions that facilitate summarizing or plotting the posterior predictive distribution.

# The summary has one row for each covariate combination (row) in newdata.
# The `variable` column though is not particularly informative.
summarise_draws(Y1xY2)
#> # A tibble: 4 × 10
#>   variable  mean median    sd   mad    q5   q95  rhat ess_bulk ess_tail
#>   <chr>    <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>    <dbl>
#> 1 ...1     11.2   11.2   3.05  3.16 6.35  16.4   1.00    3957.    2869.
#> 2 ...2      3.92   3.52  2.37  2.25 0.806  8.40  1.00    3321.    1799.
#> 3 ...3      8.79   8.79  2.38  2.50 4.97  12.8   1.00    3959.    2935.
#> 4 ...4      3.06   2.74  1.85  1.77 0.627  6.55  1.00    3296.    1860.

# It may more convenient to have a column for each of the covariates as well.
epreds |> median_qi(mpg_am, .width = 0.9)
#> # A tibble: 4 × 9
#>     cyl  disp  .row mpg_am .lower .upper .width .point .interval
#>   <dbl> <dbl> <int>  <dbl>  <dbl>  <dbl>  <dbl> <chr>  <chr>    
#> 1     4   200     1  11.2   6.35   16.4     0.9 median qi       
#> 2     4   300     2   3.52  0.806   8.40    0.9 median qi       
#> 3     6   200     3   8.79  4.97   12.8     0.9 median qi       
#> 4     6   300     4   2.74  0.627   6.55    0.9 median qi

Topic		Replies	Views
Multivariate model specification Interfaces brms	6	922	May 18, 2021
Extracting predicted nonlinear random effects after covariates brms	3	916	September 21, 2020
Question re brms: conditional_effects General brms	4	1204	July 4, 2022
New blog post: using Stan/brms to get a covariate-adjusted average treatment effect with uncertainty Publicity	1	690	September 18, 2020
AME for predicted instead of fitted - brmsmargins Modeling techniques , brms	1	610	May 18, 2022

Product of two response vars: avg_predictions(), generated quantities, or post-processing of draws()?

Related topics