How do `brms` and `emmeans` create contrasts for parameters that were never (?) sampled?

This may be related to this older post.

My understanding is that one can use brms::hypothesis() to test a specified hypothesis against the posterior draws. For example, in the code below I think I'm doing the meaningless comparison of whether am1 is equal to the Intercept parameter. My understanding is that, internally, brms takes all the draws of b_am1 and subtracts the draws of the Intercept parameter from them.

However, it seems to be possible (via emmeans, below) to compute the contrast am0 - am1, despite:

  • b_am0 not existing (as shown below)
  • Intercept not corresponding to the estimate of am0 per se, but rather to the expected value of mpg when am = 0 and vs = 0.

If emmeans had refit the model in the background, such that every categorical variable was index coded, I would have understood. But given how fast the call returns, I doubt that happened…

So then my questions are: how did emmeans manage to contrast the levels of a categorical variable when it didn’t have the draws to do so? Is there a way to get hypothesis() to do what emmeans() managed to do without refitting the model? Is there a reason why I shouldn’t be using emmeans()'s approach?

library(cmdstanr)
#> This is cmdstanr version 0.8.1.9000
#> - CmdStanR documentation and vignettes: mc-stan.org/cmdstanr
#> - CmdStan path: /home/mstruong/.cmdstan/cmdstan-2.36.0
#> - CmdStan version: 2.36.0
library(brms)
#> Loading required package: Rcpp
#> Loading 'brms' package (version 2.22.8). Useful instructions
#> can be found by typing help('brms'). A more detailed introduction
#> to the package is available through vignette('brms_overview').
#> 
#> Attaching package: 'brms'
#> The following object is masked from 'package:stats':
#> 
#>     ar
library(emmeans)
#> Welcome to emmeans.
#> Caution: You lose important information if you filter this package's results.
#> See '? untidy'
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

d <- mtcars |> 
  dplyr::mutate(vs = as.factor(vs),
                am = as.factor(am))

fit1 <- brms::brm(
  mpg ~ vs + am,
  data = d, family = gaussian(),
  backend = "cmdstanr",
  cores = 4,
  refresh = 0,
  silent = 2
)
#> Loading required package: rstan
#> Loading required package: StanHeaders
#> 
#> rstan version 2.32.7 (Stan version 2.32.2)
#> For execution on a local, multicore CPU with excess RAM we recommend calling
#> options(mc.cores = parallel::detectCores()).
#> To avoid recompilation of unchanged Stan programs, we recommend calling
#> rstan_options(auto_write = TRUE)
#> For within-chain threading using `reduce_sum()` or `map_rect()` Stan functions,
#> change `threads_per_chain` option:
#> rstan_options(threads_per_chain = 1)

emmeans(fit1, ~ am) |> 
  pairs() |> 
  summary(point.est = mean)
#>  contrast  estimate lower.HPD upper.HPD
#>  am0 - am1    -6.08     -8.49     -3.22
#> 
#> Results are averaged over the levels of: vs 
#> Point estimate displayed: mean 
#> HPD interval probability: 0.95

posterior::summarize_draws(fit1) |> 
  dplyr::filter(variable == "b_am1")
#> # A tibble: 1 × 10
#>   variable  mean median    sd   mad    q5   q95  rhat ess_bulk ess_tail
#>   <chr>    <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>    <dbl>
#> 1 b_am1     6.08   6.08  1.33  1.31  3.92  8.22  1.00    4366.    2693.

hypothesis(fit1, "am1 = Intercept")
#> Hypothesis Tests for class b:
#>              Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio
#> 1 (am1)-(Intercept) = 0    -8.51      1.94   -12.35    -4.72         NA
#>   Post.Prob Star
#> 1        NA    *
#> ---
#> 'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
#> '*': For one-sided hypotheses, the posterior probability exceeds 95%;
#> for two-sided hypotheses, the value tested against lies outside the 95%-CI.
#> Posterior probabilities of point hypotheses assume equal prior probabilities.
hypothesis(fit1, "am1 = am0")
#> Error: Some parameters cannot be found in the model: 
#> 'b_am0'

Created on 2025-06-19 with reprex v2.1.1

  • Operating System: Ubuntu 22.04
  • brms Version: 2.22.8

The model predicts mpg for four possible cases:

  • am=0, vs=0 (This is the intercept)
  • am=1, vs=0
  • am=0, vs=1
  • am=1, vs=1

But this requires only two parameters plus the intercept.

The model is:

mpg = b_0 + b_{am}*am + b_{vs}*vs

When am=0, this reduces to:

mpg = b_0 + b_{vs}*vs

So, when vs=0, the difference (or contrast) between predicted mpg when am=0 and am=1 is:

\Delta mpg_{am=0}^{am=1} = (b_0 + b_{am}*0) - (b_0 + b_{am}*1) = -b_{am}

You can do the same calculations for the case of vs=1, but since there’s no interaction, you’ll get the same contrast for am.

The coefficient b_{am} in my example above is called am1 in the fit1 model (where the reference level is implicitly am=0).
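Since there's no interaction, the am0 - am1 contrast is just the negated draws of b_am1. As a minimal check, this should reproduce the -6.08 point estimate from emmeans (I'm using equal-tailed quantiles here rather than the HPD intervals that emmeans reports):

draws <- posterior::as_draws_df(fit1)

# Posterior mean of the am0 - am1 contrast
mean(-draws$b_am1)

# Equal-tailed 95% interval of the contrast
quantile(-draws$b_am1, c(0.025, 0.975))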

You can get draws from the posterior for all four of the possible model outcomes. I think emmeans is just summarizing these posterior draws for the cases where am=1 and am=0, and averaging this difference over both levels of vs. (If you fit a model where am and vs interact, mpg ~ am * vs, you'll find that emmeans returns 6.19 for am0 - am1 when averaged over both levels of vs. But if you run emmeans separately for each level of vs, with emmeans(fit2, ~ am, by="vs"), you'll see that am0 - am1 is different for each level of vs. A sketch of that interaction model follows.)
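In case it's useful, here's what I mean by fit2, with the same fitting arguments as fit1 above. Treat this as a sketch (not run here):

fit2 <- brms::brm(
  mpg ~ am * vs,
  data = d, family = gaussian(),
  backend = "cmdstanr",
  cores = 4,
  refresh = 0,
  silent = 2
)

# Contrast averaged over the levels of vs (the 6.19 figure mentioned above)
emmeans(fit2, ~ am) |> 
  pairs() |> 
  summary(point.est = mean)

# A separate am0 - am1 contrast for each level of vs
emmeans(fit2, ~ am, by = "vs") |> 
  pairs() |> 
  summary(point.est = mean)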

You can calculate the contrast directly by summarizing the individual draws of the fitted values. Note in the code below that we first extract posterior draws for the four possible combinations of vs and am. Then we calculate the differences of the individual draws and summarize that distribution of differences, rather than taking the means first and then calculating the difference. (The point estimate is the same either way, since the mean of the differences equals the difference of the means, but the interval has to be computed on the draw-wise differences.)

library(tidyverse)
library(tidybayes)

# Get all draws of the fitted values for all combinations of am and vs
epred.draws = tidybayes::epred_draws(fit1, newdata=tidyr::crossing(am=c(0,1), vs=c(0,1))) %>% 
  print()
> # A tibble: 16,000 × 7
> # Groups:   am, vs, .row [4]
>       am    vs  .row .chain .iteration .draw .epred
>  1     0     0     1     NA         NA     1   14.1
>  2     0     0     1     NA         NA     2   14.9
>  3     0     0     1     NA         NA     3   14.5
>  4     0     0     1     NA         NA     4   13.9
>  5     0     0     1     NA         NA     5   14.1
>  6     0     0     1     NA         NA     6   14.3
>  7     0     0     1     NA         NA     7   16.5
>  8     0     0     1     NA         NA     8   16.3
>  9     0     0     1     NA         NA     9   13.9
> 10     0     0     1     NA         NA    10   13.8
> # ℹ 15,990 more rows
> # ℹ Use `print(n = ...)` to see more rows

epred.draws %>% ungroup %>% count(am, vs)
>      am    vs     n
> 1     0     0  4000
> 2     0     1  4000
> 3     1     0  4000
> 4     1     1  4000
# Summarize draws to reproduce emmeans results
epred.draws %>% 
  group_by(.draw, vs) %>% 
  # Get difference between .epred when am=0 and am=1
  summarise(`am0 - am1` = -diff(.epred)) %>% 
  ungroup() %>% 
  summarise(ndraws = n_distinct(.draw),
            tidybayes::mean_hdi(`am0 - am1`))
>   ndraws     y  ymin  ymax .width .point .interval
> 1   4000 -6.08 -8.69 -3.53   0.95 mean   hdi   

emmeans(fit1, ~ am) |> 
  pairs() |> 
  summary(point.est = mean)
>  contrast  estimate lower.HPD upper.HPD
>  am0 - am1    -6.08      -8.7     -3.56

As you noted, hypothesis(fit1, "am1 = Intercept") is giving you the difference between am1 and the Intercept, which isn't meaningful. The code below does what (I think) hypothesis is doing.

tidybayes::tidy_draws(fit1) %>% 
  select(b_Intercept:b_am1) %>% 
  mutate(diff_am1_intercept = b_am1 - b_Intercept) %>% 
  pivot_longer(everything()) %>% 
  group_by(name) %>% 
  summarise(value=tidybayes::mean_hdi(value))
>   name               value$y  $ymin $ymax $.width $.point $.interval
> 1 b_Intercept          14.6   12.7  16.5     0.95 mean    hdi       
> 2 b_am1                 6.08   3.53  8.69    0.95 mean    hdi       
> 3 b_vs1                 6.93   4.36  9.36    0.95 mean    hdi       
> 4 diff_am1_intercept   -8.50 -12.3  -4.66    0.95 mean    hdi  

hypothesis(fit1, "am1 = Intercept") 
>              Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob Star
> 1 (am1)-(Intercept) = 0     -8.5      1.93   -12.34    -4.71         NA        NA    *

Hi @joels,

Thank you for your in-depth response and sorry for my delayed reply. I am genuinely impressed by your use of the diff() function within summarise(); I had no idea that was something you could do with tidybayes and get something meaningful.

I have a related follow-up question: do you know if this method of using epred_draws() to compute the contrast is better or worse than the index-variable approach that McElreath teaches in his Statistical Rethinking textbook? Here's Kurz's brms translation of what McElreath teaches, for context.

It seems like using epred_draws() does tell me how emmeans() achieves its contrast, although I'm still confused about whether using emmeans() this way is universally okay. I'm worried that there are some cases where not using the index-variable approach would hide some insidious problem that HMC might warn us about.

Michael

Whether you use indexing or dummy variables in your model, epred_draws() is just extracting draws from the posterior, which you can summarize as desired, so I think either approach would work. Also, I haven't used the indexing approach before, but shouldn't the actual predictions (in this case, the change in mpg for a given change in am or vs) be the same either way?

Regarding the diff() function: Since in this case there are only two values in each group, you could also do am0 - am1 = .epred[1] - .epred[2].
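For example, this variation should give the same result as the -diff() version, assuming the rows within each group keep the am=0-then-am=1 order produced by tidyr::crossing():

epred.draws %>% 
  group_by(.draw, vs) %>% 
  # Within each group, the first row is am=0 and the second is am=1
  summarise(`am0 - am1` = .epred[1] - .epred[2]) %>% 
  ungroup() %>% 
  summarise(tidybayes::mean_hdi(`am0 - am1`))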

Hi @joels ,

Also, I haven’t used the indexing approach before, but shouldn’t the actual predictions (in this case, the change in mpg for a given change in am or vs) be the same either way?

Well, based on Section 5.3.1 of McElreath's Rethinking 2E, I don't think that's necessarily guaranteed, because you have to use a different prior for the indexing vs. the dummy-variable approach, right? My understanding is that with the index approach, we put a prior on the expected value of the response variable for each level of the factor. With the dummy-variable approach, we put a prior on the difference between the levels and also a prior on the baseline level. This will make the predictions (or, perhaps more appropriately, the parameter estimates?) for non-baseline levels more uncertain. (And regardless of precision, I guess the difference in priors could entail a difference in the estimates.)
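For concreteness, here's a minimal sketch of the two parameterizations for a single factor in brms. fit_dummy and fit_index are just illustrative names, and the priors are placeholders rather than recommendations:

# Dummy coding: a prior on the baseline mean (Intercept) plus a
# prior on the difference between levels (b_am1)
fit_dummy <- brms::brm(
  mpg ~ 1 + am,
  data = d,
  prior = c(prior(normal(20, 10), class = Intercept),
            prior(normal(0, 10), class = b)),
  backend = "cmdstanr"
)

# Index coding: dropping the intercept yields one parameter per level
# (b_am0 and b_am1), each of which gets the class = b prior directly
fit_index <- brms::brm(
  mpg ~ 0 + am,
  data = d,
  prior = prior(normal(20, 10), class = b),
  backend = "cmdstanr"
)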

Now, as McElreath writes, for 'typical' applications I would assume that the difference in priors shouldn't matter too much, because it'll wash out with the amount of data you have.

However, when you’re doing something like a meta-analytical meta-regression, where the number of studies in a particular level of a factor is small, then I guess the uncertainty (added for purely non-scientific computational reasons) using the dummy-variable approach could be impactful. This might be especially true for social-psych meta-analyses where the literature-proposed default priors are all fair weak IMO. Additionally, IME, it’s been some mix of quite complicated to impossible to get brms to do the index variable approach when you have multiple categorical variables. Therefore, unless one is willing to deal with Stan code directly, we are forced to accept the additional uncertainty via the dummy-variable approach when using brms for meta-analyses.

Given that the dummy-variable approach is basically thrust upon me, I brought up the issue re: HMC throwing convergence/fitting warnings because I was openly wondering whether the dummy-variable approach might hide scientifically meaningful convergence warnings that you'd otherwise get from the index-variable approach with multiple categorical variables…

Well, all that being said, I do think that your initial post has answered the opening question, so I’ll mark it as the solution.

Riffing on one of your sub-points: I found McElreath’s contrast of index versus dummy coding instructive, and I hadn’t seen another source discussing their consequences for posterior uncertainty. However, I usually prefer the dummy approach in my real work. I often find it easier to think of a prior for the contrast than for the unconditional means. There’s also the issue that my brms non-linear syntax solution to McElreath’s index code is, let’s face it, the stuff of nightmares. But if you like the index approach, rock on.