Hi all,
I have a question about the computation of the estimated marginal means and contrasts (using the emmeans package in R) based on a fitted brms model. Consider the following toy example:
library("emmeans")
library("brms")
# Load the iris dataset
data("iris")
# Fit linear model with Petal.Width as outcome and Species as covariate
set.seed(123); fit <- brm(Petal.Width ~ Species, data = iris)
print(summary(fit, robust=TRUE), digits=10)
# Generated output:
Family: gaussian
Links: mu = identity; sigma = identity
Formula: Petal.Width ~ Species
Data: iris (Number of observations: 150)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 0.2459680662 0.0306554287 0.1886172595 0.3039992100 1.0024083943 2985 2505
Speciesversicolor 1.0798603337 0.0421999664 0.9994487892 1.1599985585 1.0017138236 3222 2806
Speciesvirginica 1.7805708007 0.0416863097 1.7017711722 1.8618851318 1.0010220040 3566 3226
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 0.2058639462 0.0121732295 0.1842404957 0.2311617068 1.0004071432 3846 3082
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
# Estimated marginal means using emmeans package
emm_options(opt.digits = FALSE); options(digits=10)
emmeans(fit, specs = pairwise ~ Species)
Generated output:
$emmeans
Species emmean lower.HPD upper.HPD
setosa 0.2459680662 0.1852710892 0.3000677047
versicolor 1.3270338429 1.2649721923 1.3800841652
virginica 2.0265604092 1.9699723046 2.0822403090
Point estimate displayed: median
HPD interval probability: 0.95
$contrasts
contrast estimate lower.HPD upper.HPD
setosa - versicolor -1.0798603337 -1.1602782894 -0.9999120115
setosa - virginica -1.7805708007 -1.8588519183 -1.6993508287
versicolor - virginica -0.7011980843 -0.7789274031 -0.6202138869
Point estimate displayed: median
HPD interval probability: 0.95
It is unclear to me how the emmeans values (provided in the output above of the emmeans function) are computed. These emmeans are not exactly the same as the estimated means that are ‘manually’ computed based on the medians of the posterior distributions that are shown in the brms above output (except for the intercept). For example, for versicolor: 0.2459680662 +1.0798603337, which yields 1.3258284 (versus 1.3270338429 as provided by emmeans).
Similarly, the contrasts that are shown in the output of the emmeans function are not
exactly the same as the ‘manually’ computed values based on the emmeans (reported by the emmeans function itself). For example, for the setosa - versicolor contrast: 0.2459680662-1.3270338429, which gives -1.081065777 (versus -1.0798603337 as provided by the emmeans function).
It is not clear to me why the contrasts and the emmeans (of the emmeans function) are not ‘in line’ with each other? Can anybody clarify?
- Operating System: macOS Sonoma 14.3
- brms Version: 2.20.4
- emmeans version: 1.8.0
(by the way, If I try to reproduce the emmeans values (provided by the emmeans function) using the posterior distributions that are saved by brms, the results are also not the same. E.g., estimated mean for versicolor:
(median(posterior_samples(fit)[,1])+median(posterior_samples(fit)[,2])), which yields 1.3258284 (versus 1.3270338429 in the emmeans output)
Quite surprisingly, the contrasts values that are reported by the emmeans function can be retrieved exactly based on the posterior_samples function. For example, for the contrast setosa - versiolor:
median(posterior_samples(fit)[,1])-
(median(posterior_samples(fit)[,1])+median(posterior_samples(fit)[,2])), which yields -1.079860334 (which is the same as the value reported by emmeans)
)