Anova-like summary for brm-model?

The emmeans package function joint_tests can produce an ANOVA-like summary for categorical predictors (factors), for example the overall effects of FactorA, FactorB, and FactorA:FactorB. This works even for a brm model (a brmsfit object), which can be useful, if only to help “traditionalists” and/or for purely didactic purposes. However, it feels weird to get p-values rather than something “more Bayesian”, like HPD intervals. So I was wondering: how could you compute such a summary of the overall effects, e.g. an HPD interval for FactorA as a whole, not for its individual levels? Thanks!

Hey @striatum, would you mind showing a minimal reproducible example?

Thank you @Solomon for helping me! I am reusing the example from my old post, How to properly compare interacting levels. The model is slightly revised, but it stays nice and simple, leaving aside random effects, covariates, etc.

# uniformly generated means
m = runif(4, 3, 18)

# random values, random means, same std. dev.
y = list()
for (i in 1:4) {
    y[[i]] = rnorm(30, m[i], 0.8)
}
y = unlist(y)

# factorial design
F1 = c(rep('A', 30), rep('B', 30))
F2 = c('I', 'J')
design_matrix = expand.grid(F1=F1, F2=F2)

# dataset
dat = cbind(design_matrix, y)

library(brms)
library(emmeans)

model1 <- brm(y ~ F1 * F2,
              data = dat,
              chains = 4, iter = 4000, cores = 4)

model1.emm = emmeans(model1, ~ F1 * F2)
joint_tests(model1.emm)
 # model term df1 df2  F.ratio    Chisq p.value
 # F1           1 Inf 5288.825 5288.825  <.0001
 # F2           1 Inf   18.793   18.793  <.0001
 # F1:F2        1 Inf   56.745   56.745  <.0001

I appreciate the ANOVA-style overall effects of F1, F2, and their interaction, but credible intervals would feel more in the Bayesian spirit.

Hmm. I don’t think I’m of any help here. But yes, I agree: that kind of breakdown seems very non-Bayesian. I just wouldn’t do it.

Thanks @Solomon! So there is nothing meaningful that would give something like an HPD interval for the “whole” F1 or F2?

To my mind, if you care about the coefficients, just describe them with point estimates and 95% intervals or perhaps even plot them. If you care about contrasts, then just compute the contrasts of interest.
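For instance, here is a minimal sketch using the model1.emm object from above (for a brmsfit, emmeans summarises the posterior draws, so you get point estimates and HPD intervals rather than p-values):

# point estimates and 95% HPD intervals for the cell means
summary(model1.emm)

# pairwise contrasts of interest, again summarised with HPD intervals
contrast(model1.emm, method = "pairwise")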


A follow-up, if I may, @Solomon (and with apologies if this is plain dumb): what I am asking about is the so-called omnibus test of the whole factor effect, not the individual differences/contrasts or coefficients. Would you consider averages of (per)centile points to be meaningful quantities? If so, I could average the 2.5% and 97.5% points across a factor’s levels, i.e., something like a credible interval for that factor as a whole.

It’s been a while, but if I remember correctly, these tests are based on log-likelihood ratios of different models: one that contains the focal term versus one that doesn’t. Can you clarify: when you say you would rather have an interval than a p-value, what would it be an interval of?


@Solomon, can you summarise briefly why you would not do that?

I’m not a fan of NHST. Though one can do NHST as a Bayesian, to my eye Bayesian software such as brms makes it particularly convenient to switch from hypothesis testing to predictions and effect sizes. For more on this, see Figure 1 from Kruschke & Liddell (2018). In the context of that figure, I’m a bottom-row kind of guy.


Hi there!
I was after the same thing for a while and ultimately decided to report conditional effects instead. They also give you an idea of the overall effect and you don’t have to leave the Bayesian sphere to do so. You can get actual numbers instead of a plot from conditional_effects like this:

condeff <- conditional_effects(mdl_name,
                               categorical = TRUE,  # only if the outcome variable is categorical; adapt as appropriate
                               re_formula = NULL)   # NULL means that random effects are included in the calculation
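The result is a named list with one data frame per effect; if I remember correctly, the posterior summaries are in columns like estimate__, lower__, and upper__ (the element names depend on your model terms):

# peek at the numbers for the first effect in the list
head(condeff[[1]])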

I don’t know if that’s a possible alternative in your case, but maybe it’s something to consider. Good luck either way!

In a Bayesian setting, we generally know a priori that the effects of the different factor levels are not all jointly zero (we typically use a prior that assigns only infinitesimal probability mass to this possibility). Questions like “does this factor belong in the model?” are often viewed as problems of model selection, and are often judged by a model’s predictive performance (e.g., as assessed by cross-validation). If desired, you can construct a fully Bayesian check of whether including a factor improves the model’s predictive performance based on leave-one-out cross-validation (LOO-CV). brms includes functionality from the loo package that makes this quite easy to do.
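As a minimal sketch, reusing model1 and dat from earlier in the thread (the reduced model here is only illustrative):

# model without F1 or the interaction
model0 <- brm(y ~ F2,
              data = dat,
              chains = 4, iter = 4000, cores = 4)

# leave-one-out cross-validation for both models
loo1 <- loo(model1)
loo0 <- loo(model0)

# the model with the better expected predictive performance is listed first;
# a large elpd difference relative to its SE suggests F1 helps prediction
loo_compare(loo1, loo0)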

Another alternative, which is not available inside of brms and is rarely recommended on this forum is to marginalize over a binary indicator variable that indicates which of two models is the “correct” one. This is a way to assign nonzero prior mass to the possibility that the effects of all levels in a factor are jointly zero. But that’s usually not actually a realistic statement of prior beliefs, which I think is why you won’t see it mentioned much here. I personally find it nicer and more sensible to view the problem as a question about predictive performance and model selection, while acknowledging that it’s vanishingly unlikely that the population level differences between the factor levels are literally zero.