# Anova of brms model

When I do

data("mtcars")
mod1 <- lm(mpg~as.factor(cyl)+as.factor(gear)+as.factor(carb),data=mtcars)
anova(mod1)


I get F-tests for each of the independent variables. Apart from doing


mod.b1 <- brm(mpg~as.factor(cyl),data=mtcars)
mod.b0 <- brm(mpg~1,data=mtcars)
bayes_factor(mod.b1,mod.b0)


for each independent variable in the full model, is there another procedure whereby I can determine the importance of each variable?

A general way to quantify predictor importance is to map the fitted model to a linear model and use relative explained variation measures for that model, such as partial R^2. This is very amenable to a Bayesian workflow because you can compute the measure for each posterior draw, which gives much-needed uncertainty measures for these importances. Details and a frequentist example are here.
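A minimal sketch of one way to read that idea, not a definitive implementation: fit the full Bayesian model, regress each posterior draw of the linear predictor on the original design with an ordinary linear model, and normalize the partial sums of squares into shares that sum to 1. The names mod.full and rel_importance below are just illustrative, not from the original suggestion.

# assumed full Bayesian fit of the original formula
library(brms)

mod.full <- brm(mpg ~ as.factor(cyl) + as.factor(gear) + as.factor(carb),
                data = mtcars, cores = 4, seed = 1)

# draws x observations matrix of the linear predictor
lp <- posterior_linpred(mod.full)

rel_importance <- function(y, data) {
  # approximate this draw with an ordinary linear model on the same design
  fit <- lm(y ~ as.factor(cyl) + as.factor(gear) + as.factor(carb), data = data)
  d1  <- drop1(fit)                    # partial sums of squares per term
  ss  <- d1[-1, "Sum of Sq"]
  setNames(ss / sum(ss), rownames(d1)[-1])
}

# one row of relative-importance shares per posterior draw
shares <- t(apply(lp, 1, rel_importance, data = mtcars))

# posterior median and 95% intervals of the shares
apply(shares, 2, quantile, probs = c(.025, .5, .975))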


Another option is to use Gelman's hierarchical ANOVA approach (Gelman, 2005; Analysis of variance—why it is more important than ever). The difficulty with that approach is that the factor variables often have a small number of levels, as with cyl and gear, both of which have only 3 levels. To help the HMC algorithm along, you might try:

• standardizing the criterion variable mpg;
• using regularizing priors, particularly for the level-2 \sigma parameters; and/or
• fiddling with the control settings, such as adapt_delta.

Here's how that can work with your example.

# load packages
library(tidyverse)
library(brms)
library(tidybayes)

# fit the model
anova1 <- brm(
  # notice we're standardizing the criterion
  data = mtcars %>% mutate(mpgz = (mpg - mean(mpg)) / sd(mpg)),
  family = gaussian,
  # here's the basic formula for the hierarchical ANOVA
  mpgz ~ 1 + (1 | cyl) + (1 | gear) + (1 | carb),
  # here are our regularizing priors
  prior = prior(normal(0, 1), class = Intercept) +
    prior(exponential(1), class = sd) +
    prior(exponential(1), class = sigma),
  cores = 4, seed = 1,
  # notice that we increased the adapt_delta setting
  control = list(adapt_delta = .95)
)


To compare the relative importance of the factor variables, we look at the magnitudes of the level-2 \sigma posteriors. Those appear in the Group-Level Effects section of the summary() output.

summary(anova1)

 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: mpgz ~ 1 + (1 | cyl) + (1 | gear) + (1 | carb) 
   Data: mtcars %>% mutate(mpgz = (mpg - mean(mpg))/sd(mpg)) (Number of observations: 32) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Group-Level Effects: 
~carb (Number of levels: 6) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.32      0.28     0.01     1.04 1.00      787     1114

~cyl (Number of levels: 3) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.90      0.53     0.12     2.24 1.00     1183      837

~gear (Number of levels: 3) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.43      0.40     0.02     1.48 1.00     1232     2046

Population-Level Effects: 
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept     0.04      0.54    -1.02     1.13 1.00     1758     2179

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     0.54      0.08     0.41     0.72 1.00     2536     2254

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).


This might be easier to see in a plot.

as_draws_df(anova1) %>% 
  pivot_longer(starts_with("sd_")) %>% 
  mutate(name = str_replace(name, "sd_", "sigma[") %>% str_replace("__Intercept", "]")) %>% 
  
  ggplot(aes(x = value, y = name)) +
  stat_halfeye() +
  scale_y_discrete(NULL, labels = ggplot2:::parse_safe, expand = expansion(mult = 0.05)) +
  xlab("posterior")


Even with all our tricks, the posteriors for \sigma_\text{carb} and \sigma_\text{gear} look a little rough, but again, that's what often happens when you fit upper-level \sigma parameters with so few factor levels. Anyway, the cyl factor looks like it's the most important. We can even formally compare the \sigma posteriors with contrast distributions.

as_draws_df(anova1) %>% 
  pivot_longer(starts_with("sd_")) %>% 
  mutate(name = str_remove(name, "sd_") %>% str_remove("__Intercept")) %>% 
  mutate(name = factor(name, levels = c("carb", "gear", "cyl"))) %>% 
  compare_levels(value, by = name, comparison = "pairwise") %>% 
  
  ggplot(aes(x = value, y = name)) +
  stat_halfeye() +
  scale_y_discrete(NULL, expand = expansion(mult = 0.05)) +
  xlab(expression(sigma["[group]"]~contrasts))


Those contrast distributions reveal the uncertainty in our comparisons. Sure, cyl looks like it's the best, but with so few levels within each grouping variable, we should remain skeptical.


This is really cool, Solomon. I would also like to see an approach to variable importance that doesn't require any random effects.


Thanks! To avoid random effects, one could always use WAIC and/or LOO-CV comparisons, too.
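For example, here's a minimal sketch, assuming the mod.b1 and mod.b0 fits from the original post are still in the workspace (you would fit and compare analogous pairs for the other predictors):

# add LOO and WAIC criteria to the fitted models
mod.b1 <- add_criterion(mod.b1, criterion = c("loo", "waic"))
mod.b0 <- add_criterion(mod.b0, criterion = c("loo", "waic"))

# compare the model with cyl against the intercept-only model
loo_compare(mod.b1, mod.b0, criterion = "loo")
loo_compare(mod.b1, mod.b0, criterion = "waic")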


Thanks. It would be nice to see an example of using those to estimate a [0,1] scaled relative importance measure.

You can get WAIC or LOO weights with the brms::model_weights() function. McElreath covered WAIC weights in Section 7.5 of the first edition of his text, and you can find my brms exploration of that method with model_weights() here.
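For instance, a small sketch assuming the same mod.b1 and mod.b0 fits as above; because the weights sum to 1, they give a [0, 1] scaled comparison of the models:

# model weights based on WAIC and LOO
model_weights(mod.b1, mod.b0, weights = "waic")
model_weights(mod.b1, mod.b0, weights = "loo")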
