ANOVA of a brms model

When I do

data("mtcars")
mod1 <- lm(mpg~as.factor(cyl)+as.factor(gear)+as.factor(carb),data=mtcars)
anova(mod1)

I get F-tests for each of the independent variables. Apart from doing


mod.b1 <- brm(mpg~as.factor(cyl),data=mtcars,
              save_pars = save_pars(all = TRUE)) # needed for bridge sampling
mod.b0 <- brm(mpg~1,data=mtcars,
              save_pars = save_pars(all = TRUE))
bayes_factor(mod.b1,mod.b0)

for each independent variable in the full model, is there another procedure by which I can determine the importance of each variable?

A general way to quantify predictor importance is to map the fitted model to a linear model and use relative explained-variation measures for that model, such as the partial R^2. This is very amenable to Bayesian modeling, because you can compute the measure for each posterior draw and so get much-needed uncertainty intervals for the importances. Details and a frequentist example are here.
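To make that concrete, here is a rough sketch of one way to do the draw-wise computation. This is my own improvisation rather than the code from the linked post, the object names are mine, and zeroing out a factor's coefficients is a shortcut for refitting a reduced model:

# refit the model with population-level effects only (a sketch)
library(brms)
library(posterior)

fit <- brm(mpg ~ factor(cyl) + factor(gear) + factor(carb),
           data = mtcars, cores = 4, seed = 1)

y <- mtcars$mpg
X <- model.matrix(~ factor(cyl) + factor(gear) + factor(carb), data = mtcars)
# match the b_* posterior draws to the design-matrix columns
B <- as_draws_matrix(fit)[, paste0("b_", gsub("[()]", "", colnames(X)))]

# residual sum of squares for each posterior draw
sse <- function(B) rowSums((matrix(y, nrow(B), length(y), byrow = TRUE) - B %*% t(X))^2)
sse_full <- sse(B)

# for each factor, zero out its coefficients and compute a partial-R^2-style
# measure, (sse_reduced - sse_full) / sse_reduced, once per draw
partial_r2 <- sapply(c("cyl", "gear", "carb"), function(v) {
  B_red <- B
  B_red[, grepl(v, colnames(B_red))] <- 0
  sse_red <- sse(B_red)
  (sse_red - sse_full) / sse_red
})

# posterior medians and 95% intervals of the importances
t(apply(partial_r2, 2, quantile, probs = c(.025, .5, .975)))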

1 Like

Another option is to use Gelman’s hierarchical ANOVA approach (Gelman, 2005; Analysis of variance—why it is more important than ever). The difficulty with that approach is that the factor variables often have only a small number of levels, as with cyl and gear, both of which have just 3 levels. Thus, to help the HMC algorithm along, you might try:

  • standardizing the criterion variable mpg;
  • using regularizing priors, particularly for the level-2 \sigma parameters; and/or
  • fiddling with the control settings, such as adapt_delta.

Here’s how that can work with your example.

# load packages
library(tidyverse)
library(brms)
library(tidybayes)

# fit the model
anova1 <- brm(
    # notice we're standardizing the criterion
    data = mtcars %>% mutate(mpgz = (mpg - mean(mpg)) / sd(mpg)),
    family = gaussian,
    # here's the basic formula for the hierarchical ANOVA
    mpgz ~ 1 + (1 | cyl) + (1 | gear) + (1 | carb),
    # here are our regularizing priors
    prior = prior(normal(0, 1), class = Intercept) +
        prior(exponential(1), class = sd) +
        prior(exponential(1), class = sigma),
    cores = 4, seed = 1,
    # notice we've increased the adapt_delta setting
    control = list(adapt_delta = .95)
)

To compare the relative importance of the factor variables, we look at the magnitudes of the level-2 \sigma posteriors. Those appear in the Group-Level Effects section of the summary() output.

summary(anova1)
 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: mpgz ~ 1 + (1 | cyl) + (1 | gear) + (1 | carb) 
   Data: mtcars %>% mutate(mpgz = (mpg - mean(mpg))/sd(mpg)) (Number of observations: 32) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Group-Level Effects: 
~carb (Number of levels: 6) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.32      0.28     0.01     1.04 1.00      787     1114

~cyl (Number of levels: 3) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.90      0.53     0.12     2.24 1.00     1183      837

~gear (Number of levels: 3) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.43      0.40     0.02     1.48 1.00     1232     2046

Population-Level Effects: 
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept     0.04      0.54    -1.02     1.13 1.00     1758     2179

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     0.54      0.08     0.41     0.72 1.00     2536     2254

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

This might be easier to see in a plot.

as_draws_df(anova1) %>% 
    pivot_longer(starts_with("sd_")) %>% 
    mutate(name = str_replace(name, "sd_", "sigma[") %>% str_replace("__Intercept", "]")) %>% 
    
    ggplot(aes(x = value, y = name)) +
    stat_halfeye() +
    scale_y_discrete(NULL, labels = ggplot2:::parse_safe, expand = expansion(mult = 0.05)) +
    xlab("posterior")

Even with all our tricks, the posteriors for \sigma_\text{carb} and \sigma_\text{gear} look a little rough, but again, that’s what often happens when you fit upper-level \sigma parameters with so few factor levels. Anyway, the cyl factor looks like it’s the most important. We can even formally compare the \sigma posteriors with contrast distributions.

as_draws_df(anova1) %>% 
    pivot_longer(starts_with("sd_")) %>% 
    mutate(name = str_remove(name, "sd_") %>% str_remove("__Intercept")) %>% 
    mutate(name = factor(name, levels = c("carb", "gear", "cyl"))) %>%
    compare_levels(value, by = name, comparison = "pairwise")  %>% 
    
    ggplot(aes(x = value, y = name)) +
    stat_halfeye() +
    scale_y_discrete(NULL, expand = expansion(mult = 0.05))  +
    xlab(expression(sigma["[group]"]~contrasts))

Those contrast distributions reveal the uncertainty in our comparisons. Sure, cyl looks like it’s the best, but with so few levels within each grouping variable, we should remain skeptical.

3 Likes

This is really cool, Solomon. I would also like to see an approach to variable importance that doesn’t require any random effects.

1 Like

Thanks! To avoid random effects, one could always use WAIC and/or LOO-CV comparisons, too.
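For instance, here’s a minimal sketch of the LOO-CV route. The model names are hypothetical, and I’m using simple population-level refits rather than the hierarchical model above:

# fit the full model and a model without cyl (hypothetical names)
fit_full  <- brm(mpg ~ factor(cyl) + factor(gear) + factor(carb),
                 data = mtcars, cores = 4, seed = 1)
fit_nocyl <- update(fit_full, formula. = . ~ . - factor(cyl),
                    cores = 4, seed = 1)

# save the information criteria within the fit objects
fit_full  <- add_criterion(fit_full,  c("loo", "waic"))
fit_nocyl <- add_criterion(fit_nocyl, c("loo", "waic"))

# an elpd difference that is large relative to its SE suggests cyl matters
loo_compare(fit_full, fit_nocyl, criterion = "loo")

Dropping gear or carb instead gives the analogous comparisons for the other factors.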

1 Like

Thanks. It would be nice to see an example of using those to estimate a [0,1] scaled relative importance measure.

You can get WAIC or LOO weights with the brms::model_weights() function. McElreath covered WAIC weights in Section 7.5 of the first edition of his text, and you can find my brms exploration of that method with model_weights() here.
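As a sketch of how that might look with this data set, and assuming refits like the hypothetical fit_full and fit_nocyl from my earlier post, plus analogous refits dropping the other two factors:

# hypothetical refits dropping one factor at a time
fit_nogear <- update(fit_full, formula. = . ~ . - factor(gear), cores = 4, seed = 1)
fit_nocarb <- update(fit_full, formula. = . ~ . - factor(carb), cores = 4, seed = 1)

# WAIC weights are scaled to [0, 1] and sum to 1 across the models
model_weights(fit_full, fit_nocyl, fit_nogear, fit_nocarb, weights = "waic")

The weights rank whole models rather than variables, but how much weight shifts away from the full model when you drop a factor gives a rough sense of that factor’s importance.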

1 Like