Does this hierarchical brms formula correctly link covariates across the multiple nested levels?

This may be a naively obvious question, but I figured I’d ask.

Is there a way to use {brms} to fit a multilevel/hierarchical model with substantial numbers of predictors at different levels in the model? (e.g. if modeling students nested inside schools, use a model with many student-level and school-level covariates)

Here’s a pseudo example using data from this conjoint choice model written in raw Stan. Respondents completed 8 separate tasks where they had to choose between four different options with randomly selected products, features, and prices. The researchers also collected indvidual-level characteristics for each respondent: their gender, the number of children they have, and whether they feel competitive.

Here’s what the data generally looks like:

(The exact setup doesn’t matter here—technically this is a multinomial model that could be estimated with a categorical family, but this question isn’t about that.)

library(tidyverse)

conjoint_data <- read_csv("https://github.com/Bartosz-G/Conjoint-Analysis-in-R/raw/main/Conjoint_Cleaned_Dataset.csv")
respondent_data <- read_csv("https://github.com/Bartosz-G/Conjoint-Analysis-in-R/raw/main/Demographic_Cleaned_Dataset.csv")

combined_data <- conjoint_data %>% 
  left_join(respondent_data, by = join_by(ID))
combined_data
#> # A tibble: 3,200 × 15
#>       ID Gender Children Competition   Set  Card Product_1 Product_2 Feature_1 Feature_2 Feature_3 Price Choice 
#>    <dbl>  <dbl>    <dbl>       <dbl> <dbl> <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl> <dbl>  <dbl> 
#>  1     1      1        1           0     1     1         1         0         1         0         0 20.0       1 
#>  2     1      1        1           0     1     2         0         1         1         0         0  7.99      0 
#>  3     1      1        1           0     1     3         0         1         0         1         0 13.0       0 
#>  4     1      1        1           0     1     4         0         0         0         0         0  0         0 
#>  5     1      1        1           0     2     1         1         0         0         1         1 20.0       1 
#>  6     1      1        1           0     2     2         1         0         0         1         0 13.0       0 
#>  7     1      1        1           0     2     3         0         1         1         0         1 20.0       0 
#>  8     1      1        1           0     2     4         0         0         0         0         0  0         0 
#>  9     1      1        1           0     3     1         0         1         0         1         1 13.0       0 
#> 10     1      1        1           0     3     2         0         1         1         0         1  7.99      0 
#> # ℹ 3,190 more rows

There’s a multilevel structure here, with some variables measured for each task (Set), and some measured for each respondent/individual (ID)

In that example repository, the author uses raw Stan to create a multilevel model, roughly like this:

\begin{aligned} \beta &\sim \operatorname{Multivariate normal} (\Gamma \times \text{Individual-level covariates}) \\ \text{Outcome} &\sim \operatorname{Multinomial logit} (\beta \times \text{Choice-level covariates}) \end{aligned}

Or in pseduo-Stan code:

for (r in 1:n_respondents) {
  Beta[,r] ~ multi_normal(Gamma*Z[,r], ...);

  for (s in 1:n_questions) {
     Y[r,s] ~ categorical_logit(X[r,s] * Beta[,r]);
  }
}

That is, each of the choice-level variables (X in the Stan code: product, feature, price) vary according across individuals and individual-level characteristics (Z in the Stan code: gender, children, competitiveness). This works by passing two separate datasets to Stan—one with choice-level covariates and one with individual-level covariates.

I’d like to run this same model (individual-level covariates informing the choice-level variables nested inside individuals) with {brms} instead of with raw Stan.

{brms} can’t handle two separate datasets, but it’s easy enough to join the individual-level and choice-level datasets into one with left_join() (as seen above)

I’m like 80% sure that a formula like this would do it, with both choice-level and individual-level covariates included in the formula, and with random individual-based slopes for each of the choice-level covariates.

Is this a correct respecification of the raw Stan approach of using individual-level data to inform choice-level predictors? Or do I need to interact each individual-level predictor with the choice-level predictors (i.e. Gender*Product_1 + Children*Product_1 + ..., etc.)? Or do I need to use {brms}'s non-linear syntax to specify separate formulas for each of the choice-level coefficients? Or am I totally off and {brms} can’t be used for this kind of hierarchical model? Or am I overthinking everything and this is actually correct?

model_attempt <- brm(
  bf(Choice ~ 
      # Choice-level predictors that are nested within individuals
      Product_1 + Product_2 + Feature_1 + Feature_2 + Feature_3 + Price +
      
      # Individual-level predictors
      Gender + Children + Competition +
      
      # Random slopes for the nested choice-level predictors
      (1 + Product_1 + Product_2 + Feature_1 + Feature_2 + Feature_3 + Price | ID)
    ),
  data = combined_data
)
1 Like

After a bunch of tinkering, I’ve found that yes, this model_attempt formula is mostly there. {brms} is fine with mixing individual- and choice-level covariates in the same formula.

To make it so that each individual-specific random slope is a function of all the individual-level predictors, you can interact the choice-level and individual-level predictors, like so:

model_full_nested <- brm(
  bf(Choice ~ 
      # Choice-level predictors that are nested within individuals...
      (Product_1 + Product_2 + Feature_1 + Feature_2 + Feature_3 + Price) *
      
      # ...interacted with individual-level predictors...
      (Gender + Children + Competition) +
      
      # ...with random individual-level slopes for the nested choice-level predictors
      (1 + Product_1 + Product_2 + Feature_1 + Feature_2 + Feature_3 + Price | ID)
  ),
  data = combined_data
)

To confirm what that formula structure is doing, I made a simpler model with lme4::lmer() and fed that object to {equatiomatic}:

library(lme4)
library(equatiomatic)

model_example <- lmer(
  Choice ~ 
    (Product_1 + Product_2 + Feature_1 + Price) *
    (Gender + Children + Competition) +
    (1 + Product_1 + Product_2 + Feature_1 + Price | ID),
  data = combined_data
)

extract_eq(model_full_nested)

The individual-level intercept (\alpha) and slopes (\beta_{1 \dots 4}) are each functions of the individual-level covariates (gender, children, and competition)

2 Likes

gorgeous

1 Like

For future forum readers, here’s a full guide/tutorial/example of using this sort of fully nested and interacted model: https://www.andrewheiss.com/blog/2023/08/12/conjoint-multilevel-multinomial-guide/

1 Like