Does this hierarchical brms formula correctly link covariates across the multiple nested levels?

This may be a naively obvious question, but I figured I’d ask.

Is there a way to use {brms} to fit a multilevel/hierarchical model with substantial numbers of predictors at different levels in the model? (e.g. if modeling students nested inside schools, use a model with many student-level and school-level covariates)

Here’s a pseudo example using data from this conjoint choice model written in raw Stan. Respondents completed 8 separate tasks where they had to choose between four different options with randomly selected products, features, and prices. The researchers also collected indvidual-level characteristics for each respondent: their gender, the number of children they have, and whether they feel competitive.

Here’s what the data generally looks like:

(The exact setup doesn’t matter here—technically this is a multinomial model that could be estimated with a categorical family, but this question isn’t about that.)

library(tidyverse)

conjoint_data <- read_csv("https://github.com/Bartosz-G/Conjoint-Analysis-in-R/raw/main/Conjoint_Cleaned_Dataset.csv")
respondent_data <- read_csv("https://github.com/Bartosz-G/Conjoint-Analysis-in-R/raw/main/Demographic_Cleaned_Dataset.csv")

combined_data <- conjoint_data %>% 
  left_join(respondent_data, by = join_by(ID))
combined_data
#> # A tibble: 3,200 × 15
#>       ID Gender Children Competition   Set  Card Product_1 Product_2 Feature_1 Feature_2 Feature_3 Price Choice 
#>    <dbl>  <dbl>    <dbl>       <dbl> <dbl> <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl> <dbl>  <dbl> 
#>  1     1      1        1           0     1     1         1         0         1         0         0 20.0       1 
#>  2     1      1        1           0     1     2         0         1         1         0         0  7.99      0 
#>  3     1      1        1           0     1     3         0         1         0         1         0 13.0       0 
#>  4     1      1        1           0     1     4         0         0         0         0         0  0         0 
#>  5     1      1        1           0     2     1         1         0         0         1         1 20.0       1 
#>  6     1      1        1           0     2     2         1         0         0         1         0 13.0       0 
#>  7     1      1        1           0     2     3         0         1         1         0         1 20.0       0 
#>  8     1      1        1           0     2     4         0         0         0         0         0  0         0 
#>  9     1      1        1           0     3     1         0         1         0         1         1 13.0       0 
#> 10     1      1        1           0     3     2         0         1         1         0         1  7.99      0 
#> # ℹ 3,190 more rows

There’s a multilevel structure here, with some variables measured for each task (Set), and some measured for each respondent/individual (ID)

In that example repository, the author uses raw Stan to create a multilevel model, roughly like this:

\begin{aligned} \beta &\sim \operatorname{Multivariate normal} (\Gamma \times \text{Individual-level covariates}) \\ \text{Outcome} &\sim \operatorname{Multinomial logit} (\beta \times \text{Choice-level covariates}) \end{aligned}

Or in pseduo-Stan code:

for (r in 1:n_respondents) {
  Beta[,r] ~ multi_normal(Gamma*Z[,r], ...);

  for (s in 1:n_questions) {
     Y[r,s] ~ categorical_logit(X[r,s] * Beta[,r]);
  }
}

That is, each of the choice-level variables (X in the Stan code: product, feature, price) vary according across individuals and individual-level characteristics (Z in the Stan code: gender, children, competitiveness). This works by passing two separate datasets to Stan—one with choice-level covariates and one with individual-level covariates.

I’d like to run this same model (individual-level covariates informing the choice-level variables nested inside individuals) with {brms} instead of with raw Stan.

{brms} can’t handle two separate datasets, but it’s easy enough to join the individual-level and choice-level datasets into one with left_join() (as seen above)

I’m like 80% sure that a formula like this would do it, with both choice-level and individual-level covariates included in the formula, and with random individual-based slopes for each of the choice-level covariates.

Is this a correct respecification of the raw Stan approach of using individual-level data to inform choice-level predictors? Or do I need to interact each individual-level predictor with the choice-level predictors (i.e. Gender*Product_1 + Children*Product_1 + ..., etc.)? Or do I need to use {brms}'s non-linear syntax to specify separate formulas for each of the choice-level coefficients? Or am I totally off and {brms} can’t be used for this kind of hierarchical model? Or am I overthinking everything and this is actually correct?

model_attempt <- brm(
  bf(Choice ~ 
      # Choice-level predictors that are nested within individuals
      Product_1 + Product_2 + Feature_1 + Feature_2 + Feature_3 + Price +
      
      # Individual-level predictors
      Gender + Children + Competition +
      
      # Random slopes for the nested choice-level predictors
      (1 + Product_1 + Product_2 + Feature_1 + Feature_2 + Feature_3 + Price | ID)
    ),
  data = combined_data
)
1 Like

After a bunch of tinkering, I’ve found that yes, this model_attempt formula is mostly there. {brms} is fine with mixing individual- and choice-level covariates in the same formula.

To make it so that each individual-specific random slope is a function of all the individual-level predictors, you can interact the choice-level and individual-level predictors, like so:

model_full_nested <- brm(
  bf(Choice ~ 
      # Choice-level predictors that are nested within individuals...
      (Product_1 + Product_2 + Feature_1 + Feature_2 + Feature_3 + Price) *
      
      # ...interacted with individual-level predictors...
      (Gender + Children + Competition) +
      
      # ...with random individual-level slopes for the nested choice-level predictors
      (1 + Product_1 + Product_2 + Feature_1 + Feature_2 + Feature_3 + Price | ID)
  ),
  data = combined_data
)

To confirm what that formula structure is doing, I made a simpler model with lme4::lmer() and fed that object to {equatiomatic}:

library(lme4)
library(equatiomatic)

model_example <- lmer(
  Choice ~ 
    (Product_1 + Product_2 + Feature_1 + Price) *
    (Gender + Children + Competition) +
    (1 + Product_1 + Product_2 + Feature_1 + Price | ID),
  data = combined_data
)

extract_eq(model_full_nested)

The individual-level intercept (\alpha) and slopes (\beta_{1 \dots 4}) are each functions of the individual-level covariates (gender, children, and competition)

2 Likes

gorgeous

1 Like

For future forum readers, here’s a full guide/tutorial/example of using this sort of fully nested and interacted model: https://www.andrewheiss.com/blog/2023/08/12/conjoint-multilevel-multinomial-guide/

1 Like

Hi, I have similar problem related to this. I am doing conjoint analysis with long format just like you did. which family did you use to estimate in the model? categorical or leave it blank like this? (in my case, if I did not specify that it would use Gaussian. Thanks before!