This may be a naively obvious question, but I figured I’d ask.
Is there a way to use {brms} to fit a multilevel/hierarchical model with substantial numbers of predictors at different levels in the model? (e.g. if modeling students nested inside schools, use a model with many student-level and school-level covariates)
Here’s a pseudo example using data from this conjoint choice model written in raw Stan. Respondents completed 8 separate tasks where they had to choose between four different options with randomly selected products, features, and prices. The researchers also collected indvidual-level characteristics for each respondent: their gender, the number of children they have, and whether they feel competitive.
Here’s what the data generally looks like:
(The exact setup doesn’t matter here—technically this is a multinomial model that could be estimated with a categorical family, but this question isn’t about that.)
library(tidyverse)
conjoint_data <- read_csv("https://github.com/Bartosz-G/Conjoint-Analysis-in-R/raw/main/Conjoint_Cleaned_Dataset.csv")
respondent_data <- read_csv("https://github.com/Bartosz-G/Conjoint-Analysis-in-R/raw/main/Demographic_Cleaned_Dataset.csv")
combined_data <- conjoint_data %>%
left_join(respondent_data, by = join_by(ID))
combined_data
#> # A tibble: 3,200 × 15
#> ID Gender Children Competition Set Card Product_1 Product_2 Feature_1 Feature_2 Feature_3 Price Choice
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 1 1 0 1 1 1 0 1 0 0 20.0 1
#> 2 1 1 1 0 1 2 0 1 1 0 0 7.99 0
#> 3 1 1 1 0 1 3 0 1 0 1 0 13.0 0
#> 4 1 1 1 0 1 4 0 0 0 0 0 0 0
#> 5 1 1 1 0 2 1 1 0 0 1 1 20.0 1
#> 6 1 1 1 0 2 2 1 0 0 1 0 13.0 0
#> 7 1 1 1 0 2 3 0 1 1 0 1 20.0 0
#> 8 1 1 1 0 2 4 0 0 0 0 0 0 0
#> 9 1 1 1 0 3 1 0 1 0 1 1 13.0 0
#> 10 1 1 1 0 3 2 0 1 1 0 1 7.99 0
#> # ℹ 3,190 more rows
There’s a multilevel structure here, with some variables measured for each task (Set
), and some measured for each respondent/individual (ID
)
In that example repository, the author uses raw Stan to create a multilevel model, roughly like this:
Or in pseduo-Stan code:
for (r in 1:n_respondents) {
Beta[,r] ~ multi_normal(Gamma*Z[,r], ...);
for (s in 1:n_questions) {
Y[r,s] ~ categorical_logit(X[r,s] * Beta[,r]);
}
}
That is, each of the choice-level variables (X
in the Stan code: product, feature, price) vary according across individuals and individual-level characteristics (Z
in the Stan code: gender, children, competitiveness). This works by passing two separate datasets to Stan—one with choice-level covariates and one with individual-level covariates.
I’d like to run this same model (individual-level covariates informing the choice-level variables nested inside individuals) with {brms} instead of with raw Stan.
{brms} can’t handle two separate datasets, but it’s easy enough to join the individual-level and choice-level datasets into one with left_join()
(as seen above)
I’m like 80% sure that a formula like this would do it, with both choice-level and individual-level covariates included in the formula, and with random individual-based slopes for each of the choice-level covariates.
Is this a correct respecification of the raw Stan approach of using individual-level data to inform choice-level predictors? Or do I need to interact each individual-level predictor with the choice-level predictors (i.e. Gender*Product_1 + Children*Product_1 + ...
, etc.)? Or do I need to use {brms}'s non-linear syntax to specify separate formulas for each of the choice-level coefficients? Or am I totally off and {brms} can’t be used for this kind of hierarchical model? Or am I overthinking everything and this is actually correct?
model_attempt <- brm(
bf(Choice ~
# Choice-level predictors that are nested within individuals
Product_1 + Product_2 + Feature_1 + Feature_2 + Feature_3 + Price +
# Individual-level predictors
Gender + Children + Competition +
# Random slopes for the nested choice-level predictors
(1 + Product_1 + Product_2 + Feature_1 + Feature_2 + Feature_3 + Price | ID)
),
data = combined_data
)