Understanding brms nested grouping term in multivariate response models

I’m fitting a multivariate logit response model with correlated varying intercepts using brms, following this vignette. The application is modeling multiple survey responses per respondent. I am trying to understand what the exact model specification is when using the nested notation (1 | id | group).

A simplified version of the model I’m fitting looks like this (full reproducible example below):

form <- bf(mvbind(pres, gov) ~ (1 | id | county_fips))
out <- brm(form, 
           data = dat, 
           family = bernoulli)  

Both outcomes (pres and gov) are binary and I want to include random intercepts at the county level that are correlated across outcomes.

I can’t find documentation about exactly what model (1 | id | county_fips) denotes. The generated Stan code from the model above looks like this:

// generated with brms 2.18.0
functions {
  /* compute correlated group-level effects
   * Args:
   *   z: matrix of unscaled group-level effects
   *   SD: vector of standard deviation parameters
   *   L: cholesky factor correlation matrix
   * Returns:
   *   matrix of scaled group-level effects
   */
  matrix scale_r_cor(matrix z, vector SD, matrix L) {
    // r is stored in another dimension order than z
    return transpose(diag_pre_multiply(SD, L) * z);
  }
}
data {
  int<lower=1> N; // total number of observations
  int<lower=1> N_pres; // number of observations
  array[N_pres] int Y_pres; // response variable
  int<lower=1> N_gov; // number of observations
  array[N_gov] int Y_gov; // response variable
  // data for group-level effects of ID 1
  int<lower=1> N_1; // number of grouping levels
  int<lower=1> M_1; // number of coefficients per level
  array[N_pres] int<lower=1> J_1_pres; // grouping indicator per observation
  array[N_gov] int<lower=1> J_1_gov; // grouping indicator per observation
  // group-level predictor values
  vector[N_pres] Z_1_pres_1;
  vector[N_gov] Z_1_gov_2;
  int<lower=1> NC_1; // number of group-level correlations
  int prior_only; // should the likelihood be ignored?
}
transformed data {
  
}
parameters {
  real Intercept_pres; // temporary intercept for centered predictors
  real Intercept_gov; // temporary intercept for centered predictors
  vector<lower=0>[M_1] sd_1; // group-level standard deviations
  matrix[M_1, N_1] z_1; // standardized group-level effects
  cholesky_factor_corr[M_1] L_1; // cholesky factor of correlation matrix
}
transformed parameters {
  matrix[N_1, M_1] r_1; // actual group-level effects
  // using vectors speeds up indexing in loops
  vector[N_1] r_1_pres_1;
  vector[N_1] r_1_gov_2;
  real lprior = 0; // prior contributions to the log posterior
  // compute actual group-level effects
  r_1 = scale_r_cor(z_1, sd_1, L_1);
  r_1_pres_1 = r_1[ : , 1];
  r_1_gov_2 = r_1[ : , 2];
  lprior += student_t_lpdf(Intercept_pres | 3, 0, 2.5);
  lprior += student_t_lpdf(Intercept_gov | 3, 0, 2.5);
  lprior += student_t_lpdf(sd_1 | 3, 0, 2.5)
            - 2 * student_t_lccdf(0 | 3, 0, 2.5);
  lprior += lkj_corr_cholesky_lpdf(L_1 | 1);
}
model {
  // likelihood including constants
  if (!prior_only) {
    // initialize linear predictor term
    vector[N_pres] mu_pres = rep_vector(0.0, N_pres);
    // initialize linear predictor term
    vector[N_gov] mu_gov = rep_vector(0.0, N_gov);
    mu_pres += Intercept_pres;
    mu_gov += Intercept_gov;
    for (n in 1 : N_pres) {
      // add more terms to the linear predictor
      mu_pres[n] += r_1_pres_1[J_1_pres[n]] * Z_1_pres_1[n];
    }
    for (n in 1 : N_gov) {
      // add more terms to the linear predictor
      mu_gov[n] += r_1_gov_2[J_1_gov[n]] * Z_1_gov_2[n];
    }
    target += bernoulli_logit_lpmf(Y_pres | mu_pres);
    target += bernoulli_logit_lpmf(Y_gov | mu_gov);
  }
  // priors including constants
  target += lprior;
  target += std_normal_lpdf(to_vector(z_1));
}
generated quantities {
  // actual population-level intercept
  real b_pres_Intercept = Intercept_pres;
  // actual population-level intercept
  real b_gov_Intercept = Intercept_gov;
  // compute group-level correlations
  corr_matrix[M_1] Cor_1 = multiply_lower_tri_self_transpose(L_1);
  vector<lower=-1, upper=1>[NC_1] cor_1;
  // extract upper diagonal of correlation matrix
  for (k in 1 : M_1) {
    for (j in 1 : (k - 1)) {
      cor_1[choose(k - 1, 2) + j] = Cor_1[j, k];
    }
  }
}

It’s a bit tough for me to fully parse this model given the non-centered parameterizations, Cholesky factorizations, etc. But here’s my best interpretation:

Let i index observations, j = (1, \dots, J) index outcome variables, and let c[i] denote the county of respondent i. The model is given by

\begin{align} y_{ij} &\sim \text{Bernoulli}(\theta_{ij}) \\ \theta_{ij} &= \delta_j + \alpha_{c[i],j} \\ \alpha_c &\sim \text{MVN}(0, \text{diag}(\sigma_1, \dots, \sigma_J)'\,\, \Omega \,\,\text{diag}(\sigma_1, \dots, \sigma_J) \\ \sigma_j &\sim \text{Student-}t(3, 0, 2.5) \\ \Omega &\sim \text{LKJ}(1) \end{align}

Here \alpha_c is length-J vector of random intercepts across outcomes for county c, \Omega is a J \times J correlation matrix and the \sigma_j's are the standard deviations of the county-level random effects.

Given that setup, I have a few questions: First, most basically, am I interpreting the code correctly – is this indeed the model that is being estimated?

Second, if the model I wrote down is basically correct, I do not see any modeled correlation between counties in the random intercepts. To be more precise, let \alpha^j denote the vector of random intercepts for outcome j across all counties. Then we have \alpha^{j} \sim \text{MultivariateNormal}(0, \text{diag}(\sigma_j)) – i.e. the covariance matrix is diagonal. Is that correct?

Finally, is there a way to extend the model in brms to allow for correlated random effects both across outcomes and across counties (for the same outcome)? It seems like one possibility would be to stack the dataset, duplicating rows J times (once for each outcome), then estimating a model like this:

bf(y ~ (1 | county) + (1 | outcome)

I believe this would include a county-level effect that is constant across all outcomes and an outcome-level effect that is constant across counties. Is there a reason not to take this “stacked” approach and to use mvind() instead? The stacked approach seems more flexible as you could easily include additional effects for interactions (1 | county : outcome) or survey respondents (1 | respid). But I might be totally misunderstanding something about the nested grouping model!

Thank you – this is the first time I’ve used brms for a full project and it has cut down my coding time immensely.


Reproducible example:


library(brms)
library(dataverse)
library(tidyverse)
Sys.setenv("DATAVERSE_SERVER" = "dataverse.harvard.edu")

# Download CCES data
dat <- get_dataframe_by_name("cumulative_2006-2022.rds",
                             dataset = "10.7910/DVN/II2DB6",
                             original = TRUE,
                             .f = readRDS)
dat <- dat %>% 
  filter(year == 2020, 
         state == "North Carolina") %>% 
  mutate(across(c(voted_pres_party, voted_gov_party),
                ~ case_when(.x == "Democratic" ~ 1L,
                            .x == "Republican" ~ 0L, 
                            TRUE ~ NA_integer_))) %>% 
  rename_with(.cols = c(voted_pres_party, voted_gov_party),
              .fn = ~ gsub("voted\\_|\\_party", "", .x)) %>% 
  drop_na(gov, pres, county_fips)


# Fit model
form <- bf(mvbind(pres, gov) ~ (1 | id | county_fips))
out <- brm(form, 
           data = dat, 
           family = bernoulli,
           chains = 2, 
           cores = 2,
           iter = 1000,
           backend = "cmdstanr")  

Your model specification is almost correct. It’s missing the prior on the population-level outcome intercepts:

\delta_j \sim \operatorname{Student-t}(3, 0, 2.5)

I would also write:

y_{ij} \sim \operatorname{Bernoulli}(\operatorname{logit}^{-1}(\theta_{ij}))

to make it clear the \theta parameters are on the logit scale.

And I think you are correct that the random county intercepts (for a given outcome j) are not correlated. At the group level, correlation is between outcomes in the same county, not between counties on the same outcome.

Without group-level predictors (for example: grouping by geographic region, or median household income by county), what is the correlation structure across counties within outcome that you would like to model?

A term like (1 | outcome) is not very meaningful when there are only two outcomes. @jd_c explains why in this thread with gender as two-level grouping variable.

1 Like

I kept wondering about that so I gave it a try. For me the lesson is that brms is very flexible and there may be more than one way to fit the same model; it helps to translate from model to formula, and back.

So here are a sequence of models on your data which has two outcomes, gov and pres, and a (categorical) grouping variable county_fips.

(a) Two univariate models: equivalent to fitting the two models separately, so no parameters are shared between between pres and gov.

fit.univar <- brm(
  bf(
    mvbind(gov, pres) ~ (1 | county_fips)
  ),
  data = dat, ...
)
#> Group-Level Effects: 
#>                    Estimate Est.Error l-95% CI u-95% CI
#> sd(gov_Intercept)      0.72      0.32     0.24     1.47
#> sd(pres_Intercept)     0.65      0.29     0.21     1.33
#> 
#> Population-Level Effects: 
#>                Estimate Est.Error l-95% CI u-95% CI
#> gov_Intercept      0.46      0.28    -0.10     1.02
#> pres_Intercept     0.35      0.26    -0.16     0.86

(b) A multivariate model: the two outcomes get their own population- and group-level parameters and there is a correlation between outcomes within counties.

fit.multivar <- brm(
  bf(
    mvbind(gov, pres) ~ (1 | i | county_fips)
  ),
  data = dat, ...
)
#> Group-Level Effects: 
#>                                   Estimate Est.Error l-95% CI u-95% CI
#> sd(gov_Intercept)                     0.76      0.28     0.34     1.43
#> sd(pres_Intercept)                    0.73      0.28     0.31     1.38
#> cor(gov_Intercept,pres_Intercept)     0.86      0.18     0.37     1.00
#>
#> Population-Level Effects: 
#>                Estimate Est.Error l-95% CI u-95% CI
#> gov_Intercept      0.46      0.28    -0.10     1.03
#> pres_Intercept     0.37      0.28    -0.17     0.92

(c) “Naive” stacked model: pres and gov share all parameters; not flexible at all.

fit.stacked.naive <- brm(
  bf(
    outcome ~ (1 | county_fips)
  ),
  data = dat.stacked, ...
)
#> Group-Level Effects: 
#>               Estimate Est.Error l-95% CI u-95% CI
#> sd(Intercept)     0.81      0.28     0.41     1.48
#> 
#> Population-Level Effects: 
#>           Estimate Est.Error l-95% CI u-95% CI 
#> Intercept     0.44      0.29    -0.15     1.02

(d) A “better” stacked model: pres and gov each have a population-level intercept but share the group-level parameters. This model is still less flexible than two univariate models.

fit.stacked.pop.level <- brm(
  bf(
   outcome ~ 0 + group + (1 | county_fips)
  ),
  data = dat.stacked, ...
)
#> Group-Level Effects: 
#>               Estimate Est.Error l-95% CI u-95% CI
#> sd(Intercept)     0.81      0.29     0.40     1.50
#> 
#> Population-Level Effects: 
#>           Estimate Est.Error l-95% CI u-95% CI
#> groupgov      0.48      0.31    -0.14     1.09
#> grouppres     0.40      0.30    -0.20     1.01

(e) Stacked model equivalent to two univariate models. We add group-level intercepts, ie. random county effects, for each outcome. But we make the outcomes uncorrelated within county using the || notation.

fit.stacked.group.level.uncor <- brm(
  bf(
    outcome ~ 0 + group + (0 + group || county_fips)
  ),
  data = dat.stacked, ...
)
#> Group-Level Effects: 
#>               Estimate Est.Error l-95% CI u-95% CI
#> sd(groupgov)      0.71      0.31     0.25     1.45
#> sd(grouppres)     0.66      0.30     0.22     1.40
#> 
#> Population-Level Effects: 
#>           Estimate Est.Error l-95% CI u-95% CI
#> groupgov      0.45      0.28    -0.12     1.05
#> grouppres     0.34      0.27    -0.21     0.89

(f) Stacked model equivalent to the multivariate model. We specify population-level intercepts for outcomes and group-level random effects for counties. The outcomes within counties are correlated.

fit.stacked.group.level.cor <- brm(
  bf(
    outcome ~ 0 + group + (0 + group | county_fips)
  ),
  data = dat.stacked, ...
)
#> Group-Level Effects: 
#>                         Estimate Est.Error l-95% CI u-95% CI
#> sd(groupgov)                0.77      0.29     0.34     1.49
#> sd(grouppres)               0.73      0.29     0.31     1.44
#> cor(groupgov,grouppres)     0.85      0.18     0.33     1.00

#> Population-Level Effects: 
#>           Estimate Est.Error l-95% CI u-95% CI
#> groupgov      0.46      0.29    -0.12     1.05
#> grouppres     0.37      0.27    -0.20     0.90

Now that I’ve gone through those models, the multivariate specification is more succinct than the equivalent stacked formulation. But if we add individual- and group-level predictors to the mix, perhaps the stacked approach will indeed be easier to understand.

2 Likes

Thanks for the help! Glad to hear the model is doing what I expected. And of course you’re right about missing the prior on the global intercept and the inverse logit function.

Without group-level predictors (for example: grouping by geographic region, or median household income by county), what is the correlation structure across counties within outcome that you would like to model?

The full model does contain both geographic predictors, so there will be correlation across counties induced by those terms. I was wondering if there was a feasible way to specify correlations between county random effects for the same outcome (i.e., in my notation, \alpha^j \sim D(0, \Gamma_j), with D some distribution and \Gamma_j a scale/covariance matrix). In this case it’s probably not a good idea as the covariance matrices would be massive — roughly 3,100 \times 3,100 for U.S. counties — but I was curious for other random effects.

A term like (1 | outcome) is not very meaningful when there are only two outcomes. @jd_c explains why in this thread with gender as two-level grouping variable.

Yep, I agree with that. The actual model I’m working with has several outcomes rather than just two. In any case, you’re right that we could include them as contrasts rather than random effects.

I kept wondering about that so I gave it a try. For me the lesson is that brms is very flexible and there may be more than one way to fit the same model; it helps to translate from model to formula, and back.

These results are very useful – thank you! So my conclusion is that it’s possible to fit the model in either way that I mentioned: the results from models (b) and (f) are nearly identical.

1 Like

Is there a way to implement regional priors? I attempted the following method, but it did not yield the desired results and get the errors
[R1/R1- regions, and F2F/Event etc. are independent variables.]

prior3 ← c(set_prior(“normal(0,.5)”, class = “b”, coef = “F2F”, group= ‘R1’,lb=0),
set_prior(“normal(1,.5)”, class = “b”, coef = “F2F”, group= ‘R2’,lb=0),
set_prior(“normal(2,25)”, class = “b”, coef = “Event_Attendee”, group= ‘R1’,lb=0),
set_prior(“normal(2.5,25)”, class = “b”, coef = “Event_Attendee”, group= ‘R2’,lb=0),
set_prior(“normal(3.5,25)”, class = “b”, coef = “Virtual_calls”, group= ‘R1’,lb=0)
set_prior(“normal(4.5,25)”, class = “b”, coef = “Virtual_calls”, group= ‘R2’,lb=0)
)
prior3

set.seed(7662345)

model3= brms::brm(Sales_Unit ~ F2F + Event_Attendee + Virtual_calls + (1 + F2F + Event_Attendee + Virtual_calls |Region), data = model_data_Trelegy, prior= prior3)