I’m fitting a multivariate logit response model with correlated varying intercepts using brms, following this vignette. The application is modeling multiple survey responses per respondent. I am trying to understand exactly what model is specified when using the nested notation (1 | id | group).
A simplified version of the model I’m fitting looks like this (full reproducible example below):
form <- bf(mvbind(pres, gov) ~ (1 | id | county_fips))
out <- brm(form,
           data = dat,
           family = bernoulli)
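For reference, the Stan code shown further below can be regenerated without fitting anything:

# Print the Stan code brms generates for this formula and family
make_stancode(form, data = dat, family = bernoulli)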
Both outcomes (pres and gov) are binary, and I want to include random intercepts at the county level that are correlated across outcomes.
I can’t find documentation on exactly what model (1 | id | county_fips) denotes. The generated Stan code for the model above looks like this:
// generated with brms 2.18.0
functions {
  /* compute correlated group-level effects
   * Args:
   *   z: matrix of unscaled group-level effects
   *   SD: vector of standard deviation parameters
   *   L: cholesky factor correlation matrix
   * Returns:
   *   matrix of scaled group-level effects
   */
  matrix scale_r_cor(matrix z, vector SD, matrix L) {
    // r is stored in another dimension order than z
    return transpose(diag_pre_multiply(SD, L) * z);
  }
}
data {
  int<lower=1> N;  // total number of observations
  int<lower=1> N_pres;  // number of observations
  array[N_pres] int Y_pres;  // response variable
  int<lower=1> N_gov;  // number of observations
  array[N_gov] int Y_gov;  // response variable
  // data for group-level effects of ID 1
  int<lower=1> N_1;  // number of grouping levels
  int<lower=1> M_1;  // number of coefficients per level
  array[N_pres] int<lower=1> J_1_pres;  // grouping indicator per observation
  array[N_gov] int<lower=1> J_1_gov;  // grouping indicator per observation
  // group-level predictor values
  vector[N_pres] Z_1_pres_1;
  vector[N_gov] Z_1_gov_2;
  int<lower=1> NC_1;  // number of group-level correlations
  int prior_only;  // should the likelihood be ignored?
}
transformed data {
}
parameters {
  real Intercept_pres;  // temporary intercept for centered predictors
  real Intercept_gov;  // temporary intercept for centered predictors
  vector<lower=0>[M_1] sd_1;  // group-level standard deviations
  matrix[M_1, N_1] z_1;  // standardized group-level effects
  cholesky_factor_corr[M_1] L_1;  // cholesky factor of correlation matrix
}
transformed parameters {
  matrix[N_1, M_1] r_1;  // actual group-level effects
  // using vectors speeds up indexing in loops
  vector[N_1] r_1_pres_1;
  vector[N_1] r_1_gov_2;
  real lprior = 0;  // prior contributions to the log posterior
  // compute actual group-level effects
  r_1 = scale_r_cor(z_1, sd_1, L_1);
  r_1_pres_1 = r_1[ : , 1];
  r_1_gov_2 = r_1[ : , 2];
  lprior += student_t_lpdf(Intercept_pres | 3, 0, 2.5);
  lprior += student_t_lpdf(Intercept_gov | 3, 0, 2.5);
  lprior += student_t_lpdf(sd_1 | 3, 0, 2.5)
            - 2 * student_t_lccdf(0 | 3, 0, 2.5);
  lprior += lkj_corr_cholesky_lpdf(L_1 | 1);
}
model {
  // likelihood including constants
  if (!prior_only) {
    // initialize linear predictor term
    vector[N_pres] mu_pres = rep_vector(0.0, N_pres);
    // initialize linear predictor term
    vector[N_gov] mu_gov = rep_vector(0.0, N_gov);
    mu_pres += Intercept_pres;
    mu_gov += Intercept_gov;
    for (n in 1 : N_pres) {
      // add more terms to the linear predictor
      mu_pres[n] += r_1_pres_1[J_1_pres[n]] * Z_1_pres_1[n];
    }
    for (n in 1 : N_gov) {
      // add more terms to the linear predictor
      mu_gov[n] += r_1_gov_2[J_1_gov[n]] * Z_1_gov_2[n];
    }
    target += bernoulli_logit_lpmf(Y_pres | mu_pres);
    target += bernoulli_logit_lpmf(Y_gov | mu_gov);
  }
  // priors including constants
  target += lprior;
  target += std_normal_lpdf(to_vector(z_1));
}
generated quantities {
  // actual population-level intercept
  real b_pres_Intercept = Intercept_pres;
  // actual population-level intercept
  real b_gov_Intercept = Intercept_gov;
  // compute group-level correlations
  corr_matrix[M_1] Cor_1 = multiply_lower_tri_self_transpose(L_1);
  vector<lower=-1, upper=1>[NC_1] cor_1;
  // extract upper diagonal of correlation matrix
  for (k in 1 : M_1) {
    for (j in 1 : (k - 1)) {
      cor_1[choose(k - 1, 2) + j] = Cor_1[j, k];
    }
  }
}
It’s a bit tough for me to fully parse this model given the non-centered parameterizations, Cholesky factorizations, etc. But here’s my best interpretation:
Let i index observations, let j = 1, \dots, J index outcome variables, and let c[i] denote the county of respondent i. The model is given by
\begin{align} y_{ij} &\sim \text{Bernoulli}(\theta_{ij}) \\ \theta_{ij} &= \text{logit}^{-1}\left(\delta_j + \alpha_{c[i],j}\right) \\ \alpha_c &\sim \text{MVN}\left(0,\ \text{diag}(\sigma_1, \dots, \sigma_J)\, \Omega\, \text{diag}(\sigma_1, \dots, \sigma_J)\right) \\ \sigma_j &\sim \text{Student-}t(3, 0, 2.5) \\ \Omega &\sim \text{LKJ}(1) \end{align}
Here \alpha_c is the length-J vector of random intercepts across outcomes for county c, \Omega is a J \times J correlation matrix, and the \sigma_j's are the standard deviations of the county-level random effects.
Given that setup, I have a few questions: First, most basically, am I interpreting the code correctly – is this indeed the model that is being estimated?
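For what it's worth, here is how I have been lining up the Stan parameters with brms's output (this assumes the fitted out object from the reproducible example below; I believe the group-level labels are sd(pres_Intercept), sd(gov_Intercept), and cor(pres_Intercept,gov_Intercept), but I may have the naming slightly off):

# Group-level effects under ~county_fips should correspond to sd_1 and
# cor_1 in the generated Stan code above
summary(out)
VarCorr(out)  # the same estimates arranged as correlation/covariance matrices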
Second, if the model I wrote down is basically correct, I do not see any modeled correlation between counties in the random intercepts. To be more precise, let \alpha^j denote the vector of random intercepts for outcome j across all counties. Then we have \alpha^{j} \sim \text{MVN}(0, \sigma_j^2 I) – i.e. the covariance matrix across counties is diagonal. Is that correct?
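To spell out what I mean, stacking the per-outcome vectors \alpha^1, \dots, \alpha^J (with C counties) would, on my reading of the code, imply the Kronecker structure

\begin{align} \begin{pmatrix} \alpha^1 \\ \vdots \\ \alpha^J \end{pmatrix} \sim \text{MVN}\left(0,\ \Sigma \otimes I_C\right), \qquad \Sigma = \text{diag}(\sigma_1, \dots, \sigma_J)\, \Omega\, \text{diag}(\sigma_1, \dots, \sigma_J), \end{align}

so that \text{Cov}(\alpha_{c,j}, \alpha_{c',j'}) = \Sigma_{jj'} when c = c' and 0 otherwise.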
Finally, is there a way to extend the model in brms to allow for correlated random effects both across outcomes and across counties (for the same outcome)? It seems like one possibility would be to stack the dataset, duplicating rows J times (once for each outcome), and then estimate a model like this:
bf(y ~ (1 | county) + (1 | outcome))
I believe this would include a county-level effect that is constant across all outcomes and an outcome-level effect that is constant across counties. Is there a reason not to take this “stacked” approach and to use mvbind() instead? The stacked approach seems more flexible, since you could easily include additional effects for county-outcome interactions (1 | county:outcome) or survey respondents (1 | respid); a concrete sketch is below. But I might be totally misunderstanding something about the nested grouping model!
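Concretely, the stacking I have in mind would look something like this (dat_long, respid, and out_long are names I am making up here; dat is from the reproducible example below):

# One row per respondent-outcome pair, with the binary response in y
dat_long <- dat %>%
  mutate(respid = row_number()) %>%
  pivot_longer(c(pres, gov), names_to = "outcome", values_to = "y")

# Separate intercepts for counties, outcomes, their interaction,
# and respondents
form_long <- bf(y ~ (1 | county_fips) + (1 | outcome) +
                  (1 | county_fips:outcome) + (1 | respid))
out_long <- brm(form_long, data = dat_long, family = bernoulli)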
Thank you – this is the first time I’ve used brms for a full project and it has cut down my coding time immensely.
Reproducible example:
library(brms)
library(dataverse)
library(tidyverse)

Sys.setenv("DATAVERSE_SERVER" = "dataverse.harvard.edu")

# Download CCES data
dat <- get_dataframe_by_name("cumulative_2006-2022.rds",
                             dataset = "10.7910/DVN/II2DB6",
                             original = TRUE,
                             .f = readRDS)

dat <- dat %>%
  filter(year == 2020,
         state == "North Carolina") %>%
  mutate(across(c(voted_pres_party, voted_gov_party),
                ~ case_when(.x == "Democratic" ~ 1L,
                            .x == "Republican" ~ 0L,
                            TRUE ~ NA_integer_))) %>%
  rename_with(.cols = c(voted_pres_party, voted_gov_party),
              .fn = ~ gsub("voted\\_|\\_party", "", .x)) %>%
  drop_na(gov, pres, county_fips)

# Fit model
form <- bf(mvbind(pres, gov) ~ (1 | id | county_fips))
out <- brm(form,
           data = dat,
           family = bernoulli,
           chains = 2,
           cores = 2,
           iter = 1000,
           backend = "cmdstanr")