Understanding brms nested grouping term in multivariate response models

I kept wondering about that so I gave it a try. For me the lesson is that brms is very flexible and there may be more than one way to fit the same model; it helps to translate from model to formula, and back.

So here are a sequence of models on your data which has two outcomes, gov and pres, and a (categorical) grouping variable county_fips.

(a) Two univariate models: equivalent to fitting the two models separately, so no parameters are shared between between pres and gov.

fit.univar <- brm(
  bf(
    mvbind(gov, pres) ~ (1 | county_fips)
  ),
  data = dat, ...
)
#> Group-Level Effects: 
#>                    Estimate Est.Error l-95% CI u-95% CI
#> sd(gov_Intercept)      0.72      0.32     0.24     1.47
#> sd(pres_Intercept)     0.65      0.29     0.21     1.33
#> 
#> Population-Level Effects: 
#>                Estimate Est.Error l-95% CI u-95% CI
#> gov_Intercept      0.46      0.28    -0.10     1.02
#> pres_Intercept     0.35      0.26    -0.16     0.86

(b) A multivariate model: the two outcomes get their own population- and group-level parameters and there is a correlation between outcomes within counties.

fit.multivar <- brm(
  bf(
    mvbind(gov, pres) ~ (1 | i | county_fips)
  ),
  data = dat, ...
)
#> Group-Level Effects: 
#>                                   Estimate Est.Error l-95% CI u-95% CI
#> sd(gov_Intercept)                     0.76      0.28     0.34     1.43
#> sd(pres_Intercept)                    0.73      0.28     0.31     1.38
#> cor(gov_Intercept,pres_Intercept)     0.86      0.18     0.37     1.00
#>
#> Population-Level Effects: 
#>                Estimate Est.Error l-95% CI u-95% CI
#> gov_Intercept      0.46      0.28    -0.10     1.03
#> pres_Intercept     0.37      0.28    -0.17     0.92

(c) “Naive” stacked model: pres and gov share all parameters; not flexible at all.

fit.stacked.naive <- brm(
  bf(
    outcome ~ (1 | county_fips)
  ),
  data = dat.stacked, ...
)
#> Group-Level Effects: 
#>               Estimate Est.Error l-95% CI u-95% CI
#> sd(Intercept)     0.81      0.28     0.41     1.48
#> 
#> Population-Level Effects: 
#>           Estimate Est.Error l-95% CI u-95% CI 
#> Intercept     0.44      0.29    -0.15     1.02

(d) A “better” stacked model: pres and gov each have a population-level intercept but share the group-level parameters. This model is still less flexible than two univariate models.

fit.stacked.pop.level <- brm(
  bf(
   outcome ~ 0 + group + (1 | county_fips)
  ),
  data = dat.stacked, ...
)
#> Group-Level Effects: 
#>               Estimate Est.Error l-95% CI u-95% CI
#> sd(Intercept)     0.81      0.29     0.40     1.50
#> 
#> Population-Level Effects: 
#>           Estimate Est.Error l-95% CI u-95% CI
#> groupgov      0.48      0.31    -0.14     1.09
#> grouppres     0.40      0.30    -0.20     1.01

(e) Stacked model equivalent to two univariate models. We add group-level intercepts, ie. random county effects, for each outcome. But we make the outcomes uncorrelated within county using the || notation.

fit.stacked.group.level.uncor <- brm(
  bf(
    outcome ~ 0 + group + (0 + group || county_fips)
  ),
  data = dat.stacked, ...
)
#> Group-Level Effects: 
#>               Estimate Est.Error l-95% CI u-95% CI
#> sd(groupgov)      0.71      0.31     0.25     1.45
#> sd(grouppres)     0.66      0.30     0.22     1.40
#> 
#> Population-Level Effects: 
#>           Estimate Est.Error l-95% CI u-95% CI
#> groupgov      0.45      0.28    -0.12     1.05
#> grouppres     0.34      0.27    -0.21     0.89

(f) Stacked model equivalent to the multivariate model. We specify population-level intercepts for outcomes and group-level random effects for counties. The outcomes within counties are correlated.

fit.stacked.group.level.cor <- brm(
  bf(
    outcome ~ 0 + group + (0 + group | county_fips)
  ),
  data = dat.stacked, ...
)
#> Group-Level Effects: 
#>                         Estimate Est.Error l-95% CI u-95% CI
#> sd(groupgov)                0.77      0.29     0.34     1.49
#> sd(grouppres)               0.73      0.29     0.31     1.44
#> cor(groupgov,grouppres)     0.85      0.18     0.33     1.00

#> Population-Level Effects: 
#>           Estimate Est.Error l-95% CI u-95% CI
#> groupgov      0.46      0.29    -0.12     1.05
#> grouppres     0.37      0.27    -0.20     0.90

Now that I’ve gone through those models, the multivariate specification is more succinct than the equivalent stacked formulation. But if we add individual- and group-level predictors to the mix, perhaps the stacked approach will indeed be easier to understand.

2 Likes