Defining a model with nested outcomes


#1

I’m trying to estimate a model where outcomes are nested within another variable. Specifically, this is a sports model, where the score for each team is nested within the game. So the data looks like this:

#> # A tibble: 2 x 7
#>    game score points hc_offense hc_defense offense defense
#>   <int> <int>  <int>      <int>      <int> <chr>   <chr>  
#> 1     1     1    114          1          0 Team A  Team B 
#> 2     1     2     77          0          1 Team B  Team A

So the first row is the points scored by the home team (Team A, with Team B on defense), and the second row is the points scored by the away team (Team B, with Team A on defense).

I can estimate this model with nlme as such:

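library(nlme)

# `games` is the long-format data shown above (the name is assumed here)
model <- gls(points ~ hc_offense + hc_defense + offense + defense,
  data = games,
  correlation = corSymm(form = ~1|game),
  weights = varIdent(form = ~1|score))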

This allows for the estimation of the correlation between scores within a game. My understanding is that I can estimate the correlation between outcomes by using a multivariate model with brms. However, this requires that I restructure the data so that each game is on one row instead of two. This would allow me to specify a multivariate model such as:

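library(brms)

# `games_wide` is the one-row-per-game data (the name is assumed here)
bf_home <- bf(home_points ~ hc_offense + offense + defense)
bf_away <- bf(away_points ~ hc_defense + offense + defense)
model <- brm(bf_home + bf_away, data = games_wide)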

The problem with this is that I can’t define more than one offense/defense for the row:

#> # A tibble: 1 x 7
#>    game home_points away_points hc_offense hc_defense offense defense
#>   <int>       <int>       <int>      <int>      <int> <chr>   <chr>  
#> 1     1         114          77          1          1 Team A  Team B

Team A should be the offense for bf_home, but should be the defense for bf_away. I could further spread out the data:

#> # A tibble: 1 x 9
#>    game home_points away_points hc_offense hc_defense home_pt_offense home_pt_defense away_pt_offense away_pt_defense
#>   <int>       <int>       <int>      <int>      <int> <chr>           <chr>           <chr>           <chr>
#> 1     1         114          77          1          1 Team A          Team B          Team B          Team A
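For reference, a minimal base R sketch of how the long data could be reshaped into this layout. It assumes that score == 1 marks the home team's row and score == 2 the away team's row, as in the example game; the object names `long`, `home`, `away`, and `wide` are illustrative only.

```r
# Long format: one row per team score, as in the first table.
long <- data.frame(
  game       = c(1L, 1L),
  score      = c(1L, 2L),
  points     = c(114L, 77L),
  hc_offense = c(1L, 0L),
  hc_defense = c(0L, 1L),
  offense    = c("Team A", "Team B"),
  defense    = c("Team B", "Team A"),
  stringsAsFactors = FALSE
)

# Assumption: score == 1 is the home row, score == 2 is the away row.
home <- long[long$score == 1L, ]
away <- long[long$score == 2L, ]

# One data frame per outcome, with columns renamed for the wide layout.
home_wide <- data.frame(
  game            = home$game,
  home_points     = home$points,
  hc_offense      = home$hc_offense,
  home_pt_offense = home$offense,
  home_pt_defense = home$defense,
  stringsAsFactors = FALSE
)
away_wide <- data.frame(
  game            = away$game,
  away_points     = away$points,
  hc_defense      = away$hc_defense,
  away_pt_offense = away$offense,
  away_pt_defense = away$defense,
  stringsAsFactors = FALSE
)

# Join the two halves so that each game occupies a single row.
wide <- merge(home_wide, away_wide, by = "game")
```

The resulting `wide` has the same columns as the table above, although merge() places them in a different order.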

This would allow me to specify the offense and defense for each score:

bf_home <- bf(home_points ~ hc_offense + home_pt_offense + home_pt_defense)
bf_away <- bf(away_points ~ hc_defense + away_pt_offense + away_pt_defense)
model <- brm(bf_home + bf_away)

However, I then end up with 2 offensive coefficients for each team: one when they are listed as home_pt_offense and one when listed as away_pt_offense (the same applies to the defensive coefficients).

I am probably missing something obvious, but what is the best way to define this type of model in brms? So far I have been unable to find a way to estimate this as a multivariate model while keeping the predictors defined correctly.

I’m not sure this is relevant for this particular question, but just in case:

  • Operating System: macOS Mojave (10.14.3)
  • brms Version: 2.6.0

#2

brms currently does not support a correlation structure similar to nlme::corSymm, but this may come in the future. Further, brms currently does not allow coefficients to be shared across univariate models (which is what would be required for the multivariate version to be equivalent to the nlme model), but this will also come in the future (in brms 3.0, to be precise).

What brms does allow, in contrast to nlme, is the modeling of multi-membership terms, which may be relevant in your model, as each outcome belongs to two teams. See ?mm for details and also vignette("brms_multilevel").
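As a rough illustration of the multi-membership syntax, here is a sketch of what such a formula might look like on the long-format data. It is not a drop-in replacement for the nlme model: a single mm() term draws both members from one set of team effects, so it does not separate offensive from defensive effects. The data name `games_long` is hypothetical, and only hc_offense is included because hc_defense equals 1 - hc_offense in the long layout.

```r
# A plain R formula; mm() is only interpreted once the formula is passed to brms.
# Each score is treated as belonging to two teams: the offense and the defense.
f <- points ~ hc_offense + (1 | mm(offense, defense))

# Fitting is left commented out because it needs brms and a Stan toolchain.
# `games_long` is a hypothetical data frame with the columns of the first table.
# fit <- brms::brm(f, data = games_long)
```

Whether equal weighting of the two memberships is sensible here is a modeling decision; ?mm describes the weights argument for changing it.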