I’ve got a model with one group-level variable g with varying intercepts and varying slopes for population-level effect x1, which do not correlate with the varying intercepts (the correlation parameter is unnecessary as per elpd_loo, and it greatly complicates the model). Now I’m interested in testing for varying g slopes for another population-level effect x2, including the possibility that those slopes might correlate with the varying intercepts.
How do I use brms to specify a model in which g has varying intercepts which are uncorrelated with its varying x1 slopes but correlated with its varying x2 slopes? When I call brms with y ~ (x1||g) + (0+x2|g) + x1 + x2, it fits a model with no correlated slopes and intercepts whatsoever, even for x2 and g.
Is there a solution?
There is a solution. Define every random effect with the |ID| syntax instead of a plain |, where ID is an arbitrary character string. Random effects sharing an ID will be modeled as correlated (as long as the grouping variables are the same), and random effects with different IDs will be modeled as uncorrelated.
How would the model formula then look in brms syntax, in place of y ~ (x1||g) + (0+x2|g), which doesn’t work?
Sidenote: a highly inelegant approach inspired by your response, which might in fact have worked, was to create a duplicate of the original grouping variable, with a different name and level labels, and then add this duplicate as a different random-effect term in the model formula: y ~ (x1||g) + (x2|gDuplicate) + x1 + x2. The results seem to make sense. But I’d like to understand your suggestion because it might be better.
Written out the long way, you presumably want
y ~ 1 + x1 + x2 + (1 |id1| g) + (x1 |id2| g) + (x2 |id1| g)
Because the random intercept and the random x2 term share the identifier id1, they will be modeled as correlated. We can make this a bit more concise with
y ~ x1 + x2 + (1 + x2 |id1| g) + (x1 |id2| g)
I’m reasonably certain that in this particular case it should work to get more concise again with
y ~ x1 + x2 + (1 + x2 | g) + (x1 || g)
But the beauty of the |ID| syntax is that no matter how many random terms you have grouped by g, you can model any arbitrary collections of them as correlated. Not only that, but this flexibility remains even if the random effects are distributed across multiple formulae in distributional regression! Pretty nifty.
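One way to see which correlations a given formula asks for, without fitting anything, is to list the priors brms would set up and keep only the rows of class "cor". The sketch below is only an illustration: the toy data and the helper name cor_rows are made up, the IDs a and b are arbitrary, and the x1 term is written as 0 + x1 on the assumption that a term such as (x1 |b| g) would otherwise carry its own implied intercept, as in lme4-style formulas.

```r
# Toy data; the names y, x1, x2 and g follow the thread, the values are made up.
set.seed(1)
d <- data.frame(
  g  = factor(rep(1:10, each = 8)),
  x1 = rnorm(80),
  x2 = rnorm(80)
)
d$y <- rnorm(80)

# Intercept and x2 slope share the ID "a"; the x1 slope gets its own ID "b".
# 0 + x1 drops the intercept that (x1 |b| g) would otherwise imply.
f_id <- y ~ x1 + x2 + (1 + x2 |a| g) + (0 + x1 |b| g)

# Hypothetical helper: return the group-level correlation priors for a formula.
cor_rows <- function(formula, data) {
  p <- as.data.frame(brms::get_prior(formula, data = data, family = gaussian()))
  p[p$class == "cor", ]
}

# Only runs when brms is installed; no model is compiled or sampled here.
if (requireNamespace("brms", quietly = TRUE)) {
  print(cor_rows(f_id, d))
}
```

Running the same helper on a competing formula makes it easy to compare which correlation parameters each specification introduces before committing to a long model fit.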
Turns out that the correct syntax for my scenario is as simple as:
y ~ x1 + x2 + (0 + x1 | g) + (x2 | g)
x1 gets random slopes by g, with no correlation because 0 + suppresses its random intercepts, hence there’s nothing for the slopes to correlate with. And x2 gets correlating random slopes and intercepts. The id business was ultimately not an ideal solution because while it did suppress the correct correlation parameters, it also introduced unsolicited new correlation parameters that I was not interested in.
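For anyone who wants to try this on data with a known structure, here is a rough sketch that simulates group-level intercepts correlated with x2 slopes plus independent x1 slopes, and wraps the formula above in a fitting function. All numbers (group count, effect sizes, the correlation rho) are arbitrary assumptions, and fit_model is a hypothetical wrapper around brms::brm that is defined but not called, since fitting needs a working Stan toolchain.

```r
set.seed(1)
n_groups <- 30
n_per    <- 20
rho      <- 0.6  # assumed correlation between intercepts and x2 slopes

# Standardised group-level deviations:
# intercepts and x2 slopes are correlated, x1 slopes are independent of both.
z1 <- rnorm(n_groups)
z2 <- rnorm(n_groups)
u_int <- z1
u_x2  <- rho * z1 + sqrt(1 - rho^2) * z2
u_x1  <- rnorm(n_groups)

d <- data.frame(
  g  = factor(rep(seq_len(n_groups), each = n_per)),
  x1 = rnorm(n_groups * n_per),
  x2 = rnorm(n_groups * n_per)
)
gi <- as.integer(d$g)

# Linear predictor with group-specific intercepts and slopes, plus noise.
d$y <- (1.0 + 0.8 * u_int[gi]) +
  (0.5 + 0.3 * u_x1[gi]) * d$x1 +
  (-0.5 + 0.4 * u_x2[gi]) * d$x2 +
  rnorm(nrow(d), sd = 0.5)

# Hypothetical wrapper around the formula from this post; not called here.
fit_model <- function(data) {
  brms::brm(
    y ~ x1 + x2 + (0 + x1 | g) + (x2 | g),
    data = data, family = gaussian()
  )
}
```

Calling fit_model(d) and then looking at the group-level part of summary() should show whether the only correlation parameter for g is the one between the intercept and x2.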