Using only certain group-level effects in posterior_linpred()

I have a formula like

y ~ x + (x | group1) + (x | group2) + t2(z)

in brms. Is there a way to set the group-level effects for group1 to 0, but use a specific level in group2 in the posterior_linpred() call? Basically I just want the posterior samples for the quantity

b_Intercept + r_group2.somelevel.Intercept. +
              (b_x + r_group2.somelevel.x.)*x + <samples from the smooth at z>

for particular values of x and z. So for example:

b_Intercept + r_group2.somelevel.Intercept. +
              (b_x + r_group2.somelevel.x.)*2 + <samples from the smooth at 10>

I thought it might be possible using the re_formula option, so I tried this:

posterior_linpred(
    fit,
    newdata = data.frame(
        x = 2,
        group2 = "somelevel",
        z = 10
    ),
    re_formula = "~group2"
)

but the output didn’t match what I wanted (in my case the mean of the formula I wrote above is negative, but the mean of the output of posterior_linpred() was positive).

Just create a copy of the original dataset, set the variables to whatever, and call posterior_linpred with the newdata argument being that modified dataset.

Hi @bgoodri,

I’m not sure I understand. If I call posterior_linpred() like

posterior_linpred(
    fit,
    newdata = data.frame(
        x = 2,
        group1 = "g1level",
        group2 = "somelevel",
        z = 10
    )
)

then doesn’t that only gives me the samples for the case when group1 = “g1level”, as in

b_Intercept + r_group1_g1level.Intercept + r_group2.somelevel.Intercept. +
            (b_x + r_group1.g1level.x. + r_group2.somelevel.x.)*x + <smooth at z>

?

I mean like

nd <- old_data
nd$group2[1:nrow(nd)] <- "somelevel"
posterior_linpred(fit, newdata = nd)

I don’t see how that will get me what I’m trying to get, @bgoodri. How is that different from what I was doing in my last comment, other than the fact that I’m just supplying a single data point there (which is all I’m interested in)?

I’m probably not explaining my question well. I’ve edited my original post a bit, specifically the part about exactly what parameter samples I want to combine. Does that make it clearer?

That helped, but I don’t think such an option is supported by brms (or rstanarm or lme4) because the predictions do not seem well defined. You have a fit that has various intercepts for group1 and group2, conditional on the data, and then you are asking to further condition on the intercepts for group1 being zero. Conditioning on the intercepts for group1 being zero would change the posterior distribution of all the other parameters being used to make the predictions. This is the idea behind the projpred package, but I don’t think it works for multilevel models yet.

Oh wow, ok. I thought I was “controlling for variation in group1” by not including their samples. It’s good to learn that what I was doing isn’t well-defined.

What should I be doing instead?

I see that posterior_linpred() has the option allow_new_levels = TRUE. Maybe I should use this?

Thanks for this. I really appreciate the help.

The allow_new_levels = TRUE option (which is the default) allows you to get the posterior distribution of the conditional mean for a group that was not in the original dataset. That can be useful but I don’t think that is what you are asking for. I think the closest thing would be to condition on group2 but integrate over group1 with re_formula = ~group2. You said originally that yielded positive values, but I didn’t understand why those were suspect.

Okay, that makes sense. I think the remaining issues are just technical ones specific to the actual model I’ve been working with, so I’ll continue to play around with it myself. Thanks again for your help.

Since it’s probably worth mentioning: I was calling posterior_linpred() twice, once for x=0 and again for x=1, then subtracting their outputs. So I’m guessing the unexpected outcome I saw when using re_formula = ~group2 was due to the averaging happening differently each function call.

What seems to work for my case is specifying a specific level of group1 in the newdata, manually subtracting the samples for that level from the output of posterior_linpred(), and replacing them with Gaussian samples using the group-level standard deviation (i.e. rnorm(n_samples, 0, sd_group1__x)). I think this is roughly what you meant by integrating over group1.