I’m preparing a tutorial script for a random slopes model with a gaussian outcome. With a binary x variable, the brm code is fairly straightforward as far as such models go:
model <- brm ( y ~ x + (1+x|group_id), data = dat, chains = 2, cores = 2, iter = 3000, warmup = 1000)
One objective for users of the tutorial is to use the respective fixed effects and the random effects (both intercepts and slopes, when necessary) to calculate predictions on the response scale. My inclination is to have them use fixef() and ranef() to extract the posterior means that would allow them to do that.
When I compare the extractions from coef() with what I calculate from fixef() and ranef(), everything looks mostly identical. There are discrepancies, though, mostly starting a few digits to the right of the decimal. My inclination is to attribute this to rounding error, but before recommending the use of coef(), I figured that it would be ideal to confirm its functionality.
Can someone with more knowledge of brms confirm what happens under the hood with coef()? Thanks.
@paul.buerkner will know the details, but if you’re seeing small differences only after a few decimal places then rounding error seems like a reasonable guess. That said, I would be surprised if brms is rounding before doing to computations so I’m not sure where that rounding error would be introduced. There may be a different reason for this.
Actually, are you adding fixef and ranef point estimates or adding the posterior draws together? It’s possible the point estimates are rounded.
Thanks, Jonah. In this case, I’m just adding together the posterior means. Your message makes me wonder if coef() does the calculations for each sample in the posterior and then reports the mean of those calculations.
As I look more closely at the coef object, one noteworthy thing is that the the responses make sense for the reference level of the binary x factor, in which case it’s straightforward addition of the fixed effect intercept to the group-specific random effects. That was the comparison I mentioned earlier when noting the slight differences in estimates.
When x=1, though, what I’m seeing in the vector is the random effects themselves – the random slopes, that is. To obtain estimates on the scale of the response, a necessary step with the coef object is evidently to add the fixed effect of x and then the slope. In other words, it seems that coef() returns sensible intercepts for each group but gets only part of the way there for the random slopes.
Given that difference, which will potentially confuse novices, my inclination is to stick with fixef() and ranef() and manual calculations for now.
To the best of my knowledge coef() does exactly as intended by adding fixef() and ranef() (samples) together and then computing the summaries afterwards. I don’t understand your reasoning starting with x=1. Can you give me an example code where you think coef() fails?
I don’t have simulation code cued up, so I’ll start with a brief description of the data and effects.
Basically, let’s assume that the response variable is something like systolic blood pressure (BP) and we are comparing the BP of women (the reference level) and men. Let’s say that the population-level intercept is 120 and the fixed effect for male is 10, implying an average BP of 130 for men. Now assume that the men and women are clustered in countries, which is our grouping variable for random intercepts and slopes.
By analogy, what I’m seeing in my current coef() output is that the predictions for the intercept are the fixed effect (120) plus the random intercept that corresponds to the country of the measured individual. So far so good.
In the vector for males, by contrast, there is a vector of decimals. In other words, the vector is not returning the effects that we would expect to see on the response scale (which would be generally close to 130).
(My initial assumption was that this vector contains the random slopes, but as I compare to posterior means, that assumption is currently being challenged.)
For a little extra clarification, see the attached screenshot. The first vector, female_by_hand, is from calculations done on fixef() and ranef() objects from the model. You can see that these line up perfectly with the second vector, female_from_coef, which is the vector of estimates for the intercept(s) in the coef object.
There is a binary variable for males, which in this case has a small value (0.08189017), so basically the means are the same. I calculate male_by_hand by including the respective fixed effects and random effects.
The final vector, male_from_coef, is not what is in the coef object. Rather it is constructed by multiplying the aforementioned vector of decimals by the population-level effect and carrying through with calculations.
Understood, that previous request came in as I was writing up the results. My current script is suboptimal because I’ve been doing a fair bit of command line entry while sorting through this. Let me see if I can pull out relevant bits.
I’m trying to rename things as I go to align with a generic example and potentially introducing errors in the process. I apologize. In any case, the calculations by hand look something like the following.
My assumption is that coef() attempts to do all these calculations as well. But as noted above, the vector in coef contains only decimals which are well outside the range of plausible values.
For me to believe that something goes wrong, I need a reproducible example. Here is an example of mine to show how fixef() and ranef() should be added to get the results of coef():
Now that I see your code I understand what’s wrong. You think that coef() should return predictions for females and males but that’s not what it’s intended to. It is just to compute coefficients for each level, which is the intercept (happens to be females) and the difference between males and females in your case. To obtain predictions of the sort you have in mind use fitted() with argument newdata appropriately specified.
As a brief follow-up, the structure of the summarized coef() object would seemingly lend itself to the appropriate average predictions. That is, for group j, adding the estimated intercept to the estimated effect of male yields the same sum as adding the two fixed effects to the random intercept and the slope for household j. (An error in my improvised code prevented me from seeing that correspondence previously.)
P.S. It’s good to see BRMS on the discourse forum.