Residual covariance (brms mutlivariate model in long format)

Hi Everyone,

I’ve received a lot of help for fitting models like this from others. e.g., see here .

I’m trying to fit a multivariate outcomes model in brms in long format. The wide-format equation looks like this.

bf(y_correct ~ 0 + (1 | p | subject)) + 
bf(y_error ~ 0 + (1 | p | subject)) +

If re-written in a long-format, so correct (y_correct) and error trial scores (y_error) are in the same column (y) with event type (correct, error) in a separate one, I end up with the following syntax.

bf(y ~ 0 + event + (0 + event | subject),
  sigma ~ 0 + event)

This gives me all of the estimates I need, except for covariances between the sigma for correct and sigma for error. (i.e., the same output cor in summary() thanks to set_rescor(TRUE) from the wide-format above).

Anyone have any recommendations?

VarCorr() does not include the sigmas.

I apologize if I missed something in a vignette somewhere… I’ve been working on this for a while now…

Thank you for you thoughts!

  • Operating System: OS X 10.15.6
  • brms Version: 2.13.5
1 Like

Hey there!

Hm, yea, I think this would be a it awkward. I think the covariance is captured in the regression coefficient for event. IIRC, a regression coefficient is simply \beta_\text{event}=\rho \frac{\sigma_y}{\sigma_\text{event}}, but it’s a bit too hot for me to work that through for the hierarchical part.

What I wanted to say is: I think the covariances between the sigma for correct and sigma for error is in your model, but a bit awkward to extract. Personally, I would got with the wide-format model if that’s possible.

All that being said, quite possible there is a way to exactly do this. I’m not really a brms expert…


Thanks for your thoughts @Max_Mantei. I typically use the wide format, when I can, because it’s more straightforward (in my head) to implement. For this particular dataset that I’m working on though, I have unbalanced data (not missing data, see discussion with @paul.buerkner and @jroon here: Multivariate formula with different number of observations).

I’ll keep thinking about it. Thanks again!

Just skimmed through the other thread. I think you are already almost doing the correct thing. Keep in mind:

So in your model above the residual correlations would be captured by doing something like y ~ 0 + event + (1 + event | subject) adding an intercept to the subject level. this would give you a varying intercept per subject, which is common across event types (if this number is higher, you predictions for both events types is higher, if it is lower your predictions for both event types is lower -> residual correlation). If you think about the “random/mixed effect” as composite error, this might become more obvious. There is a cool chapter in Data Analysis Using Regression and Multilevel/Hierarchical Models (Gelman/Hill) – I think it is chapter 12.5 – that further describes this “view” of hierarchical models.

Hope this helps!

Hm, interesting. I don’t readily see how that would give me the sigma covariances between the two events, but that’s probably just due to my missing something. I’ll give the approach a shot. I’ll also take a look at that chapter in the Gelman/Hill book. I’ll report back when I’ve had a chance to do so :)

Thanks again,

1 Like

Thanks, @Max_Mantei.

I read the section of Gelman and Hill’s book that you referred to. A great read. Thanks for the recommendation. The section Large regression with correlated errors is what I’m trying to do, but I can’t wrap my head around how to get the covariances for the residual using the long-format equation in brms.

I am interested in explicitly estimating the covariance for my two population-level sigmas (sigma_eventcorrect, sigma_eventerror). Adding the subject-level intercept didn’t help me estimate the covariance. Am I missing something?

@paul.buerkner. Sorry to ping you. Do you know if it’s possible to get the covariances of the two levels of “event” in sigma ~ 0 + event when I write this equation in long format? You helped me get this far already, so I thought it wouldn’t hurt to ask :)

thanks in advance if you have any advice!

Can you clarify, which covariances (on which level of the model) you mean?

Absolutely. Here is the formula I am working with.

There are only two events: correct and error.

bf(y ~ 0 + event + (0 + event | subject), sigma ~ 0 + event)

This formula gives me the group-level effects (~subject) variances for the intercepts for each event ( for correct and for error) as well as the covariance between the two events by just using VarCorr().

This formula also gives me the population-level effects for two residual variances (sigmas) separately for each event: one for correct and one for error.

However, I also need the covariance between the event:correct sigma and event:error sigma. In other words, I need the covariances between the two residuals. In the wide format of the equation, the residual correlations (rescor(correct,error)) is shown in the summary of the brm fit, and the covariance between sigma_correct and sigma_error can be extracted using VarCorr().

I hope that clarifies what I’m looking for. Please let me know if you have any questions.

I appreciate you taking the time to help me this! Thank you! (and thanks for brms)

To achieve that, you could cheat a little and add a grouped autoregressive term to the main the formula. It would look something like ar(gr = <group>, cov = TRUE), where <group> is a grouping variable that has the same value for both observations you want to model as correlated (and different values across different pairs of course).

Thanks for your help.

I’ve been thinking about this a bit. I’m trying to wrap my head on around how to code <group>.

I imagine that if I had 20 observations per event (correct, error) within a subject, then I would simply code the first pair of observations as 1, the second pair as 2, and so on. I am guessing that I would do this nested within subjects, starting over at 1 for each subject’s first pair.

How do I deal with unbalanced observations? I’m looking at responses based on participant behavior, so typically people have (many) more correct observations than error observations. This means I do not necessarily have a “matched” observation for each correct response, so I cannot pair all of the data up.

Thanks again for your continued help! I’m going to do some more reading on ar() as well, because it’s new to me.

I think I have explained that earlier but I forgot that this was the situation you are in. Residuals only make sense if you have two observations which natrually pair with one another. Since you don’t seem to have that residuals on the observational model do not seem sensible for your model.


Understood. I guess I will throw in the towel. Thanks for all of your help.