Multivariate formula with different number of observations

Thanks for posting!

I was hoping that I could have a different number of observations for two different outcomes and still fit the models in brms using the multivariate normal. But, rescor still requires the outcomes to have the same number of observations :( I need rescor = TRUE so I can model the covariances between predictors and residuals.

For now, I’ll keep trying to do this in Stan… but it’s still beyond my skill level… I might be ready to give it another few-month break…

1 Like

Oh I didn’t spot set_rescor(FALSE) there. Also not useful for me so since I too need to model rescor = TRUE 😭

Haha, sorry to be bearer the bad news! Well, if you figure out a solution for rStan, be sure to let me know! :) (or find a vignette… but I’ve searched high and low for one…)

1 Like

haha no worries - better you told me than I did a bunch of work before noticing.

Hmm you could possibly treat the outcomes with ‘absent rows’ as missing values, but I have not tried this myself as yet.

See the vignette here:

Ya, I’ve thought of that, but the data aren’t truly “missing”. I do behavioral experiments where the unbalanced nature is occurs as part of the experiments. e.g., if I’m typically only interested in correct trial data (response times). The number of errors is never constant across participants…

But does that matter if it is a dependent variable? When you extend the model as in the example there, my understanding is that the ‘missing values’ are treated as parameters - so for missing outcome variables it is similar to a doing a prediction for new data. If you want you could make a model with this parameters included in order to include your uneven outcomes, and then ignore these parameters when interpreting your results… I don’t think it can effect your coefficients other than by allowing you to include all the data

Why is it that you have different number of observations?

I have unequal observations, because I look at response time data based on response accuracy (and only trials where a participant actually responds.

For example, one person could have 300 correct trials and 20 error trials and another could have 290 correct trials and 25 error trials (with 5 discarded non-response trials).

@paul.buerkner you actually helped me with a similar problem before (Modeling unbalanced multivariate outcomes in brms), which was referenced above.

My hope is to get an identical output to using
bf(congruent ~ 0 + (1+trial|p|id)) +
bf(incongruent ~ 0 + (1+trial|q|id) + rescor(TRUE)

In this case it would be something like
bf(correct ~ 0 + (1+trial|p|id)) +
bf(error ~ 0 + (1+trial|q|id) + rescor(TRUE)

That output of this brms fit has all of the (co)variance components that I need, and none that I don’t :) I don’t know if that helps to clarify what I’m after… I can only get it to work if I have the same number of correct and error trials across participants.

@jroon That is an interesting idea… maybe the issue is that I don’t understand what’s happening when mi() is used, as in… bf(congruent | mi() ~ 0 + (1+trial|p|id))

What keeps you from stacking correct and error trials into the same variable (long format) and then modeling that?

I couldn’t figure out how to get the covariance between residual variances for the two model types… :)

I want the sd(correct_Intercept), sd(error_Intercept), sd(correct_trial), sd(error_trial) and the covariances among them. I also need the sigma_correct, sigma_error, and rescor(correct, error).

1 Like

Is this the same model however?

If I understand you correctly you mean make one long format output variable and an indicator variable to say which outcome in the wide format that each row corresponds to. Then when you fit that model all outcomes are considered to come from the same univariate distribution (e.g. normal) with the indicator having a moderating effect, whereas in the multivariate version the outcomes are considered to come from a multivariate normal ?

I’ve seen this done in e.g. jags models in the past - but I’ve never understood are these things truly equivalent.

You can only define a correlation between two variables when there is a one to one correspondence between values. This is not a brms limitation but how correlations work. So i am unsure how you would define this residual correlation in the first place.

Hm, all participants have observations for each event type, so it’s not like any participants don’t have a person-specific estimate of correct or error trial. I’m interested in the group-level effects, so I figured as long as everyone had at least a few good observations of each trial type it should be doable…

So you want to correlate the random effects per participant? This is easy by stacking and having a term of the form (0 + type | person) where type is either correct or error.

Just to make sure I’m not misunderstanding.

I can do…
score ~ 0 + event + (0 + event | trial) + (0 + event | person)

(0 + event | trial): this gives me separate between-item variances for correct trials and error trials and the covariance

(0 + event | person): separate variance estimates for between-person variances for correct and error trials and the covariance

But, I only get a single residual variance estimate (sigma) for the whole model. I need separate residual variances for correct and error trial estimates and the covariance between sigmas. Am I messing up the brms formula somehow?

Also, thanks for the quick replies and for thinking this through with me :) I appreciate it!

Use brms distributional syntax a la sigma ~ … for this purpose

Excellent!! You’ve changed my life :) Thank you so much for your help.

For posterity:

I now use…
bf(score ~ 0 + event +
(0 + event | trial) + (0 + event | id2),
sigma ~ 0 + (0 + event | id2))

This gives me separate residual variances for each event and the covariance (cor(sigma_eventcorrect, sigmaevent_error).

Thanks again, @paul.buerkner!

1 Like

Hi @paul.buerkner. Sorry, I’m back.

The sigmas (residual variance) for each event are remarkably small (~4) compared to the univariate brms model variances (~55) that are run separately on each scores from each event.

I ended up using…

bf(score ~ 0 + event + (0 + event | trial) + (0 + event | id), sigma ~ 0 + event)

I don’t think I needed sigma ~ 0 + (0 + event | id) like I wrote in my previous post, because I only need population-level estimates for residual (co)variance.

I don’t know if anything strikes you as incorrectly written in this formula. I’ll keep tinkering to see if I can figure it out. So close!


1 Like