Interpreting multivariate versus univariate models

  • I am running brms 2.13.1 on MacOS 10.14.6

Hello folks,

I am quite new to brms (and multi-level modelling), so I apologise in advance for any basic and obvious questions. I would like to gain a clearer understanding of the difference between a multivariate model (with correlated grouping variables) and a univariate model (with covariates). Essentially, I want to know how similar (or not) the conclusions / interpretations can be from these different models. Details on the models are as follows.

Multivariate model:

I have a multivariate model with three dependent variables (DVs). More specifically, I have one primary DV (DV1) and two that help qualify the primary DV. As such, some of the DVs (DVs 2 and 3) can also be considered as covariates in a univariate model; see below. I also have some factors and a bunch of human participants. The DVs/predictors are different rating scales (from 1 to 5) based on evaluating a set of images. I also have a factorial design, such that the images fall into different categories. Below is an example model (it takes hours to build, which is why I have not included a reproducible version). I only include one factor below for simplicity. I have a varying intercept for ‘item’ = the image (or stimulus) and a varying intercept and slope for participant, which is treated as a correlated grouping variable. The model is here:

brm(mvbind(DV1, DV2, DV3) ~ 1 + factor +
            (1 + factor |a| participant) +
            (1 | item),
        data = data, family = cumulative("probit"),
        prior = priors,
        iter = 6000, warmup = 1000, cores = 4, chains = 4,
        control = list(adapt_delta = 0.99, max_treedepth = 15),
        init_r = 0.1)

Univariate model:
(I understand that the DV2 and 3 are now predictors, but I wanted to keep the same labelling as above)

brm(DV1 ~ 1 + DV2 + DV3 + factor +
            (1 + DV2 + DV3 + factor | participant) +
            (1 | item),
        data = data, family = cumulative("probit"),
        prior = priors,
        iter = 6000, warmup = 1000, cores = 4, chains = 4,
        control = list(adapt_delta = 0.99),
        init_r = 0.1)

Essentially, the univariate model makes sense as follows: I’d like to know how much the factor influences DV1 with the other ratings (DV2 and DV3) in the model as covariates. Then I could compare models with and without these covariates and see how much the factor influences the outcome across the different models.

For the multivariate model, it is not so clear to me. I understand that the influence of the factor is estimated for all DVs separately. And that there is a correlated grouping variable. And if I had to guess, I’d say that I could not make the same conclusion as the univariate model? But is this correct? I guess I’d like to know if both models are needed if one wants to estimate how much the three DVs are influenced by the factor AND how much the factor influences my primary DV (DV1) with other factors in the model?

A further question – if ratings on a scale from 1-5 are included as predictors, what’s the best way to include them? Should they be monotonic? After reading this, I am a little unclear.

Any help would be appreciated.



Sorry, can’t answer, short on time, maybe @Guido_Biele has time and can answer?

I can give it a quick try:

It is my understanding that multivariate models are useful if you are interested in the effect of predictor variables across multiple outcome variables (DVs). From your description, I could not see that you were specifically interested in that, hence a multivariate model might not be the way to go here. [The outcome-specific regression coefficients should be the same, regardless of whether you implement a multivariate model or several univariate models.]
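
To make the comparison concrete, here is a rough sketch of the two formulations side by side. It only builds the formulas and does not fit anything; the `(1 | item)` term is taken from your description of the model, and the names `f_mv` and `f_dv1` are just placeholders I made up.

```r
library(brms)

# Multivariate formulation: the same predictors for all three outcomes.
# The |a| ID lets the participant-level effects correlate across DV1, DV2 and DV3.
f_mv <- bf(
  mvbind(DV1, DV2, DV3) ~ 1 + factor +
    (1 + factor |a| participant) +
    (1 | item)
)

# Univariate formulation for the primary outcome only.
# Without a shared ID there is nothing to correlate with the other outcomes.
f_dv1 <- bf(
  DV1 ~ 1 + factor +
    (1 + factor | participant) +
    (1 | item)
)

# Either formula would then be passed to brm(), for example:
# fit_dv1 <- brm(f_dv1, data = data, family = cumulative("probit"))
```

What the multivariate version adds is mainly the correlation of participant-level effects across outcomes; I would expect the population-level effect of the factor on DV1 to come out very similar in both, but that is worth checking on your data.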

It is not easy for me to understand what you are trying to do, because it seems that you state the goal of the analyses at different places, and I am not entirely sure about the terminology. For instance, you are describing images in categories in a factorial design. Do the factors you are referring to describe these categories?

Assuming that your main interest is in DV1, here are things that come to my mind:

  • your model formula looks OK, but I am wondering why you assume random slopes for participants, and not for items (there can be a reason, it's just not clear to me from what you wrote)
  • you are using lots of post-warmup samples. This should not be necessary (unless you want to use bridge sampling to compute Bayes factors)
  • your adapt_delta and init_r suggest that it is hard to estimate this model. Have you thought of also estimating item-specific discrimination parameters, i.e. bf(..., disc ~ 1 + (1 | item)), or even better bf(... (1 |i| item), disc ~ 1 + (1 |i| item)), to model the correlation between the item difficulty and item discrimination parameters? Complex models can be easier to estimate than simpler models if they capture important aspects of the data.
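
Spelled out for the univariate DV1 model, the discrimination suggestion could look roughly like the sketch below. This only constructs the formula; the name `f_disc` is a placeholder, and I am assuming the item intercept from your description. Depending on the model, the discrimination part may need extra constraints or tighter priors to be identified, so treat this as a starting point rather than a finished specification.

```r
library(brms)

# Item intercepts (difficulty) and item-specific discrimination share the ID "i",
# so brms estimates the correlation between the two sets of item effects.
f_disc <- bf(
  DV1 ~ 1 + factor +
    (1 + factor | participant) +
    (1 |i| item),
  disc ~ 1 + (1 |i| item)
)

# The formula would then be used with the ordinal family from the original model:
# fit_disc <- brm(f_disc, data = data, family = cumulative("probit"), prior = priors)
```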

The best way to include DV2 and DV3 would be to also estimate their latent underlying variables and use those as predictors, but as far as I know this is not yet possible with brms. A monotonic effect is a good choice if there is a strong reason to assume that the relationship cannot be non-monotonic (e.g. u-shaped). If a non-monotonic relationship is possible, you could just enter the rating as an ordered factor, which R codes with polynomial contrasts by default. You can use the contrasts() function in R (its how.many argument) to limit how many polynomial terms you allow for a variable.
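
A small sketch of both options, using made-up toy data so that it runs on its own (the data frame `d`, its values, and the names `f_mo` and `f_ord` are purely illustrative; only the column names come from your post). Nothing is fitted here.

```r
library(brms)

# Hypothetical toy data: 1-5 ratings, 10 participants, 8 items,
# with each item belonging to exactly one level of the factor.
set.seed(1)
d <- data.frame(
  DV1 = sample(1:5, 40, replace = TRUE),
  DV2 = sample(1:5, 40, replace = TRUE),
  DV3 = sample(1:5, 40, replace = TRUE),
  factor = rep(c("a", "b"), each = 20),
  participant = rep(1:10, times = 4),
  item = rep(1:8, each = 5)
)

# Option 1: monotonic effects. mo() expects an integer or an ordered factor.
f_mo <- bf(
  DV1 ~ 1 + mo(DV2) + mo(DV3) + factor +
    (1 + factor | participant) + (1 | item)
)

# Option 2: ordered factors with polynomial contrasts, keeping only the
# linear and quadratic terms via the how.many argument of contrasts().
d$DV2_ord <- factor(d$DV2, levels = 1:5, ordered = TRUE)
d$DV3_ord <- factor(d$DV3, levels = 1:5, ordered = TRUE)
contrasts(d$DV2_ord, how.many = 2) <- contr.poly(5)
contrasts(d$DV3_ord, how.many = 2) <- contr.poly(5)

f_ord <- bf(
  DV1 ~ 1 + DV2_ord + DV3_ord + factor +
    (1 + factor | participant) + (1 | item)
)

# Either formula would then go to brm(), for example:
# fit_mo <- brm(f_mo, data = d, family = cumulative("probit"))
```

With how.many = 2, each rating contributes two coefficients (linear and quadratic) instead of the default four for a five-level ordered factor.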


Thank you very much for your response. It is much appreciated. I will go away and try some things out and get back to you if I have any further questions.

Regarding the lack of random slopes for items, this is because (as I understand it, following Barr et al., 2013) we are using the maximal model structure that the design permits. In this case, factors (conditions) are defined by the type of item, so an item cannot appear in all conditions (e.g., each item is in one of X conditions). So we allow the intercept for items to vary but not the slope. That’s how I understand it, but I may be wrong. I’m new to this.
