Multivariate model interpreting residual correlation and group correlation

I have data with an outcome measured by four different methods a different timepoints in a group of individuals. One outcome is considered gold standard and so I’m interested in how the other outcomes correlate with it after multivariate modelling.

What I’m confused about is whether or not I should include group level modelling or not, and if I do, how does that change the interpretation of the residual correlation? To illustrate I include two different modesl and output (sorry cannot share the data for reproducible example)

fit1 <- brm(
    mvbind(outA, outB, outC, outD) ~ centre + days_from_baseline + (days_from_baseline | uin) ,
    data = df2, chains = 4, cores = 4)

fit2 <- brm(
    mvbind(outA, outB, outC, outD) ~ centre + days_from_baseline + (days_from_baseline | p | uin) ,
    data = df2, chains = 4, cores = 4)

> summary(fit1)$rescor_pars
                   Estimate   Est.Error  l-95% CI  u-95% CI     Rhat Bulk_ESS Tail_ESS
rescor(outA,outB) 0.9534558 0.004181592 0.9446059 0.9609188 1.001909     2614     2993
rescor(outA,outC) 0.8686476 0.017175859 0.8324190 0.8992108 1.002068     1261     2092
rescor(outB,outC) 0.8572975 0.017809584 0.8199478 0.8880892 1.001311     1344     2242
rescor(outA,outD) 0.9070788 0.010486762 0.8848478 0.9261268 1.001920     1451     2412
rescor(outB,outD) 0.9087151 0.010564702 0.8861320 0.9277103 1.003154     1342     2752
rescor(outC,outD) 0.8413673 0.019031188 0.8006894 0.8760983 1.000012     1518     2512
> summary(fit2)$rescor_pars
                    Estimate  Est.Error     l-95% CI  u-95% CI     Rhat Bulk_ESS Tail_ESS
rescor(outA,outB) 0.45600262 0.04868140  0.358168835 0.5492583 1.000698     1669     2241
rescor(outA,outC) 0.15993739 0.06086832  0.040071641 0.2793375 1.001684     2730     3197
rescor(outB,outC) 0.12594397 0.06094139  0.005829229 0.2452957 1.002860     2169     3038
rescor(outA,outD) 0.19459390 0.06270912  0.075989452 0.3189153 1.000178     2270     2965
rescor(outB,outD) 0.27492125 0.05871985  0.155382198 0.3853291 1.001018     2042     3233
rescor(outC,outD) 0.08333154 0.06564745 -0.045100800 0.2104020 1.003709     1471     2681

I note also that fit1 gives a BulkESS warning, fit2 does not.

However, my difficulty is intepreting these coefficients. For example in fit1 outA and outB have 0.95 residual correlation conditional on the model. I understand that - these two methods are very similar and I expect alot of correlation in these metrics. For fit2 however, outA and outB have 0.46 residual correlation conditional on model2. If the (days_from_baseline | p | uin) specification has included correlation within groups across outcomes - what does this 0.46 residual correlation now represent ?

I also plot the fit for each outcome for both models:

fit1 - not sure why outcomeB is horizontal given its highly correlated with outcome A

fit2 - makes more sense

1 Like

If I’m following your description and formula syntax correctly, fit1 does not allow the group-level variance parameters to covary across the four outcome measures. fit2 allows four outcome measures to covary at both the residual variance level (rescore) and the level of the group-level variance parameters. To my mind, the fit2 approach would be more appropriate in most cases and I would be very leery of interpreting residual covariances in a model that presumed the group-level covariances across outcomes were zero. As to how one might interepret the covariances in fit2, those in rescore are within-person and those in at the group-level are between-person.


Really helpful answer thanks. That interpretation makes sense to me.
Do you know if there is an easy way to extract the group-level covariances ? I can see the parameters in the model summary but wondering is there some function to specifically extract them?

Edit ok I foun the VarCorr command that will extract these but I’m a bit confused again by how to interpret the output.

For example:

corr_results <- VarCorr(fit2)

Give me results as follows:

> corr_results$uin$cor
, , outA_Intercept

                           Estimate   Est.Error       Q2.5     Q97.5
outA_Intercept           1.00000000 0.000000000  1.0000000 1.0000000
outA_days_from_baseline  0.02742437 0.106116712 -0.1783077 0.2310534
outB_Intercept           0.98125960 0.005515709  0.9691290 0.9905385
outB_days_from_baseline  0.01162234 0.110884760 -0.2032591 0.2244707
outC_Intercept           0.57011174 0.049750090  0.4657663 0.6583139
outC_days_from_baseline -0.01993107 0.110929555 -0.2314882 0.1959276
outD_Intercept           0.79618480 0.029308754  0.7354716 0.8476083
outD_days_from_baseline  0.01651157 0.109237399 -0.1965623 0.2364029

, , outA_days_from_baseline

                          Estimate  Est.Error       Q2.5     Q97.5
outA_Intercept          0.02742437 0.10611671 -0.1783077 0.2310534
outA_days_from_baseline 1.00000000 0.00000000  1.0000000 1.0000000
outB_Intercept          0.05946653 0.10558848 -0.1497123 0.2635315
outB_days_from_baseline 0.96371618 0.02008191  0.9149148 0.9908971
outC_Intercept          0.26306508 0.10283310  0.0605102 0.4574829
outC_days_from_baseline 0.81631499 0.06104099  0.6799112 0.9171483
outD_Intercept          0.13442670 0.11131363 -0.0901118 0.3496498
outD_days_from_baseline 0.84687966 0.05648600  0.7205223 0.9391188


I get that these are results fit from the linear model specified by (days_from_baseline | p | uin) in fit2, but I’m a bit confused again as to how to interpret them.

1 Like

I don’t think I understand your model in detail, but interpreting individual coefficients can get tricky, precisely because their interaction with other parts of the model might be complicated.

I usually find it more straightforward to interpret complex models via their predictions (e.g. the results of posterior_predict or posterior_epred for suitably chosen data). E.g. if your question is “how big would be the correlation of the measurements in a new experiment”, you can just predict a set of measurements and compute the correlation individually for each posterior sample. This gives you posterior samples for the correlation, which should be easy to interpret.

Does that make sense?

1 Like

Hi Martin thanks for the answer. I’ve been thinking it over and honestly it doens’t really make sense to me on a couple of points. My question is not about predictions for new experiments it is to understand how well the measurements agree in this one with one being considered gold standard. I also don’t really understand why I would predict new data and look at correlations in that - wouldn’t that be applying a new model to predictions from the first model??!? Are there any examples of your approach you could point me to ?

Sorry I forgot to answer this. Its a multivariate linear mixed effects model with 4 outcomes including explicity group level correlation term but I get that variable names may be confusing above. Something like (Y1,Y2,Y3,Y4) ~ covar + time + (time | p | ID)
The centre covariate is actually study site so maybe merits hierarchical representation itself, but for now sticking with a fixed effect of it

Ah, I see. This stuff can get confusing, so don’t be afraid to ask for clarifications.

If you only cared about how well the measurements agree in the experiment you just did, you don’t need a statistical model - you could just compute the correlation matrix (or a similar measure) between observed A,B,C and D and be done. All there is to know about the experiment you ran has been observed. But I guess you wouldn’t be happy with that approach - and I guess that is because you actually want to generalize. The phrase “how well the measurements agree” implies that you care about some general “truth”. You care about generalization - to other centers, other times from baselines, other subjects. If you don’t care about generalization I honestly think there is little reason do statistical modelling - just describe the data you’ve seen - that’s also good (statistical model can be also a tool to just describe data, but then you IMHO actually start to play by almost the same rules as if you consider generalizations and the distinction is IMHO only of theoretical interest).

One way (but not the only one) to think about this generalizations is to imagine hypothetical new experiments/measurements. And looking at individual coefficients can be seen as a special case of this. If I have a simple linear regression y ~ x and inspect the fitted coefficient for a binary predictor x, then I am looking at how big difference between averages of y for the two groups I would see in a hypothetical new set of measurements with no noise/unlimited data. However, for more complex models the individual coefficients can correspond to quite bizarre hypothetical quantities.

But the framework of hypothetical future experiments is helpful because it let’s us be explicit about the details of what you care about - I wrote about this at Inferences marginal of random effects in multi-level models so I won’t repeat it here.

As a (slightly contrived) example of how to use predictions and one reason why it can matter consider this regression:

d <- data.frame(x = c(3,5))
fit <- brm(x ~ 1, data = d)

The summary of the parameters is:

Population-Level Effects: 
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept     3.96      1.25     1.36     6.46 1.00     1164     1313

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     2.13      1.51     0.65     6.06 1.00     1242     1257

But that’s hiding an important part of the picture - the estimates for Intercept and sigma are not idependent - the data is consistent with the Intercept around 4 and small sigma but also with large sigma and the Intercept in a wide range.


For some inferences, this doesn’t really matter. I can interpret the Intercept as the mean of a hypothetical future measurement with unlimited data. Let’s compare this with prediction for a “big” experiment:

dummy_data <- data.frame(.dummy = rep(1, length.out = 1000)) # Data frame with 1000 rows, to predict 1000 observations, normally this would house covariates
pred <- posterior_predict(fit, newdata = dummy_data)
samples_mean <- rowMeans(pred)
# 3.961717
quantile(samples_mean, c(0.025,0.975))
#     2.5%    97.5% 
# 1.345227 6.462462 

Yay, we recovered the summary for Intercept. But what if we care just about a single new value? How big could it get? So the Intercept is likely not bigger than 6.46, but the sigma can plausibly get to 6.06. How likely is that both are large? There is no straightforward way to see that from the parameter summaries, but we can use predictions.

samples_single <- posterior_predict(fit, newdata = data.frame(.dummy = 1)) 
# [1] 3.962272
quantile(samples_single, c(0.025,0.975))
#      2.5%     97.5% 
# -1.663526  9.813133 

This dependency between parameters is usually not that strong in practice, but it is a similar issue to the one you are facing. Here, there are two sources of variability that cannot be fully understood separately. In your case, there are two sources of correlation and considering them separately is weird as you noted. The easiest way to consider them together is to use predictions.

Hope that clarifies more than confuses.

1 Like

Great answer @martinmodrak - thanks very much for the time you put into that. I’ll take some time to go through it

1 Like

Hi @martinmodrak in your example there you sample from the posterior with single newdata point with same value as all the original datapoints. But to generalise to my data and model which is complex, when you are sampling the posterior would you simply use the original data to fit the models.

i.e. are you suggesting something like this:

fit2 <- brm(
    mvbind(outA, outB, outC, outD) ~ centre + days_from_baseline + (days_from_baseline | p | uin) ,
    data = df2, chains = 4, cores = 4)

samples_multiple <- posterior_predict(fit2, newdata = df2) 
predA <- samples_multiple[,,1]
predB <- samples_multiple[,,2]
predC <- samples_multiple[,,3]
predD <- samples_multiple[,,4]

meanA <- apply(predA, 2, mean)
meanB <- apply(predB, 2, mean)
meanC <- apply(predC, 2, mean)
meanD <- apply(predD, 2, mean)

cor(meanA, meanB)
cor(meanA, meanC)
cor(meanA, meanD)
cor(meanB, meanC)
cor(meanB, meanD)
cor(meanC, meanD)

(I’m sure theres neater ways to do that post-processing)

This is where it gets interesting - there is no general rule and you need to be explicit about which predictions do you care about. Using the original data would mean you are interested in something like exact replications of the same experiment (with the same subjects etc.). But there are other sensible approaches - e.g. assume a hypothetical previously unseen subject (see the allow_new_levels argument for posterior_predict) at some pre-specified days from baseline that interest you. There really are a lot of options and I think the advantage of this approach is that it makes you be explicit about choosing one of those options, instead of just taking the default.

Also the correlation can really only be easily computed for predicted data (there is no easy way to get “mean correlation” or something) so the results will change depending on the size of the dataset you predict for (I would generally just use many datapoints to get the correlation with as little noise as possible).

In your code you compute the correlation of the means, but I think you instead want to compute the correlations for each sample, something like (not tested, just a sketch):

cor_matrix_samples <- array(NA_real, c(n_samples, 4, 4))
for(s in 1:n_samples) {
  cor_matrix_samples[s, ,] <- cor(samples_multiple[s,])

cor_a_b_samples <- cor_matrix_samples[,1,2]
#Credible interval
quantile(cor_a_b_samples, prob = c(0.025, 0.975)

Does that make sense?

1 Like

Hi @martinmodrak. Yes thanks that does make sense -I figured I made a mistake averaging the correlations across all samples like that 👍 Thanks for the code I will be able to adapt it.
I’ll have to think about what are useful prediction conditions so!