Difference between VarCorr and correlation implied by posterior_linpred

martinmodrak · January 26, 2021, 11:12am

The results of VarCorr should AFAIK be the fitted covariance/correlation matrix for group-level effects. In a model where those effects are the only term, one would expect to get exactly the same covariance/correlation structure by computing the linear predictor for a large number of datapoints for each sample and computing the empirical correlation in each sample (i.e. the predictors should be exactly draws from the relevant multivariate normal, so taking many such draws should recover the correlation covariance). This however gives different numbers in practice.

So maybe I am missing something? Or do I have a bug?

Code to reproduce this:

library(brms)
library(tidyverse)
options(mc.cores = parallel::detectCores(), brms.backend = "cmdstanr")

set.seed(58232134)
N_ids <- 20
N_groups <- 3
test_data <- crossing(data.frame(id = paste0("id",1:N_ids), idmod = rnorm(N_ids)), 
                      data.frame(group = paste0("g", 1:N_groups), groupmod = rnorm(N_groups)),
                      data.frame(rep = c(1:3))) %>%
  mutate(response = rnorm(n(), groupmod + idmod))

fit <- brm(response ~ 0 + (0 + group | id), data = test_data)

VarCorr(fit)$id$cor[,"Estimate", ]

Gives:

          groupg1   groupg2   groupg3
groupg1 1.0000000 0.3642701 0.9348216
groupg2 0.3642701 1.0000000 0.4477652
groupg3 0.9348216 0.4477652 1.0000000

Now, computing via the linear predictor:

all_groups <- unique(test_data$group)
n_reps <- 10000
prediction_data <- crossing(id = paste0("new_id", 1:n_reps), group = all_groups)
linpred <- posterior_linpred(fit, newdata = prediction_data, allow_new_levels = TRUE)

n_samples <- dim(linpred)[1]
per_sample_cor <- array(NA_real_, c(n_samples, length(all_groups), length(all_groups)  ))
for(s in 1:n_samples) {
  predictors_wide <- array(NA_real_, c(n_reps, length(all_groups)))
  for(g in 1:length(all_groups)) {
    predictors_wide[ , g] <- linpred[s, prediction_data$group == all_groups[g]]
  }
  per_sample_cor[s, , ] <- cor(predictors_wide)
}

apply(per_sample_cor, MARGIN = c(2,3), FUN = mean)

which gives:

          [,1]      [,2]      [,3]
[1,] 1.0000000 0.4954658 0.8642596
[2,] 0.4954658 1.0000000 0.6006337
[3,] 0.8642596 0.6006337 1.0000000

Any ideas, what is causing the difference?

torkar · February 9, 2021, 8:29am

Martin, did you get to the bottom of this? It sounds very suspicious…

martinmodrak · February 11, 2021, 3:07pm

I unfortunately still have no idea…

Tagging @paul.buerkner for additional insights…

paul.buerkner · February 11, 2021, 10:08pm

I will take a closer look next week :-)

Topic		Replies	Views
Using VarCorr to calculate correlation matrix brms	1	699	May 17, 2020
Correlations/covariances among predictor and criterion variables in brms Modeling brms	2	356	October 26, 2022
Posterior Predictive Distribution in Bivariate Meta-Analysis Modeling brms , meta-analysis	8	1635	February 4, 2022
Predicting the variance, taking the mean into account brms posterior-predictive , brms	7	118	October 21, 2024
Multivariate model interpreting residual correlation and group correlation brms interpret-results	14	3862	September 12, 2020

Difference between VarCorr and correlation implied by posterior_linpred

Related topics