Brms: Estimating residual correlation between two variables with measurement error

I have a model that attempts to estimate the residual correlation between two traits using the brms multivariate syntax, where both traits are measured with some error. Testing with some simulated data, the model without measurement errors (or with error specified for one of the traits) samples efficiently, but fails to identify the true correlation (as expected). However, the model with measurement errors struggles to explore the target distribution for the residual correlation (as evidenced by low E-BFMI, low ESS and slow sampling), but gets much closer to the true correlation.

# simulate some instructive fake data with a true high correlation
sigma_true <- matrix(c(1,0.8,0.8,1),2,2)
n <- 500
# noise greater than signal: independent errors between the two traits
measurement_error <- 
  tibble(a_me = runif(n,0.5,3), 
         b_me = runif(n,0.5,3))
# true unknown effects
# estimates are true effects plus measurement errors    
data <- 
  as_tibble(y) %>% 
  rename(a_obs = V1, b_obs = V2) %>% 
  bind_cols(as_tibble(z) %>% 
  rename(a_true = V1, b_true = V2)) %>% 
# the model
bf_a <- bf(a_obs | mi(a_me) ~ 1)
bf_b <- bf(b_obs | mi(b_me) ~ 1)
fit <- brm(bf_a + bf_b + set_rescor(TRUE),
            prior = c(prior(normal(0, 0.25), class = Intercept, resp = aobs),
                      prior(normal(0, 0.25), class = Intercept, resp = bobs),
                      prior(exponential(2), class = sigma, resp = aobs),
                      prior(exponential(2), class = sigma, resp = bobs),
                      prior(lkj(3), class = rescor)),  
            iter = 6000, warmup = 2000,
            control = list(adapt_delta = 0.9, max_treedepth = 15),
            data = data, chains = 4, cores = 4, seed = 1)

Is my approach for estimating the residual correlation reasonable? What is it about having error for both outcomes that leads to poor sampling? Any thoughts would be great! Happy to move from brms to stan if neccessary.

  • Operating System: Windows 11
  • brms Version: 2.20.4

Sorry it took so long for you to get a response. Your approach is reasonable in general, but it doesn’t surprise me that it yields some sampling problems, which likely are arising from difficulties in identifying the residuals as separate from the measurement errors.

No idea if this is part of the problem, but one thing that I note is that your lkj(3) prior on the residual correlation matrix disfavors high correlations like the ones that you use in your data generating process. Do things get any better if you switch to lkj(1), which might be more consistent with your data?