Correlated random effects

Dear community,

I would like to ask about a problem I am experiencing in fitting a brms model to count data. The model specification below results in a fit with a relatively low ESS (~1000-1200) given 4000 post-warmup iterations. This appears to be because the posterior draws of the by-subject random slopes for c_gram are heavily correlated with each other (r > .95). Incidentally, the random slopes also show a strong negative correlation with the fixed effect for c_gram.
Is this expected behavior under some circumstances, or did I miss something about the specification of my model?
I attach the data here: acc_data.csv (587 Bytes)

library(brms)

fit <- brm(Nyes | trials(Ntotal) ~ 1 + c_gram + (c_gram + 1 | 1 | subject),
           data = df, family = binomial(),
           iter = 2000, chains = 4, cores = 4, seed = 1234)

The code below displays the correlation coefficients.

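library(magrittr)  # for %>% and %<>%
library(dplyr)     # for arrange() and desc()

# correlation matrix of all posterior draws, keeping only the upper triangle
cor_z <- posterior_samples(fit) %>% cor() %>% as.data.frame()
cor_z[lower.tri(cor_z, diag = TRUE)] <- NA
# reshape to long format: one row per parameter pair
cor_z %<>% cbind(., name_a = rownames(.))
cor_z %<>% tidyr::pivot_longer(-name_a, names_to = "name_b", values_to = "r")
cor_z %<>% subset(!is.na(r))
cor_z %<>% arrange(desc(r))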

# correlation coefficients between random slopes are >0.95
head(cor_z)
# correlation coefficients between random slopes and the fixed effect are <-0.95
tail(cor_z)

I fitted the model on Linux, using R 4.0.0, brms version 2.12.9.

An ESS of 1000 doesn’t seem that bad offhand. Are you getting Rhat or treedepth warnings?

You could try fitting the model with a dense metric. To do that, add control = list(metric = "dense_e") to your brm call (note that the metric name is a string). This can help with linear correlations, though it won’t help with non-linear ones. Also make sure your rstan is up to date (2.19 should be fine).
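
For concreteness, here is a minimal sketch of the dense-metric suggestion applied to the model from the original post. It assumes the same data frame df is loaded; the name fit_dense is only a placeholder chosen for this example, and whether the dense metric actually improves ESS for this model is not something the sketch demonstrates.

```r
library(brms)

# Same formula and sampler settings as in the original post.
# `df` is the poster's data frame; `fit_dense` is a hypothetical name.
fit_dense <- brm(Nyes | trials(Ntotal) ~ 1 + c_gram + (c_gram + 1 | 1 | subject),
                 data = df, family = binomial(),
                 iter = 2000, chains = 4, cores = 4, seed = 1234,
                 # ask Stan to adapt a dense mass matrix instead of the default diagonal one
                 control = list(metric = "dense_e"))

# Compare the ESS columns with those of the original fit
summary(fit_dense)
```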


I would think the behavior of high correlations is not unexpected in multilevel models and can very well happen. At least, I don’t see an immediate problem in your model.
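
One possible mechanism behind such correlations can be sketched with a toy simulation. This is not a fit of the model in question, and all numbers and variable names are made up for illustration. It assumes that the data pin down each subject's total slope (fixed effect plus subject deviation) tightly, while the population-level slope itself is only weakly identified, for example because there are few subjects.

```r
# Toy illustration only: simulated "posterior draws", not a fitted model.
set.seed(1)
n_draws <- 4000

# Assumption: the population-level slope is weakly identified (wide spread).
b <- rnorm(n_draws, mean = 0.5, sd = 1)

# Assumption: each subject's total slope, b + r_j, is tightly determined by the data.
total_1 <- rnorm(n_draws, mean = 0.8, sd = 0.1)
total_2 <- rnorm(n_draws, mean = 0.2, sd = 0.1)

# Subject-level deviations are whatever is left after subtracting b.
r_1 <- total_1 - b
r_2 <- total_2 - b

cor(r_1, r_2)  # strongly positive: both deviations share the uncertainty in b
cor(r_1, b)    # strongly negative: a larger b forces a smaller deviation
```

Under these assumptions, the draws of the deviations move together and move opposite to the fixed effect, which is the same qualitative pattern as described in the question. Whether this is what drives the correlations in the actual model would need to be checked on the real posterior.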


Thank you. I’ll give this a try.

The ESS is indeed not too bad in this particular case. I probably should have mentioned that this is a substantially simplified version of the actual model I am fitting, which is non-linear and has a couple more parameters. The result is that in the full model, the ESS is ~10-50 with 8000 iterations, and sampling is super-slow.


Thank you. I didn’t realize that this is expected in multilevel models. Could you possibly point me to any literature where this phenomenon is discussed?
I’d like to understand what aspect of the data determines the degree of correlation between (a) random slopes on the one hand, and (b) random slopes and the corresponding fixed effect on the other hand.