This is may or not be a question with an easy solution.

I’m impressed by the very wide range of models that can be compiled to stan using brms. My understanding of the multilevel models in brms is that the following is a reasonably general setup that describes many linear predictors:

\eta = X\beta + \sum_k Z_k u_k,

where u \sim N(0,S_k) and S_k are variance-covariance matrices (the rest is fairly standard notation for fixed and mixed effects). Once you’ve fit a model it seems fairly straightforward to extract the X, \beta, Z_k and u_k (either for the data used to fit the model or new data to predict on) by digging into the output of call to prepare_predictions. But is there any way to extract posterior draws and/or summaries of S_k? In theory one could reconstruct S_k from the output to a call to VarCorr and the formula that defines your model, but parsing the formula seems far from trivial. Given that the stan code generated doesn’t seem to ever construct the S_k directly, I’m guessing this might not be implemented anywhere, but if it is I would find it very useful!

The reason I am asking is that I want to integrate out some but not necessarily all of the random effects to get population estimates at different scales. Essentially my data is coming from nested cluster surveys; say sampling sites within villages and villages within the broader population. An example formula for the linear predictor (with custom non-linear link and family to get it to the scale of the data) might be something like this:

Result ~ Year + (1 + Year |Village) + (1|Site)

where every site is within a village and has a unique ID. If I want to integrate out all random effects to get (functions of) population-level estimates for each year, this is possible with some brmsmargins, which draws from the REs to do MC integration. But if I want to just integrate over the site-level random effects to get estimates at the villages in the original data set I don’t think I can do this with brmsmargins.

I can do this manually using fitted(newdata = newdata, allow_new_levels = TRUE,sample_new_levels = 'gaussian') to draw samples from the random effect distribution at ‘new’ sites within each village and year. If df was the original data used to fit the model I can set up new data with something like

newdata <- df %>% select(Year, Village) %>% unique %>% #select unique existing values of Year and Village
expand_grid(Site = paste0('...New',1:100)) %>% #Generate 100 'new' sites to predict at
mutate(Site = paste0(Village, Site)) # Ensure that sites are nested within villages

While this approach works, fitted.brmsfit appears very slow at generating the new samples. So slow that for simple models where I can figure out what the covariance matrices for the random effect vectors (e.g. when S_k are just block diagonal) numerical integration over a logitnormal density using integrate appears to be roughly an order of magnitude faster. However to make this a general and automatic solution I need to be able to extract S_k somehow. However I do wonder if there is some kind of speed up that can be done to make fitted.brmsfit faster at drawing from the random effects?