Clustered residuals for multivariate multilevel model

Snijders & Bosker (2012, chapter 16) suggest using the following syntax in nlme for fitting a multivariate multilevel model (data & scripts for the chapter can be found here)

library(nlme)
mlb161 <- lme(response ~ - 1 + whichpart, random = ~ -1 + whichpart|schoolnr,
              weights=varIdent(form=~1|whichpart),
              corr=corSymm(form=~as.numeric(whichpart)|schoolnr/pupilNR_new),
              data=mlbook_disaggregated, method="ML",
              control = list(maxIter=500, msMaxIter=500, tolerance=1e-8,
                             niterEM=250))

This assumes that the residuas are clustered within pupilNR_new (which represents students nested in schools, i.e., schoolnr): corr=corSymm(form=~as.numeric(whichpart)|schoolnr/pupilNR_new)

Taking this model to a Bayesian framework and fitting it with brms, I would like to know what would be the equivalent of this type of residuals clustering. I understand that the autocor or rescor arguments should be relevant for defining it, but I’m not sure how it could be specified in an homologous way as defined with the nlme::lme() example using corSymm().

In sum, taking aside the change from a frequentist to a bayesian perspective, what would be the proper way to specify this multivariate multilevel model in nlme with brms? (I’ve also made this question in Stack Overflow with no luck)

References
Snijders, T. A. B., & Bosker, R. J. (2012). Multilevel analysis: An introduction to basic and advanced multilevel modeling (2nd ed). Sage.

1 Like

Hi,
this is a bit hard to answer for me (and possibly others), because I am not familiar with nlme and unfortunately the vocabulary around multilevel models is often confusing and ambiguous. Would you be able to describe the desired model (or the correlation/residual part) as math formulas or as code that would simulate fake data? I.e. assuming \mu_i is the linear predictor for i-th observation in the data, w_i, s_i and p_i are the indices of whichpart, schoolnr and pupilNR_new for this observation. What would be the joint distribution of the observed data?

Hi Martin,

Thank you for replying. I am not totally sure how would the equation look like, but I have adapted the example with nlme following a simplified version of the first example in vignette("brms_multivariate"). Given the following model

library(brms)
data("BTdata", package = "MCMCglmm")

fit1b <- brm(
  mvbind(tarsus, back) ~ sex + hatchdate + (1|p|fosternest),
  data = BTdata, chains = 2, cores = 2
)

I understand that by default the residuals are correlated. Thus, one gets a correlation between tarsus and back residuals.

VarCorr(fit1b)$residual__$cor[2, 1, 1]
#> [1] -0.08842185

By using corr = corSymm() in nlme, this residuals correlation can be estimated taking into account the clustered structure of the responses. In this example, this would imply clustering this correlation by animal. So in nlme the following model would be used:

library(nlme)
library(tidyr)

BTdata_long <-  pivot_longer(
  BTdata, c(tarsus, back),
  names_to = "response_var",
  values_to = "response")

BTdata_long$response_var <- as.factor(BTdata_long$response_var)
fit2 <- lme(response ~ -1 + response_var + response_var:(sex + hatchdate),
            random = ~ -1 + response_var | fosternest,
            weights = varIdent(form = ~ 1 | response_var),
            corr = corSymm(form = ~as.numeric(response_var) | fosternest / animal),
            data=BTdata_long,
            method="ML",
            control = list(maxIter=500, msMaxIter=500, tolerance=1e-8,
                           niterEM=250))

Note that by using corSymm(form = ~as.numeric(response_var) | fosternest / animal) this residuals correlation consider the clustering of responses in animal (which are consequently clustered in fosternest). In this case, the correlation of the residuals is very similar to the one obtained with brms

fit2$modelStruct$corStruct
#> Correlation structure of class corSymm representing
#>  Correlation:
#>   1     
#> 2 -0.089

Nonetheless, I am not sure if brms considers this clustering of responses when estimating the residuals correlation and, if not, if it would be possible to do. I would appreciate any orientation about this.

I think brms actually has you covered: check out https://rdrr.io/github/paul-buerkner/brms/man/cosy.html would that be what you are looking for? There are other structures mentioned at https://rdrr.io/github/paul-buerkner/brms/man/autocor-terms.html

Also note that in brms you can explicitly predict the standard deviation (https://cran.r-project.org/web/packages/brms/vignettes/brms_distreg.html) which should IMHO serve some of the same goals as you have (e.g. having sigma ~ 1 | fosternest / animal).

Best of luck and feel free to ask for clarifications!

1 Like

Thanks, your answer has gotten me closer to understanding how brms work for modeling the residuals. Would you say then that by setting rescor = TRUE we get the residuals correlation of responses (which would be represented by animal in this example)?

1 Like

Yes, since animals form the fundamental unit of observations in this data set.

2 Likes