In multilevel modeling there is a way to estimate a within-between estimator (in frequentist terms this would be estimating both fixed and random effects) for predictors (often time-varying covariates in the case of panel data). This entails including both the cluster mean, as well as the deviations from the cluster mean (i.e., the centered variable), as covariates.
It is currently possible to run this model using just the observed cluster mean, but studies suggest that doing so can bias the estimates. Centering the predictors on a latent mean instead (to account for the unreliability of the observed cluster means) produces better results (Hamaker and Muthén 2019, https://doi.org/10.1037/met0000239).
My question is: Is there a way to do this easily in brms?
This blog seems to show how to do this in Stan (https://quantscience.rbind.io/2017/08/01/bayesian-mlm-with-group-mean-centering/#group-mean-centering-with-lme4), but I was hoping there is already an easier way to do this using the brmsformula functions.
Please also provide the following information in addition to your question:
- Operating System: macOS Catalina 10.15.2
- brms Version: 2.10.0
I am not sure what you expect from the “latent mean”, but if I include a plain old fixed effect, it acts as a sort of “cluster mean” that is informed by the data but does not have to equal the actual mean in the cluster. E.g., if you have the brms formula
y ~ cluster + (1 | cluster)
where cluster is categorical, you’ll have the mean for each cluster plus the variance around that mean. Or am I missing something?
Hope that helps!
Hi @martinmodrak, thanks for that!
I don’t think that solves the issue however, because wouldn’t that just include the mean of the dependent variable? The “latent mean” that should be modeled in a within-between model is the mean of one or more predictor variables.
So, for instance, something like
y ~ (x-mean(x)) + mean(x) + (1 | cluster), where mean(x) is the within-cluster mean of X, for one or more X’s. The “latent mean” here is a recommendation that refers to the SEM framework, where instead of using the calculated sample mean (which is a point estimate), we can model the mean as a latent variable (which incorporates uncertainty).
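For concreteness, here is a minimal base R sketch of the observed-mean version of that formula (the one that is already possible today). The toy data and the column names x_between and x_within are made up for illustration, and the brm() call is only shown as a comment and has not been run here.

```r
# Hypothetical toy data: 3 clusters with 4 observations each (values are simulated)
set.seed(1)
d <- data.frame(
  cluster = factor(rep(c("a", "b", "c"), each = 4)),
  x = rnorm(12)
)
d$y <- 0.5 * d$x + rnorm(12)

# Between part: observed mean of x within each cluster (ave() defaults to mean)
d$x_between <- ave(d$x, d$cluster)

# Within part: deviation of each observation from its own cluster mean
d$x_within <- d$x - d$x_between

# The two columns can then enter the model as separate predictors, e.g.
# brms::brm(y ~ x_within + x_between + (1 | cluster), data = d)
```

This treats the cluster mean as a fixed, known number, which is exactly the point estimate that the latent-mean approach is meant to replace.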
Oh, I see. Then maybe using the non-linear model syntax (https://cran.r-project.org/web/packages/brms/vignettes/brms_nonlinear.html) should let you have something like (not tested):
bf(
  y ~ b0 + b1 * (x - meanx) + b2 * meanx,
  b0 ~ 1 + (1 | cluster),
  meanx ~ cluster,
  b1 + b2 ~ 1,
  nl = TRUE
)
(note that the model coefficients become explicit in non-linear syntax, so the intercept and the varying intercept go into their own parameter b0)
I am short on time, so I hope I didn’t mess up, but I think it should work (please consult the linked vignette to make sure you follow this and can double-check that I am correct; I do make quite a lot of mistakes).
Best of luck!
EDIT: The above does not tie meanx to the actual x in any way, so you might want to add a line like
x ~ meanx
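To spell out what tying the latent mean to x would amount to: x gets its own equation, and the same cluster-level mean appears in both equations. A rough sketch in notation that is assumed here rather than taken from the thread or the paper (i indexes observations, j indexes clusters):

```latex
\begin{aligned}
x_{ij} &= \mu_{x,j} + e_{ij}, & e_{ij} &\sim \mathcal{N}(0, \sigma_x^2), \\
y_{ij} &= \beta_0 + \beta_1 \,(x_{ij} - \mu_{x,j}) + \beta_2 \,\mu_{x,j} + u_j + \varepsilon_{ij}, & u_j &\sim \mathcal{N}(0, \tau^2), \quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma_y^2).
\end{aligned}
```

Here \mu_{x,j} is the latent cluster mean, \beta_1 is the within effect and \beta_2 the between effect. Because \mu_{x,j} is estimated rather than plugged in, its uncertainty carries through to both coefficients.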
Thanks @martinmodrak! I will try this out.