Rstanarm: extracting variance components


#1
  • Operating System: Windows 7
  • rstanarm Version: 2.17.4

I’m an ecologist and often conduct repeated measures experiments on animals to look at individual differences in behaviour and metabolism. I’m keen to start using rstanarm to compute repeatability estimates (intraclass correlation coefficients) and slope-intercept correlations for various traits that I look at. However, I am having a hard time figuring out how to extract variance components from rstanarm objects. I’ve pasted examples below of how I do this with MCMCglmm (full script and data here: https://bit.ly/2LTCGBi) but would greatly appreciate advice on how to do the same with rstanarm. I know that print() returns the random effects slope-intercept correlation coefficient but there are no SD estimates or 95% credibility intervals associated with it.

Thanks!

  1. calculating repeatability (ICC)

#calculate agreement repeatability using the posterior distributions [R = Vind / (Vind + Ve)], where Vind = inter-individual variance and Ve = residual variance
#$VCV is the output of the model corresponding to random effect variances
R.MAS<-MAS$VCV[,“ID”]/(MAS$VCV[,“ID”]+MAS$VCV[,“units”])
#get estimates of the marginal parameter modes (i.e. repeatability estimate) and create Highest Posterior Density (HPD) intervals for the parameter estimates (i.e. 95% CI interval)
​posterior.mode(R.MAS)
HPDinterval(R.MAS)

  1. slope-intercept correlation

#we can calculate the correlation between the random intercepts and slopes by dividing the covariance by the square root of the product of the variances
R.int.slp<-RS.MAS$VCV[,“TRIALz:(Intercept).ID”]/sqrt((RS.MAS$VCV[,"(Intercept):(Intercept).ID"]*RS.MAS$VCV[,“TRIALz:TRIALz.ID”]))
​posterior.mode(R.int.slp)
HPDinterval(R.int.slp)


Marginal and Conditional R2 for Stan's GLMMs
#2

For something like this you want to call as.matrix() or as.data.frame() to get the raw draws. Like

library(rstanarm)
post <- stan_lmer(Reaction ~ Days + (Days | Subject), lme4::sleepstudy)
df <- as.data.frame(post)
Vind <- df$`Sigma[Subject:(Intercept),(Intercept)]`
R.MAS <- Vind / (Vind + df$sigma^2)
R.int.slp <- df$`Sigma[Subject:Days,(Intercept)]` / 
             sqrt(Vind * df$`Sigma[Subject:Days,Days]`)

But don’t waste your time trying to calculate the posterior mode of anything.


#3

Thanks very much for the speedy answer, Ben! The example code works like a charm.

When running a generalized model (e.g. stan_glmer.nb), what column in ‘df’ corresponds to the residual variance (i.e. ‘sigma’)? Is it ‘reciprocal_dispersion’?

Also, why do you advise against calculating the mode and CIs? How would you report this estimate instead?


#4

Outside of a Gaussian, linear, no link function model, the idea of a decomposition of variance does not make sense. The general, correct, and Bayesian way to do things is

PPD <- posterior_predict(post)
vars <- apply(PPD, MARGIN = 1, FUN = var)

PPD_0 <- posterior_predict(post, re.form = ~ 0)
vars_0 <- apply(PPD_0, MARGIN = 1, FUN = var)

summary(vars - vars_0)

In general, it is better to just show a plot of the entire margin of the posterior distribution that is of interest, but if you are going to focus on a number it should usually be the mean or median. The posterior mode is not useful.


#5

I guess my confusion comes from the fact that several authors propose methods for variance decomposition from frequentist GLMMs. For example, see:

Nakagawa et al (2017) http://rsif.royalsocietypublishing.org/content/14/134/20170213
Johnson (2014) https://besjournals.onlinelibrary.wiley.com/doi/abs/10.1111/2041-210X.12225
Nakagawa and Schielzeth (2013) https://besjournals.onlinelibrary.wiley.com/doi/abs/10.1111/j.2041-210x.2012.00261.x

I understand that extracting variance components from GLMMs is not straightforward and that there is no consensus on how to do this. Is you opinion that even though it can be done, it should not? I’d be grateful if you could expand on why ‘it makes no sense’. Many thanks-


#6

If it is not a linear model, then there is no clean decomposition. Frequentists have to do stuff like use the delta method, but if you use Stan, then you can just generate posterior predictions from the model and see how the variance across the data changes when you include or nullify a group-specific term.


#7

Would you say that the ratio between vars_0 and vars is conceptionally comparable to an ICC (Vind / (Vind + df$sigma^2))?

Also, I have read that the ICC is not meaningful for random-slope-intercept models, because the ICC would then vary by the values of the random slope (Goldstein H, Browne W, Rasbash J. 2010. Partitioning Variation in Multilevel Models. Understanding Statistics, 1:4, 223-231 (doi: 10.1207/S15328031US0104_02)), so there’s no “unique” ICC. Is this also a matter for Bayesian models, respectively for the approch to calculate the posterior predictive distributions to “decompose” the variance?


#8

Yes to both


#9

May I ask a follow-up question: How would you deal with, e.g., nested models? Like:

stan_lmer(Reaction ~ Days + (1 | group / subgroup ), sleepstudy)

Would you still use PPD <- posterior_predict(post) for the posterior predictions?

And would be possible, probably using the re.form-argument, to extract the variances for the different levels, like:

PPD1 <- posterior_predict(post, re.form =  ~ group)
PPD2 <- posterior_predict(post, re.form =  ~ group:subgroup)

#10

I can’t remember if that is the exact syntax, but something like that.