Honey, I shrunk the random slopes (too much?)

TL;DR: Why are my random slopes so small when modelled as shown below?

Long version: For a study, I am modelling how far away human participants place an object from its correct location (i.e. my outcome is the Euclidean distance), and my predictor is the number of times my participants have seen/placed that object (scaled to SD = 0.5 and mean = 0), with the obvious expectation that people will get better over time.
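For reference, the scaling might look something like this (a sketch; the raw count column, here called times, is an assumed name):

# Hypothetical sketch: centre the raw exposure count and divide by two
# standard deviations, giving mean = 0 and SD = 0.5
OLM_7T_retrieval$s_times <-
  (OLM_7T_retrieval$times - mean(OLM_7T_retrieval$times)) /
  (2 * sd(OLM_7T_retrieval$times))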

Model: euclideanDistance ~ s_times + (s_times | subject), family = Gamma(link = "log")

Priors:

OLM_7T_m2_prior  <- c(prior(normal(4.26123, 0.5), class = "Intercept", lb = 0, ub = log(153)),
                      prior(skew_normal(0.15, 0.5, -5), class = "b"),
                      prior(gamma(1, 1), class = "shape"),
                      prior(student_t(5, 0, 0.5), class = "sd"),
                      prior(student_t(5, 0, 0.5), class = "sd", coef = "s_times", group = "subject"))

When looking at the raw data, it is obvious that, on average, the placement error definitely decreases, but between participants there seems to be quite a lot of variation. However, in my brms model there are only tiny differences between participants with regard to the slope. I want to use the estimated slopes as predictors in further analyses, but the magnitude of the slopes feels misleading (even though statistically they might just be scaled down; see scatter plot).

I have tried various things, and I feel there is something missing in my understanding of how this model works that might be obvious to some forum user here, so I am trying my luck. The priors were chosen based on prior predictive checks, but even if I, for instance, change the width of the random-effects priors (the sd class), the estimates don't really change.

Here is a direct comparison between similar models estimated via glmer and brm. Obviously, they can give different answers, but I wasn't expecting differences of this magnitude, especially since the differences in the fixed effects are tiny.

You can see that for glmer (min = -0.360 & max = -0.062) some participants nearly have a slope of 0, while with brm the slopes are essentially uniform (min = -0.263 & max = -0.168). Most of this is simply scaling, as can be seen when I compare both random slope estimates directly (sketch below).
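For reference, the comparison can be pulled out of both fits roughly like this (a sketch; coef() returns subject-specific coefficients in both packages, as point estimates in lme4 and as posterior summaries in brms):

# Subject-specific slopes: point estimates from glmer, posterior means from brm
slopes_freq  <- coef(OLM_7T_m2_freq)$subject[, "s_times"]
slopes_bayes <- coef(OLM_7T_m2)$subject[, "Estimate", "s_times"]

range(slopes_freq)   # approx. -0.360 to -0.062
range(slopes_bayes)  # approx. -0.263 to -0.168

# Direct comparison of the two sets of slope estimates
plot(slopes_freq, slopes_bayes, xlab = "glmer slope", ylab = "brm slope")
abline(0, 1, lty = 2)  # identity line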

So my question is: what in my model or prior structure is causing the slopes to be so small?

Summary for brm model:

summary(OLM_7T_m2)
# Family: gamma 
# Links: mu = log; shape = identity 
# Formula: euclideanDistance ~ s_times + (s_times | subject) 
# Data: OLM_7T_retrieval (Number of observations: 2016) 
# Draws: 10 chains, each with iter = 30000; warmup = 5000; thin = 1;
# total post-warmup draws = 250000
# 
# Group-Level Effects: 
#   ~subject (Number of levels: 56) 
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept)              0.39      0.04     0.32     0.48 1.00    46992    79373
# sd(s_times)                0.05      0.04     0.00     0.14 1.00    76693   110916
# cor(Intercept,s_times)     0.21      0.46    -0.82     0.94 1.00   241604   138982
# 
# Population-Level Effects: 
#   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# Intercept     3.43      0.05     3.32     3.53 1.00    32040    61974
# s_times      -0.22      0.03    -0.28    -0.17 1.00   293887   186542
# 
# Family Specific Parameters: 
#   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# shape     2.58      0.08     2.43     2.74 1.00   321954   185870
# 
# Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
# and Tail_ESS are effective sample size measures, and Rhat is the potential
# scale reduction factor on split chains (at convergence, Rhat = 1).

Summary for glmer model:

summary(OLM_7T_m2_freq)
# Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
# Family: Gamma  ( log )
# Formula: euclideanDistance ~ s_times + (s_times | subject)
# Data: OLM_7T_retrieval
# 
# AIC      BIC   logLik deviance df.resid 
# 17126.0  17159.6  -8557.0  17114.0     2010 
# 
# Scaled residuals: 
#   Min      1Q  Median      3Q     Max 
# -1.6217 -0.7049 -0.1389  0.4873  5.9048 
# 
# Random effects:
#   Groups   Name        Variance Std.Dev. Corr
# subject  (Intercept) 0.07093  0.2663       
# s_times     0.01454  0.1206   0.12
# Residual             0.37196  0.6099       
# Number of obs: 2016, groups:  subject, 56
# 
# Fixed effects:
#   Estimate Std. Error t value Pr(>|z|)    
# (Intercept)  3.40418    0.05678  59.949  < 2e-16 ***
#   s_times     -0.22684    0.03473  -6.531 6.53e-11 ***
#   ---
#   Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# 
# Correlation of Fixed Effects:
#   (Intr)
# s_times 0.086 

Code for the models

# Brms priors
OLM_7T_m2_prior  <- c(prior(normal(4.26123, 0.5), class = "Intercept", lb = 0, ub = log(153)),
                      prior(skew_normal(0.15, 0.5, -5), class = "b"),
                      prior(gamma(1, 1), class = "shape"),
                      prior(student_t(5, 0, 0.5), class = "sd"),
                      prior(student_t(5, 0, 0.5), class = "sd", coef = "s_times", group = "subject"))

# Brms model 
OLM_7T_m2 <- brm(euclideanDistance  ~ s_times + (s_times | subject), 
                 family = Gamma(link = "log"),
                 data = OLM_7T_retrieval,
                 prior = OLM_7T_m2_prior, 
                 iter = iter,
                 warmup = warmup,
                 chains = chains,
                 cores = cores,
                 seed = 1124,
                 control = list(adapt_delta = 0.9),
                 backend = "cmdstanr")

# glmer model
OLM_7T_m2_freq <- glmer(euclideanDistance  ~ s_times + (s_times | subject), 
                        family = Gamma(link = "log"),
                        data = OLM_7T_retrieval)

So just to confirm, you did try something super wide like half-Cauchy and it didn’t change the estimates of varying slopes in the brms model?
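For concreteness, such a super-wide prior might be specified like this (a sketch; brms bounds sd priors at zero, so cauchy() here acts as a half-Cauchy):

# Hypothetical wide prior on the group-level sds, replacing the student_t(5, 0, 0.5)
OLM_7T_m2_prior_wide <- c(prior(normal(4.26123, 0.5), class = "Intercept", lb = 0, ub = log(153)),
                          prior(skew_normal(0.15, 0.5, -5), class = "b"),
                          prior(gamma(1, 1), class = "shape"),
                          prior(cauchy(0, 2), class = "sd"))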

Also, I don’t work much with the Gamma family, but is there a shape parameter for the glmer model that I am missing in the output? Do lme4 and brms have the same implementation of Gamma? Have you seen Reliability of glmer to fit Gamma-distributed model - Strange Ranef Variance · Issue #643 · lme4/lme4 · GitHub? It might be worth hunting around for differences in the implementation of Gamma models in lme4 vs. brms. Sorry not to be more help!
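One way to hunt for differences on the brms side is to inspect the Stan code it generates (a sketch using brms's make_stancode()):

# Print the Stan code brms would use for this Gamma model
make_stancode(euclideanDistance ~ s_times + (s_times | subject),
              data = OLM_7T_retrieval,
              family = Gamma(link = "log"))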


Can you show your plotting code for generating and plotting the predicted values from brms?

I used library(sjPlot) for those plots.

p1 <- plot_model(OLM_7T_m2, type = "pred", terms = c("s_times", "subject"),
                 pred.type = "re", se = FALSE, ci.lvl = 0.001)
p1 <- p1 + theme(legend.position = "none") +
  labs(title = "Predicted values based on brm") + scale_color_viridis_d()

p2 <- plot_model(OLM_7T_m2_freq, type = "pred", terms = c("s_times", "subject"),
                 pred.type = "re", se = FALSE, ci.lvl = 0.001)
p2 <- p2 + theme(legend.position = "none") +
  labs(title = "Predicted values based on glmer") + scale_color_viridis_d()

Apart from possible differences in the way these predictions are generated, the ranges of the estimated slopes are very different. In brms the ‘weakest’ slope is still very negative at -0.168 vs. -0.0622 for glmer. But even just looking at the raw data, I can see a small number of people who nominally are worse at the end, which is not compatible with the weakest slope being -0.168.

I will check that next. Thanks!

When I look at the output of your brms summary and glmer summary, it looks to me like the brms model has fit a slope/intercept correlation parameter whereas the glmer one has not. I might just be missing something though.

No, the output is misaligned.

# Random effects:
#  Groups   Name        Variance Std.Dev. Corr
#  subject  (Intercept) 0.07093  0.2663
#           s_times     0.01454  0.1206   0.12
#  Residual             0.37196  0.6099
# Number of obs: 2016, groups:  subject, 56

Isn’t the spread of the random effects only part of the problem? Assuming that the random intercepts and slopes are centered at 0 in both cases, the mean structure E(distance | times) is different too. One of your models may fit the data better. Perhaps start by checking which one?

The difference in E(dist | times) is a bit hard to notice with so many individual curves, so I’ve plotted:

dist.glmer <- function(times) {
  exp(3.40418 - 0.22684 * times)
}
dist.brm <- function(times) {
  exp(3.43 - 0.22 * times)
}
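These two curves can be overlaid, e.g. (a minimal plotting sketch in base R):

# Overlay the fitted population-level mean curves over the approximate
# observed range of s_times
curve(dist.glmer(x), from = -1, to = 1,
      xlab = "s_times", ylab = "E(distance | times)")
curve(dist.brm(x), from = -1, to = 1, add = TRUE, lty = 2)
legend("topright", legend = c("glmer", "brm"), lty = c(1, 2))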

Hey, yes, in one model I just used the default priors and I get very similar results. I think only the fixed effects differ by about 0.01, if I didn't miss anything, so it is actually reassuring that the result is not driven by some weird priors, I guess.

# > OLM_7T_m2$prior$prior
# [1] ""                       ""                       "student_t(3, 3.3, 2.5)" "lkj_corr_cholesky(1)"   ""                       "student_t(3, 0, 2.5)"   ""                       ""                      
# [9] ""                       "gamma(0.01, 0.01)"
# Only default priors
# > summary(OLM_7T_m2)
# Family: gamma 
# Links: mu = log; shape = identity 
# Formula: euclideanDistance ~ s_times + (s_times | subject) 
# Data: OLM_7T_retrieval (Number of observations: 2016) 
# Draws: 10 chains, each with iter = 30000; warmup = 5000; thin = 1;
# total post-warmup draws = 250000
# 
# Group-Level Effects: 
#   ~subject (Number of levels: 56) 
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept)              0.39      0.04     0.32     0.48 1.00    48131    81957
# sd(s_times)                0.05      0.04     0.00     0.14 1.00    77912   109559
# cor(Intercept,s_times)     0.21      0.46    -0.82     0.94 1.00   250812   140814
# 
# Population-Level Effects: 
#   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# Intercept     3.42      0.05     3.31     3.52 1.00    32629    62898
# s_times      -0.23      0.03    -0.28    -0.17 1.00   315153   191713
# 
# Family Specific Parameters: 
#   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# shape     2.59      0.08     2.44     2.74 1.00   341058   182994
# 
# Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
# and Tail_ESS are effective sample size measures, and Rhat is the potential
# scale reduction factor on split chains (at convergence, Rhat = 1).

Oh, this is true, especially since we're talking about log-transformed estimates here, right?
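For intuition, on the log link a slope b multiplies the expected distance by exp(b), so the quoted slopes translate as follows (a quick check in R):

exp(-0.22)   # population-level slope: ~0.80, i.e. ~20% less error per unit of s_times
exp(-0.168)  # 'weakest' brm subject slope: ~0.85
exp(-0.062)  # 'weakest' glmer subject slope: ~0.94, i.e. nearly flat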

Eyeballing it by looking at the raw data (averaged across the s_times variable for visualisation):

# "Raw" data plots
p1 <- ggplot(OLM_7T_retrieval_agg, aes(x = s_times, y = euclideanDistance, colour = subject)) +
  geom_line() +
  theme(legend.position = "none") + 
  labs(title = "Average response over time") + scale_color_viridis_d()

I feel that the glmer fit actually makes more sense. You can clearly see some cases where there is barely any difference, or where people even get slightly worse over time (see red circle), but the brm random-slope fits do not seem to show this at all. Again, I expect some form of regularisation because of the hierarchical nature of the model and the priors, but I didn't expect all the slopes of my brm model to be essentially parallel.

I’m not sure how to read “average response over time”. Is this an average per participant per s_times? But how can a participant see & place the object for the $k$th time repeatedly? Unless there are multiple objects? (And in that case the model assumes a participant’s learning curve is the same for all objects.) I also don’t understand the transformation of s_times; it ends up with three time points (per participant) arranged about -0.5 and another three about 0.5, with a gap of no observations in between. Why not work with the actual number of times?

None of this explains why brm predicts less variability in the random slopes than glmer. Figuring this out may require more experiments with the priors and making more plots of the brm parameters and predictions.
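For example, the posterior of the slope sd could be inspected directly (a sketch; the parameter name follows brms's sd_<group>__<coef> convention):

# Histogram of the posterior draws for the between-subject slope sd
library(brms)
draws <- as_draws_df(OLM_7T_m2)
hist(draws$sd_subject__s_times, breaks = 50,
     main = "Posterior of sd(s_times)", xlab = "sd(s_times)")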

Thank you for your response; these are all valid points. In short, there are multiple objects, so each subject has more than one event where they see something for the first, second time, etc. I transformed the variable because otherwise I couldn't get rid of divergent transitions.

But yes, I also agree that this probably doesn't explain what I am seeing here. Unfortunately, I haven't really made much progress on why there is so little variability, but my current suspicion is that it is due to the likelihood function (Gamma) and its implementation in brms, because when I use a Gaussian distribution instead, the random slopes from brms and lmer are very similar.

Not sure what to do next…

Here is the same plot for two Gaussian models:


OLM_7T_m2 <- brm(euclideanDistance  ~ s_times + (s_times | subject), 
                 #family = Gamma(link = "log"),
                 data = OLM_7T_retrieval,
                 #prior = OLM_7T_m2_prior, 
                 iter = iter,
                 warmup = warmup,
                 chains = chains,
                 cores = cores,
                 seed = 1124,
                 control = list(adapt_delta = 0.9),
                 backend = "cmdstanr")
# Family: gaussian 
# Links: mu = identity; sigma = identity 
# Formula: euclideanDistance ~ s_times + (s_times | subject) 
# Data: OLM_7T_retrieval (Number of observations: 2016) 
# Draws: 10 chains, each with iter = 30000; warmup = 5000; thin = 1;
# total post-warmup draws = 250000
# 
# Group-Level Effects: 
#   ~subject (Number of levels: 56) 
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept)             12.52      1.31    10.24    15.36 1.00    49009    83205
# sd(s_times)                3.64      1.58     0.53     6.68 1.00    56402    66253
# cor(Intercept,s_times)    -0.42      0.29    -0.93     0.19 1.00   156721   104927
# 
# Population-Level Effects: 
#   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# Intercept    32.75      1.74    29.31    36.17 1.00    31226    59412
# s_times      -7.09      1.04    -9.12    -5.05 1.00   197417   176223
# 
# Family Specific Parameters: 
#   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sigma    20.03      0.32    19.40    20.68 1.00   288036   183173

OLM_7T_m2_freq <- lmer(euclideanDistance  ~ s_times + (s_times | subject), 
                       #family = Gamma(link = "log"),
                      data = OLM_7T_retrieval)
# Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
# Formula: euclideanDistance ~ s_times + (s_times | subject)
#    Data: OLM_7T_retrieval
# 
# REML criterion at convergence: 17957.5
# 
# Scaled residuals: 
#     Min      1Q  Median      3Q     Max 
# -2.8623 -0.6305 -0.1348  0.4407  4.9321 
# 
# Random effects:
#  Groups   Name        Variance Std.Dev. Corr 
#  subject  (Intercept) 149.77   12.238        
#           s_times      17.16    4.142   -0.43
#  Residual             399.82   19.996        
# Number of obs: 2016, groups:  subject, 56
# 
# Fixed effects:
#             Estimate Std. Error     df t value Pr(>|t|)    
# (Intercept)   32.811      1.695 55.000   19.36  < 2e-16 ***
# s_times       -7.100      1.049 55.001   -6.77 8.99e-09 ***
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# 
# Correlation of Fixed Effects:
#         (Intr)
# s_times -0.219

Here is the code for the plot:

library(sjPlot)
library(ggplot2)
library(cowplot)

# Compare random slopes from the two models
p1 <- plot_model(OLM_7T_m2, type = "pred", terms = c("s_times", "subject"),
                 pred.type = "re", se = FALSE, ci.lvl = 0.001)
p1 <- p1 + theme(legend.position = "none") +
  labs(title = "Predicted values based on brm") + scale_color_viridis_d()

p2 <- plot_model(OLM_7T_m2_freq, type = "pred", terms = c("s_times", "subject"),
                 pred.type = "re", se = FALSE, ci.lvl = 0.001)
p2 <- p2 + theme(legend.position = "none") +
  labs(title = "Predicted values based on glmer") + scale_color_viridis_d()

# Add plots together
p3 <- plot_grid(p1, p2, ncol = 2)

If there are multiple objects & the same object is handled by multiple subjects, doesn’t this call for random object effects? Though expanding the model won’t necessarily make it easier to fit.
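For example, something like this (a sketch; it assumes an object identifier column, here called object, exists in the data):

# Hypothetical extension with crossed random effects for objects
OLM_7T_m3 <- brm(euclideanDistance ~ s_times + (s_times | subject) + (1 | object),
                 family = Gamma(link = "log"),
                 data = OLM_7T_retrieval,
                 backend = "cmdstanr")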
