Stan_gamm4 and brms unexplained difference in intercept estimates

I’ve been comparing outputs between brms and stan_gamm4. What I’ve found is that when I try and specify the same priors for both (as both use different default priors) but I’m getting different intercept estimates…
The stan_gamm4 version appears to match that found in gamm4 but the brms estimate (~15.5 vs 11.82) is much lower (see below for a MWE).


## simulate 4 term additive truth
dat <- gamSim(6,n=400,scale=2)
## GAMM4 
mod1 <- gamm4(y ~ s(x0)+s(x1), random = ~(1|fac),data=dat)
#Estimated intercept:
#[1] 15.6657

mod2 <- rstanarm::stan_gamm4(y ~ s(x0)+s(x1), random = ~(1|fac),
                             prior_intercept = normal(0,5),
                             prior_smooth = normal(0,1), 
                             prior_aux = normal(0,5))
#Estimated intercept
#[1] 15.49564


mod3 <- brms::brm(y ~ s(x0)+s(x1) + (1|fac),
                  prior = c(prior(normal(0,5), class="Intercept"),
                            prior(normal(0,1), class="sds"),
                            prior(normal(0,5), class="sd")))

#[1] 11.82925

What’s stranger is when I fit the two models using default priors I get similar intercepts…

mod4 <- rstanarm::stan_gamm4(y ~ s(x0)+s(x1), random = ~(1|fac),

mod5 <- brms::brm(y ~ s(x0)+s(x1) + (1|fac),

> mod4$coefficients[[1]]
#[1] 15.70542

#[1] 15.49703

On another note the predictions of each model are very similar… they are basically 1:1… which makes me wonder whether I’m 1) misunderstanding the print output of brms and that they are actually the same or 2) I’ve done something wrong in model specification.
Looking at the median across intercepts in the brms output

  • Operating System: OSX Catalina
  • brms Version: ‘2.13.0’
1 Like

Well when I run this the brms spits out error messages:

Warning messages:
1: There were 8 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See 
2: Examine the pairs() plot to diagnose sampling problems

Are the rest of the defaults the same across rstanarm and brms? chains? iters? burnin? tree depth? adapt delta?

I’d start with making all the defaults explicit in both models.

I think that at least the default priors are not the same (even if they look the same on the surface) which might explain the differences.

1 Like

I THINK rstanarm scales data in the background and applies scaled priors but brms doesn’t. So more pulling of posterior towards prior in brms?

Yes the burnin and chains are the same. I’ve increased the adapt_delta to remove to divergent iterations and that made no difference to the intercept estimate. As far as I can see they should provide an almost identical result when specifying the priors the same in each.

However, on closer examination I think the difference is that rstanarm by default automatically rescales priors whereas brms does not

#brms prior summary
                  prior     class      coef group resp dpar nlpar bound
1                               b                                      
2                               b     sx0_1                            
3                               b     sx1_1                            
4          normal(0, 5) Intercept                                      
5          normal(0, 5)        sd                                      
6                              sd             fac                      
7                              sd Intercept   fac                      
8          normal(0, 1)       sds                                      
9                             sds     s(x0)                            
10                            sds     s(x1)                            
11         normal(0, 5)              sigma    

#rstanarm prior summary
Priors for model 'mod2' 
Intercept (after predictors centered)
  Specified prior:
    ~ normal(location = 0, scale = 5)
  Adjusted prior:
    ~ normal(location = 0, scale = 27)

Auxiliary (sigma)
  Specified prior:
    ~ half-normal(location = 0, scale = 5)
  Adjusted prior:
    ~ half-normal(location = 0, scale = 27)

 ~ decov(reg. = 1, conc. = 1, shape = 1, scale = 1)

Turning off autoscale on the rstanarm priors seems to help with intercepts closer to each other (11.83 vs 12.4). The other difference is that it looks like rstanarm includes a covariance prior by default whereas brms does not.

1 Like

Yes it looks like that’s happening.