Differences in formula syntax in rstanarm

I am not an expert on the formula syntax for specifying models. My understanding is that an intercept is included by default unless it is explicitly removed, so that y ~ 1 + x is equivalent to y ~ x.

I think this holds regardless of varying intercepts/slopes, so y ~ 1 + x + (1|g) would be the same as y ~ x + (1|g). I have, however, seen that the choice of syntax has a big effect on how long the model takes to run. The posterior summaries are the same, but the chains for the former (with the explicit 1) take longer to finish.
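One way to check the equivalence without fitting anything is to compare what base R builds from each formula. This is only a sketch: the data frame d and its columns y, x and g are made up for illustration, and it looks only at how base R parses the formula, not at what rstanarm does internally.

```r
# Toy data, invented purely for illustration
d <- data.frame(
  y = c(1.2, 0.7, 3.1, 2.4),
  x = c(0.5, 1.0, 1.5, 2.0),
  g = factor(c("a", "a", "b", "b"))
)

# Design matrices for the implicit and explicit intercept
X_implicit <- model.matrix(y ~ x, data = d)
X_explicit <- model.matrix(y ~ 1 + x, data = d)
all.equal(X_implicit, X_explicit)   # TRUE

# The "intercept" attribute of the parsed terms is 1 when an intercept is present
attr(terms(y ~ x), "intercept")     # 1
attr(terms(y ~ 1 + x), "intercept") # 1
attr(terms(y ~ 0 + x), "intercept") # 0, the intercept has been removed

# Same check with a varying-intercept term in the formula
attr(terms(y ~ x + (1 | g)), "intercept")
attr(terms(y ~ 1 + x + (1 | g)), "intercept")
```

If the last two lines print the same value, the two formulas ask for the same population-level intercept as far as formula parsing is concerned.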

Is this to be expected? Am I missing something?

You’ll need to be more specific. If I run the rstanarm (2.21.1) hierarchical example and use microbenchmark for timing, I get,

     min      lq     mean   median       uq      max neval
 2.03396 2.07657 2.336422 2.456105 2.481347 2.525721    10

when running,

stan_glmer(cbind(Hits, AB - Hits) ~ (1 | Player), data = bball,
           family = binomial("logit"),
           prior_intercept = wi_prior, seed = SEED)

and get essentially the same times,

      min       lq     mean   median       uq      max neval
 2.053491 2.092504 2.147562 2.135125 2.180287 2.324809    10

when explicitly adding the intercept 1:

fit_partialpool <-
    stan_glmer(cbind(Hits, AB - Hits) ~ 1 + (1 | Player), data = bball,
               family = binomial("logit"),
               prior_intercept = wi_prior, seed = SEED)
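For anyone who wants to repeat the comparison, here is a rough sketch of how both calls could be timed in a single microbenchmark run. The setup lines are my assumptions rather than something shown in this thread: I am guessing that bball is the bball1970 data set shipped with rstanarm, that wi_prior is normal(-1, 1), and that SEED is 101. The labels implicit and explicit and the refresh = 0 argument (to silence sampler output) are added here for convenience.

```r
library(rstanarm)
library(microbenchmark)

# Assumed setup, not shown in the thread
data(bball1970)
bball <- bball1970
wi_prior <- normal(-1, 1)
SEED <- 101

# Time each formula 10 times; each evaluation fits the full model
timings <- microbenchmark(
  implicit = stan_glmer(cbind(Hits, AB - Hits) ~ (1 | Player), data = bball,
                        family = binomial("logit"),
                        prior_intercept = wi_prior, seed = SEED, refresh = 0),
  explicit = stan_glmer(cbind(Hits, AB - Hits) ~ 1 + (1 | Player), data = bball,
                        family = binomial("logit"),
                        prior_intercept = wi_prior, seed = SEED, refresh = 0),
  times = 10
)
print(timings)
```

Putting both expressions in one call means microbenchmark interleaves the runs, which should make the comparison a little less sensitive to whatever else the machine is doing.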

Hi @fazepher, that’s definitely not expected. Like @ssp3nc3r was getting at with his example, there shouldn’t be a difference other than variation from things like the quality of random initial values.

Can you share an example where you see this behavior? That would be really interesting if I could reproduce it.
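A reproducible example could look something like the sketch below, which fits both formulas with the same seed and reports the per-chain warmup and sampling times. It assumes the bball1970 data set that ships with rstanarm and an arbitrary seed of 101; the object names fit_implicit and fit_explicit are just placeholders, and your own data and model would go in their place.

```r
library(rstanarm)

data(bball1970)

# Same data, family and seed; only the formula differs
fit_implicit <- stan_glmer(cbind(Hits, AB - Hits) ~ (1 | Player),
                           data = bball1970, family = binomial("logit"),
                           seed = 101, refresh = 0)
fit_explicit <- stan_glmer(cbind(Hits, AB - Hits) ~ 1 + (1 | Player),
                           data = bball1970, family = binomial("logit"),
                           seed = 101, refresh = 0)

# Warmup and sampling time per chain, taken from the underlying stanfit objects
rstan::get_elapsed_time(fit_implicit$stanfit)
rstan::get_elapsed_time(fit_explicit$stanfit)

# Population-level estimates, to confirm the fits agree
fixef(fit_implicit)
fixef(fit_explicit)

# Package versions are useful to include when reporting
sessionInfo()
```

Looking at the times chain by chain makes it easier to tell whether one slow chain (for example, from an unlucky initial value) is driving the difference.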