Why is posterior_linpred so slow?

Hi,

I fitted this model with brms on a large dataset (70,000 observations):

model_formula <- brmsformula(hunting_success | trials(4) ~
                                        Zspeed +
                                        Zspace_covered_rate +
                                        Zprox_mid_PreyGuarding +
                                        Zhook_start_time +
                                        Zgame_duration +
                                        (1 | map_name) +
                                        (1 | player_id) +
                                        (1 | obs))
base_model <- brm(formula = model_formula,
                  family = binomial(link = "logit"),
                  warmup = 3000, 
                  iter = 11000,
                  thin = 32,
                  chains = 4, 
                  inits = "0", 
                  threads = threading(10),
                  backend = "cmdstanr",
                  seed = 123,
                  prior = priors,
                  control = list(adapt_delta = 0.95),
                  save_pars = save_pars(all = TRUE),
                  sample_prior = TRUE,
                  data = data)

The size of the output is ~1.5 GB. I want to extract the predicted values on the response scale to compute the linear trends for each fixed effect. However, I don’t understand why extracting the draws takes so much time.

For instance, I ran this to get the predicted values of the model, but it has been running for over 20 minutes, and that is for only 30 draws:

draws <- posterior_linpred(base_model,
                            re_formula = NA,
                            transform = FALSE,
                            ndraws = 30,
                            seed = 123)

Am I doing something wrong? Is it because of the overdispersion term? I didn’t have this problem with an earlier version of brms, so I don’t understand what might be going on. Is there a faster way to extract the predicted values so I can easily manipulate them myself?

This is my computer setup:
R version 4.0.4 (2021-02-15)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)
brms_2.16.1, Rcpp_1.0.7

Thank you very much for your help!

Maxime


This might be related to some recent changes in brms that now delegate more work to the posterior package. Tagging @paul.buerkner, who previously suggested that the relevant code in posterior may be slower than we thought.

You can always use the $fit member to access the underlying Stan fit and do whatever summaries you need directly.
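For example, here is a minimal sketch of that approach for your model. The column names like "b_Zspeed" are an assumption based on brms's usual "b_<term>" naming, so check colnames() on your own fit first; the sketch also ignores the random effects, matching re_formula = NA:

# Minimal sketch: pull raw draws from the underlying stanfit and build
# the fixed-effects linear predictor by hand
post <- as.matrix(base_model$fit)   # matrix of draws x parameters
fixef_cols <- c("b_Intercept", "b_Zspeed", "b_Zspace_covered_rate",
                "b_Zprox_mid_PreyGuarding", "b_Zhook_start_time",
                "b_Zgame_duration")
# Fixed-effects design matrix only (mirrors re_formula = NA)
X <- model.matrix(~ Zspeed + Zspace_covered_rate + Zprox_mid_PreyGuarding +
                    Zhook_start_time + Zgame_duration,
                  data = data)
eta <- post[, fixef_cols] %*% t(X)  # linear predictor: draws x observations
p   <- plogis(eta)                  # success probability on the response scale

From p you can then compute whatever trend summaries you need without going through posterior_linpred.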

Could you please confirm that reverting to an older version of brms improves the performance?


(This post became obsolete after the post before was deleted)

Sorry, Paul!

(I posted something earlier about a possible bug before realising I had a typo in my code!)

I have been having the same issue after updating brms yesterday.

After reading this thread, I ran posterior_epred on the same model using three different package versions:

  • 2.16.3: 13 minutes 37 seconds
  • 2.15.0: 11 minutes 57 seconds
  • 2.14.0: 11 minutes 49 seconds

Model has an n of about 9000, and I’m predicting over a pretty hefty post-stratification frame.

Code:

pp <- posterior_epred(test.mod,
                      newdata = ps,
                      allow_new_levels = TRUE,
                      cores = 6,
                      ndraws = 20,
                      seed = 12345)

Scale that up and the differences start getting quite large quite quickly, I guess?

Thank you @patrick-eng for providing a reproducible example; I did not have the time in the last few days! And thank you @martinmodrak for the suggestion of using $fit; I did not know we could do that.

Maxime

Yeah, I assume this is because the extraction via posterior is apparently not yet the quickest. Do things get quicker if you install the latest dev version of posterior from GitHub (stan-dev/posterior)?
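Something like this should do it, assuming the remotes package is available:

# Install the development version of posterior from GitHub
remotes::install_github("stan-dev/posterior")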


That does appear to help, Paul, yes! :-)
