Brms hangs after drawing samples

Hi!

This is related to my other question about save_pars() and large multinomial models. I am trying to draw samples from a multinomial model with correlated random intercepts across categories. I can do this with smaller numbers of groups (under 1000), but with larger numbers of groups, the model samples but then hangs and never returns the fitted model object.

EDIT: I can kind of fix this by not saving the “random effects” as discussed in my other question. But there are cases when I would like to a) use the cmdstanr backend, and b) look at the group-level effects. So I’m leaving this here in case anyone has thoughts about it.

It’s a bit hard to give a reproducible example of the issue, but here is code that works.

library(brms)
#> Loading required package: Rcpp
#> Loading 'brms' package (version 2.16.7). Useful instructions
#> can be found by typing help('brms'). A more detailed introduction
#> to the package is available through vignette('brms_overview').
#> 
#> Attaching package: 'brms'
#> The following object is masked from 'package:stats':
#> 
#>     ar
library(tictoc)

# Number of groups
N <- 500
dat <- data.frame(
  id = factor(1:N),
  y1 = rbinom(N, 100, 0.1), 
  y2 = rbinom(N, 100, 0.2), 
  y3 = rbinom(N, 100, 0.3),
  y4 = rbinom(N, 100, 0.4), 
  y5 = rbinom(N, 100, 0.5), 
  y6 = rbinom(N, 100, 0.6)
)
dat$size <- with(dat, y1 + y2 + y3 + y4 + y5 + y6)

tic()
fit <- brm(
  bf(cbind(y1, y2, y3, y4, y5, y6) | trials(size) ~ 1 + (1 |p| id)), 
  family = multinomial(),
  data = dat, 
  save_pars = save_pars(group = FALSE),
  iter = 10, chains = 1,
  backend = "cmdstanr"
)
#> Start sampling
#> Running MCMC with 1 chain...
#> 
#> Chain 1 WARNING: No variance estimation is 
#> Chain 1          performed for num_warmup < 20 
#> Chain 1 Iteration: 1 / 10 [ 10%]  (Warmup) 
#> Chain 1 Iteration: 6 / 10 [ 60%]  (Sampling) 
#> Chain 1 Iteration: 10 / 10 [100%]  (Sampling) 
#> Chain 1 finished in 0.0 seconds.
#> 
#> Warning: 5 of 5 (100.0%) transitions ended with a divergence.
#> This may indicate insufficient exploration of the posterior distribution.
#> Possible remedies include: 
#>   * Increasing adapt_delta closer to 1 (default is 0.8) 
#>   * Reparameterizing the model (e.g. using a non-centered parameterization)
#>   * Using informative or weakly informative prior distributions
toc()
#> 13.153 sec elapsed

Created on 2022-02-08 by the reprex package (v2.0.1)

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.1.2 (2021-11-01)
#>  os       macOS Monterey 12.1
#>  system   aarch64, darwin20
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Asia/Seoul
#>  date     2022-02-08
#>  pandoc   2.17.1.1 @ /Applications/RStudio.app/Contents/MacOS/quarto/bin/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package        * version    date (UTC) lib source
#>  abind            1.4-5      2016-07-21 [1] CRAN (R 4.1.0)
#>  assertthat       0.2.1      2019-03-21 [1] CRAN (R 4.1.0)
#>  backports        1.4.1      2021-12-13 [1] CRAN (R 4.1.1)
#>  base64enc        0.1-3      2015-07-28 [1] CRAN (R 4.1.0)
#>  bayesplot        1.8.1      2021-06-14 [1] CRAN (R 4.1.0)
#>  bridgesampling   1.1-2      2021-04-16 [1] CRAN (R 4.1.0)
#>  brms           * 2.16.7     2022-02-08 [1] Github (paul-buerkner/brms@3164328)
#>  Brobdingnag      1.2-7      2022-02-03 [1] CRAN (R 4.1.1)
#>  callr            3.7.0      2021-04-20 [1] CRAN (R 4.1.0)
#>  checkmate        2.0.0      2020-02-06 [1] CRAN (R 4.1.1)
#>  cli              3.1.1      2022-01-20 [1] CRAN (R 4.1.2)
#>  cmdstanr         0.4.0.9001 2022-01-25 [1] Github (stan-dev/cmdstanr@a2a97d9)
#>  coda             0.19-4     2020-09-30 [1] CRAN (R 4.1.0)
#>  codetools        0.2-18     2020-11-04 [1] CRAN (R 4.1.2)
#>  colorspace       2.0-2      2021-06-24 [1] CRAN (R 4.1.1)
#>  colourpicker     1.1.1      2021-10-04 [1] CRAN (R 4.1.1)
#>  crayon           1.4.2      2021-10-29 [1] CRAN (R 4.1.1)
#>  crosstalk        1.2.0      2021-11-04 [1] CRAN (R 4.1.1)
#>  curl             4.3.2      2021-06-23 [1] CRAN (R 4.1.0)
#>  data.table       1.14.2     2021-09-27 [1] CRAN (R 4.1.1)
#>  DBI              1.1.2      2021-12-20 [1] CRAN (R 4.1.1)
#>  digest           0.6.29     2021-12-01 [1] CRAN (R 4.1.1)
#>  distributional   0.3.0      2022-01-05 [1] CRAN (R 4.1.1)
#>  dplyr            1.0.7      2021-06-18 [1] CRAN (R 4.1.0)
#>  DT               0.20       2021-11-15 [1] CRAN (R 4.1.1)
#>  dygraphs         1.1.1.6    2018-07-11 [1] CRAN (R 4.1.0)
#>  ellipsis         0.3.2      2021-04-29 [1] CRAN (R 4.1.0)
#>  emmeans          1.7.2      2022-01-04 [1] CRAN (R 4.1.1)
#>  estimability     1.3        2018-02-11 [1] CRAN (R 4.1.0)
#>  evaluate         0.14       2019-05-28 [1] CRAN (R 4.1.0)
#>  fansi            1.0.2      2022-01-14 [1] CRAN (R 4.1.1)
#>  farver           2.1.0      2021-02-28 [1] CRAN (R 4.1.0)
#>  fastmap          1.1.0      2021-01-25 [1] CRAN (R 4.1.0)
#>  fs               1.5.2      2021-12-08 [1] CRAN (R 4.1.1)
#>  generics         0.1.2      2022-01-31 [1] CRAN (R 4.1.1)
#>  ggplot2          3.3.5      2021-06-25 [1] CRAN (R 4.1.1)
#>  ggridges         0.5.3      2021-01-08 [1] CRAN (R 4.1.1)
#>  glue             1.6.1      2022-01-22 [1] CRAN (R 4.1.2)
#>  gridExtra        2.3        2017-09-09 [1] CRAN (R 4.1.1)
#>  gtable           0.3.0      2019-03-25 [1] CRAN (R 4.1.1)
#>  gtools           3.9.2      2021-06-06 [1] CRAN (R 4.1.0)
#>  highr            0.9        2021-04-16 [1] CRAN (R 4.1.0)
#>  htmltools        0.5.2      2021-08-25 [1] CRAN (R 4.1.1)
#>  htmlwidgets      1.5.4      2021-09-08 [1] CRAN (R 4.1.1)
#>  httpuv           1.6.5      2022-01-05 [1] CRAN (R 4.1.1)
#>  igraph           1.2.11     2022-01-04 [1] CRAN (R 4.1.1)
#>  inline           0.3.19     2021-05-31 [1] CRAN (R 4.1.0)
#>  jsonlite         1.7.3      2022-01-17 [1] CRAN (R 4.1.2)
#>  knitr            1.37       2021-12-16 [1] CRAN (R 4.1.1)
#>  later            1.3.0      2021-08-18 [1] CRAN (R 4.1.1)
#>  lattice          0.20-45    2021-09-22 [1] CRAN (R 4.1.2)
#>  lifecycle        1.0.1      2021-09-24 [1] CRAN (R 4.1.1)
#>  loo              2.4.1      2020-12-09 [1] CRAN (R 4.1.0)
#>  magrittr         2.0.2      2022-01-26 [1] CRAN (R 4.1.1)
#>  markdown         1.1        2019-08-07 [1] CRAN (R 4.1.0)
#>  Matrix           1.4-0      2021-12-08 [1] CRAN (R 4.1.1)
#>  matrixStats      0.61.0     2021-09-17 [1] CRAN (R 4.1.1)
#>  mime             0.12       2021-09-28 [1] CRAN (R 4.1.1)
#>  miniUI           0.1.1.1    2018-05-18 [1] CRAN (R 4.1.0)
#>  munsell          0.5.0      2018-06-12 [1] CRAN (R 4.1.0)
#>  mvtnorm          1.1-3      2021-10-08 [1] CRAN (R 4.1.1)
#>  nlme             3.1-155    2022-01-13 [1] CRAN (R 4.1.1)
#>  pillar           1.7.0      2022-02-01 [1] CRAN (R 4.1.1)
#>  pkgbuild         1.3.1      2021-12-20 [1] CRAN (R 4.1.1)
#>  pkgconfig        2.0.3      2019-09-22 [1] CRAN (R 4.1.0)
#>  plyr             1.8.6      2020-03-03 [1] CRAN (R 4.1.0)
#>  posterior        1.2.0      2022-01-05 [1] CRAN (R 4.1.1)
#>  prettyunits      1.1.1      2020-01-24 [1] CRAN (R 4.1.0)
#>  processx         3.5.2      2021-04-30 [1] CRAN (R 4.1.0)
#>  promises         1.2.0.1    2021-02-11 [1] CRAN (R 4.1.0)
#>  ps               1.6.0      2021-02-28 [1] CRAN (R 4.1.0)
#>  purrr            0.3.4      2020-04-17 [1] CRAN (R 4.1.0)
#>  R6               2.5.1      2021-08-19 [1] CRAN (R 4.1.1)
#>  Rcpp           * 1.0.8      2022-01-13 [1] CRAN (R 4.1.1)
#>  RcppParallel     5.1.5      2022-01-05 [1] CRAN (R 4.1.1)
#>  reprex           2.0.1      2021-08-05 [1] CRAN (R 4.1.1)
#>  reshape2         1.4.4      2020-04-09 [1] CRAN (R 4.1.0)
#>  rlang            1.0.1      2022-02-03 [1] CRAN (R 4.1.1)
#>  rmarkdown        2.11       2021-09-14 [1] CRAN (R 4.1.1)
#>  rsconnect        0.8.25     2021-11-19 [1] CRAN (R 4.1.1)
#>  rstan            2.26.6     2022-01-25 [1] local
#>  rstantools       2.1.1      2020-07-06 [1] CRAN (R 4.1.0)
#>  rstudioapi       0.13       2020-11-12 [1] CRAN (R 4.1.0)
#>  scales           1.1.1      2020-05-11 [1] CRAN (R 4.1.0)
#>  sessioninfo      1.2.2      2021-12-06 [1] CRAN (R 4.1.1)
#>  shiny            1.7.1      2021-10-02 [1] CRAN (R 4.1.1)
#>  shinyjs          2.1.0      2021-12-23 [1] CRAN (R 4.1.1)
#>  shinystan        2.5.0      2018-05-01 [1] CRAN (R 4.1.0)
#>  shinythemes      1.2.0      2021-01-25 [1] CRAN (R 4.1.0)
#>  StanHeaders      2.26.6     2022-01-25 [1] local
#>  stringi          1.7.6      2021-11-29 [1] CRAN (R 4.1.1)
#>  stringr          1.4.0      2019-02-10 [1] CRAN (R 4.1.1)
#>  tensorA          0.36.2     2020-11-19 [1] CRAN (R 4.1.0)
#>  threejs          0.3.3      2020-01-21 [1] CRAN (R 4.1.0)
#>  tibble           3.1.6      2021-11-07 [1] CRAN (R 4.1.1)
#>  tictoc         * 1.0.1      2021-04-19 [1] CRAN (R 4.1.0)
#>  tidyselect       1.1.1      2021-04-30 [1] CRAN (R 4.1.0)
#>  utf8             1.2.2      2021-07-24 [1] CRAN (R 4.1.0)
#>  V8               4.1.0      2022-02-06 [1] CRAN (R 4.1.2)
#>  vctrs            0.3.8      2021-04-29 [1] CRAN (R 4.1.0)
#>  withr            2.4.3      2021-11-30 [1] CRAN (R 4.1.1)
#>  xfun             0.29       2021-12-14 [1] CRAN (R 4.1.1)
#>  xtable           1.8-4      2019-04-21 [1] CRAN (R 4.1.0)
#>  xts              0.12.1     2020-09-09 [1] CRAN (R 4.1.0)
#>  yaml             2.2.2      2022-01-25 [1] CRAN (R 4.1.1)
#>  zoo              1.8-9      2021-03-09 [1] CRAN (R 4.1.0)
#> 
#>  [1] /Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

However, running the same code as above but increasing N to, say, 5000 results in brms hanging after drawing the samples:

Compiling Stan program...
Start sampling
Running MCMC with 1 chain...

Chain 1 WARNING: No variance estimation is 
Chain 1          performed for num_warmup < 20 
Chain 1 Iteration: 1 / 10 [ 10%]  (Warmup) 
Chain 1 Iteration: 6 / 10 [ 60%]  (Sampling) 
Chain 1 Iteration: 10 / 10 [100%]  (Sampling) 
Chain 1 finished in 0.3 seconds.

Warning: 5 of 5 (100.0%) transitions ended with a divergence.
This may indicate insufficient exploration of the posterior distribution.
Possible remedies include: 
  * Increasing adapt_delta closer to 1 (default is 0.8) 
  * Reparameterizing the model (e.g. using a non-centered parameterization)
  * Using informative or weakly informative prior distributions 

I’ve tried this with rstan and cmdstanr backends, but it just sits there. I left it in the background and even after an hour it wasn’t done.

I don’t think this is a memory issue, but would appreciate any pointers! Thanks.

I’ve run into this issue, too, but I don’t have a reproducible example just yet. In my case Stan or brms hangs after drawing samples. Sampling appears to have finished for all chains, but the REPL is busy and won’t return.

Which brms version are you using? After sampling has finished, brms reads the Stan CSV files back into R. Previously it did this via rstan::read_stan_csv(), which can be extremely slow once the CSV files get large. The current version (2.18.0) uses cmdstanr::read_cmdstan_csv() on the backend and doesn’t hang with large files. I’m fairly confident this was the issue in the original post, which used 2.16.7.
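To check whether reading the CSV files is the slow step, the read can be timed on its own, outside brms. This is a minimal sketch, assuming the Stan CSV files from the run are still on disk. `time_csv_read` is a hypothetical helper and the paths in the usage comment are placeholders.

```r
# Hypothetical helper: time how long cmdstanr needs to read Stan CSV files.
# `variables` restricts the read to a few variables; NULL reads everything.
time_csv_read <- function(files, variables = NULL) {
  stopifnot(length(files) > 0, all(file.exists(files)))
  elapsed <- system.time(
    csv <- cmdstanr::read_cmdstan_csv(files, variables = variables)
  )[["elapsed"]]
  list(
    seconds = elapsed,
    # With the default draws_array format: iterations x chains x variables
    dims = dim(csv$post_warmup_draws)
  )
}

# Usage (placeholder paths):
# files <- list.files("path/to/csv_dir", pattern = "\\.csv$", full.names = TRUE)
# time_csv_read(files, variables = "lp__")  # quick read of a single variable
# time_csv_read(files)                      # full read, including group-level effects
```

If the single-variable read returns quickly and the full read does not, the time is probably going into parsing the many group-level columns.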

If you have a model that has completed sampling but is now stuck in this final step (or timed out, etc.), then, assuming you know where the Stan CSV files were saved, you can also rebuild the brmsfit object from them.
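One way to do that rebuild follows the pattern shown in the brms documentation for rename_pars(): create an unfitted brmsfit with empty = TRUE, read the CSV files into a stanfit object, and attach it. This is a sketch under those assumptions, not a tested recipe. `rebuild_brmsfit` is a hypothetical helper, the `...` must repeat the model arguments of the original brm() call, and rstan::read_stan_csv() may itself be slow on large files, as noted above.

```r
# Hypothetical helper: rebuild a brmsfit from Stan CSV files on disk.
# `...` should be the same formula, family, and data as in the original brm() call.
rebuild_brmsfit <- function(csv_files, ...) {
  stopifnot(length(csv_files) > 0, all(file.exists(csv_files)))
  # Set up the brmsfit structure without compiling or sampling
  fit <- brms::brm(..., empty = TRUE)
  # Read the posterior draws into a stanfit object (can be slow for big files)
  fit$fit <- rstan::read_stan_csv(csv_files)
  # Map Stan's internal parameter names to the brms names
  brms::rename_pars(fit)
}

# Usage (placeholder paths; model arguments as in the original post):
# library(brms)
# fit <- rebuild_brmsfit(
#   csv_files = list.files("path/to/csv_dir", pattern = "\\.csv$", full.names = TRUE),
#   formula = bf(cbind(y1, y2, y3, y4, y5, y6) | trials(size) ~ 1 + (1 |p| id)),
#   family = multinomial(),
#   data = dat
# )
```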

I’m getting a hang too, but on v2.18.0. After a (successful) run following an R restart, the .rds file size was 66 MB. I’m using save_pars = save_pars(all = TRUE) so that I can later use the fitted object to calculate LOO, so perhaps that is the reason.

@Chris_Statwonk_Peters would a gc() call before model fitting help to avoid this? I had been working for tens of hours in the same R session; perhaps that’s why this happened to me intermittently?

It would be useful to add the ‘bugs’ tag to this, @matti, as you are the thread author, for better visibility to the package maintainer.

OS: 5.10.16.3-microsoft-standard-WSL2

Unfortunately, using gc() hasn’t helped; it is hanging again.

@paul.buerkner: feature idea. Just as a progress readout is printed for the % progress of the chains, would it be possible to track the progress of saving the fitted object and its parameters to .rds? The tricky bit for the user is determining whether it has crashed or is just taking a long time to complete.

PS: the .rds file is expected to be about 106 MB.
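In the meantime, one way to tell a slow save from a hang elsewhere is to fit without brm()'s file argument and write the .rds in a separate, timed step. A minimal sketch in base R, assuming the fitted object has already been returned to the session. `timed_save_rds` is a hypothetical helper.

```r
# Hypothetical helper: save an object to .rds and report time and file size.
timed_save_rds <- function(object, path, compress = TRUE) {
  elapsed <- system.time(saveRDS(object, path, compress = compress))[["elapsed"]]
  size_mb <- file.size(path) / 1024^2
  message(sprintf("Saved %s in %.1f s (%.1f MB)", basename(path), elapsed, size_mb))
  invisible(list(seconds = elapsed, size_mb = size_mb))
}

# Usage:
# fit <- brm(...)                  # same call as before, but without `file =`
# timed_save_rds(fit, "fit.rds")
```

If brm() returns and the timed save is quick, the hang is not in writing the .rds. If brm() never returns, the problem lies before the save.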

This would have to be an rstan or cmdstanr feature, as brms does not control things that happen during the actual model fitting.

Thanks, I have suggested it here: Feature suggestion: Printout progress when saving model fit object · Issue #725 · stan-dev/cmdstanr · GitHub.