Warning after training model with brms

Hi All,

I encountered a warning message from brms after training a simple binomial model with “meanfield” as the algorithm. Here is the warning message:

In names(x$fit@sim$samples[[i]])[change$pos] <- change$fnames :
  number of items to replace is not a multiple of replacement length

My concern is less the warning itself than its consequences: when I look closely at the model object returned by brms, and at the predictions made with that model, the results are completely wrong.

I also realized that others have encountered the same problem in the past, but I’m not sure if it was resolved or not. See the links below for more information:
https://github.com/paul-buerkner/brms/issues/226
https://github.com/paul-buerkner/brms/issues/387
https://discourse.mc-stan.org/t/rstan-meanfield-produces-a-parameter-called-lp-1/3614

I also noticed that the same model, trained on a different computer with the same specs and the same model definition, returns the expected results. Is there a way to fix this issue?

Thanks!!

  • Operating System: Ubuntu
  • brms Version: 2.9.0

This warning indicates a bug in brms, but it may be unrelated to the issues you linked to (which have been resolved) since it may happen for a completely different model type. The warning basically tells you that renaming some of the parameters has failed for some reason.
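For readers unfamiliar with this warning, here is a minimal base R sketch of how it can arise. It is only an illustration, not the brms internals: `draws`, `pos` and `new_names` are hypothetical stand-ins for the objects in the warning message. When fewer new names are supplied than positions to rename, R recycles the names and warns, so some parameters end up mislabeled.

```r
# Hypothetical stand-in for one chain's draws; not a real brms object.
draws <- list(b_Intercept = 0.1, sd_day = 0.5, lp__ = -12.3)

# Three positions to rename, but only two replacement names (deliberate mismatch).
pos <- c(1, 2, 3)
new_names <- c("Intercept", "sd_day__Intercept")

# Record the warning message and let the assignment finish.
msg <- NULL
withCallingHandlers(
  {
    names(draws)[pos] <- new_names
  },
  warning = function(w) {
    msg <<- conditionMessage(w)
    invokeRestart("muffleWarning")
  }
)

print(msg)
# The two names were recycled over three positions, so the third
# element no longer carries its original name.
print(names(draws))
```

If something similar happens during renaming in a fitted model, summaries and predictions could be reading values under the wrong parameter names, which would be consistent with results that look completely wrong.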

Do both computers have the latest brms version? In order to fix this, can you please provide a minimal reproducible example?
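A quick way to compare the two machines is to print the installed versions of the relevant packages on each. This is a small sketch using base R only; the choice of packages to check is an assumption about what is likely to matter here.

```r
# Packages whose versions are worth comparing across machines (assumed list).
pkgs <- c("brms", "rstan", "StanHeaders")

# Return the installed version as text, or NA if the package is not installed.
versions <- vapply(
  pkgs,
  function(p) {
    if (nzchar(system.file(package = p))) {
      as.character(utils::packageVersion(p))
    } else {
      NA_character_
    }
  },
  character(1)
)

print(versions)
```

Running this on both computers and comparing the output shows at a glance whether brms or rstan differ between them.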

Paul


Hi @paul.buerkner,

Thanks for the quick response.

To answer your question: no. The computer that shows the warning has brms version 2.9.0, whilst the one that works fine has brms version 2.8.9. I have included the model below (though I am not sure whether that is what you are looking for). In the model, “day” is the day of the week (Mon - Sun).

m <- brms::brm(
  formula = Alive|trials(Total) + weights(weight) ~ 1 + 
    day + (1|day*location),
  data = d,
  family = stats::binomial(link = 'logit'),
  cores = 4,
  save_ranef = TRUE,
  future = TRUE,
  seed = 111,  # comma was missing here in the original post
  iter = 50000,
  algorithm = "meanfield",
  control = list(adapt_delta = 0.999, max_treedepth = 15)
)

Thanks for the code. Two comments.

(1) For an example to be minimally reproducible, it needs to run out of the box just by copying the code, so we need not only the model code but also some (possibly fake) data. Otherwise, I may not see the error simply because it is caused by specifics of the data that I don’t have. I created some fake data myself this time, but in general it would be good if you included it in your code.

(2) I recommend not using ADVI (meanfield and fullrank) right now, especially for multilevel models, as the results are likely to be very far off. I hope we get better diagnostics implemented in Stan soon, and also some improvements to the algorithms themselves.

With the fake data I created, it works for me. Perhaps you are using an old version of rstan?

Here is my code:

library(brms)

d <- data.frame(
  Alive = rbinom(100, size = 10, prob = 0.5),
  Total = 10,
  weight = rexp(100),
  day = sample(1:3, 100, TRUE),
  location = sample(1:5, 100, TRUE)
)

m <- brms::brm(
  formula = Alive|trials(Total) + weights(weight) ~ 1 + 
    day + (1|day*location),
  data = d,
  family = stats::binomial(link = 'logit'),
  cores = 4,
  save_ranef = TRUE,
  future = TRUE,
  seed = 111,
  iter = 50000,
  algorithm = "meanfield",
  control = list(adapt_delta = 0.999, max_treedepth = 15)
)

Here is my sessionInfo():

R version 3.6.0 (2019-04-26)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17763)

Matrix products: default

Random number generation:
 RNG:     Mersenne-Twister 
 Normal:  Inversion 
 Sample:  Rounding 
 
locale:
[1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252   
[3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C                   
[5] LC_TIME=German_Germany.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] brms_2.9.3 Rcpp_1.0.1

loaded via a namespace (and not attached):
 [1] Brobdingnag_1.2-6    gtools_3.8.1         StanHeaders_2.18.1   threejs_0.3.1       
 [5] shiny_1.3.2          assertthat_0.2.1     stats4_3.6.0         pillar_1.4.0        
 [9] backports_1.1.4      lattice_0.20-38      glue_1.3.1           digest_0.6.19       
[13] promises_1.0.1       colorspace_1.4-1     htmltools_0.3.6      httpuv_1.5.1        
[17] Matrix_1.2-17        plyr_1.8.4           dygraphs_1.1.1.6     pkgconfig_2.0.2     
[21] rstan_2.18.2         purrr_0.3.2          xtable_1.8-4         mvtnorm_1.0-10      
[25] scales_1.0.0         processx_3.3.1       later_0.8.0          tibble_2.1.1        
[29] bayesplot_1.6.0      ggplot2_3.1.1        DT_0.6               withr_2.1.2         
[33] shinyjs_1.0          lazyeval_0.2.2       cli_1.1.0            magrittr_1.5        
[37] crayon_1.3.4         mime_0.6             ps_1.3.0             nlme_3.1-139        
[41] xts_0.11-2           pkgbuild_1.0.3       colourpicker_1.0     rsconnect_0.8.13    
[45] tools_3.6.0          loo_2.1.0            prettyunits_1.0.2    matrixStats_0.54.0  
[49] stringr_1.4.0        munsell_0.5.0        callr_3.2.0          packrat_0.5.0       
[53] compiler_3.6.0       rlang_0.3.4          grid_3.6.0           ggridges_0.5.1      
[57] rstudioapi_0.10      htmlwidgets_1.3      crosstalk_1.0.0      igraph_1.2.4.1      
[61] miniUI_0.1.1.1       base64enc_0.1-3      gtable_0.3.0         codetools_0.2-16    
[65] inline_0.3.15        abind_1.4-5          markdown_0.9         reshape2_1.4.3      
[69] R6_2.4.0             gridExtra_2.3        rstantools_1.5.1     zoo_1.8-5           
[73] bridgesampling_0.6-0 dplyr_0.8.1          shinystan_2.5.0      shinythemes_1.1.2   
[77] stringi_1.4.3        parallel_3.6.0       tidyselect_0.2.5     coda_0.19-2  

Hi @paul.buerkner,

Thanks once again!

Regarding the second comment (#2), I used meanfield because training a multilevel model with, say, 50000 observations takes a significant amount of time otherwise. However, I agree with you about the results produced by the different methods.

I’m currently looking at the rstan installation for possible clues to the problem.

Thanks!

I see why you want to use meanfield, but my argument is that speed doesn’t matter if the results cannot be trusted.
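One way to check whether a meanfield fit can be trusted for a given model is to fit the same model with full sampling on a subset of the data and compare the estimates. The sketch below is only an illustration under assumptions: `compare_algorithms` is a hypothetical helper, the formula is a simplified version of the model in this thread, and the data frame is expected to have the columns `Alive`, `Total`, `day` and `location`. It needs brms and a working Stan toolchain to actually run.

```r
# Hypothetical helper: fit the same model with ADVI (meanfield) and with
# the default sampler, then return the population-level estimates side by side.
compare_algorithms <- function(d, seed = 111) {
  f <- brms::bf(Alive | trials(Total) ~ 1 + day + (1 | location))

  fit_vb <- brms::brm(
    f, data = d, family = stats::binomial(link = "logit"),
    algorithm = "meanfield", seed = seed
  )
  fit_mcmc <- brms::brm(
    f, data = d, family = stats::binomial(link = "logit"),
    algorithm = "sampling", seed = seed
  )

  # fixef() returns a matrix with an "Estimate" column for each coefficient.
  cbind(
    meanfield = brms::fixef(fit_vb)[, "Estimate"],
    sampling  = brms::fixef(fit_mcmc)[, "Estimate"]
  )
}

# Usage (not run here, since it compiles and fits two Stan models):
# compare_algorithms(d)
```

Large differences between the two columns would suggest that the approximation is not adequate for this model.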


I like the way you put it!!

Following up two years later. @paul.buerkner, would you still advise against ADVI in 2021, or have we gotten better diagnostics since?

I am not an expert in ADVI, so I may be quite wrong here, but I would still not recommend it.

That’s still helpful. Thanks!