bayes_factor() warning: "logml could not be estimated within maxiter"

Hi all,

I constructed two models and would like to compare their fit using bayes_factor().

f0_on_11_10 <- brm(fzero_onset ~ position + voicing + target_vowel + poa +
                   data = fzero, sample_prior = TRUE,
                   save_pars = save_pars(all = TRUE),
                   cores = 8, iter = 10000, warmup = 1000, chains = 4,
                   init = 0,

f0_on_1_bo_10 <- brm(fzero_onset ~ position * voicing * target_vowel + poa +
                     data = fzero_bo, sample_prior = TRUE,
                     save_pars = save_pars(all = TRUE),
                     cores = 8, iter = 10000, warmup = 1000, chains = 4,
                     init = 0,

When I ran:

bayes_factor(f0_on_11_10, f0_on_1_bo_10)

I keep getting:

Estimated Bayes factor in favor of f0_on_11_10 over f0_on_1_bo_10: 0.00000
Warning messages:
1: logml could not be estimated within maxiter, rerunning with adjusted starting value. 
Estimate might be more variable than usual. 
2: logml could not be estimated within maxiter, rerunning with adjusted starting value. 
Estimate might be more variable than usual. 

When reversing the order of the models, as in:

brms::bayes_factor(f0_on_1_bo_10, f0_on_11_10)

I keep getting the same warnings, with the estimate now flipped to Inf:

Estimated Bayes factor in favor of f0_on_1_bo_10 over f0_on_11_10:    Inf
Warning messages:
1: logml could not be estimated within maxiter, rerunning with adjusted starting value. 
Estimate might be more variable than usual. 
2: logml could not be estimated within maxiter, rerunning with adjusted starting value. 
Estimate might be more variable than usual. 

Here is my session info, just in case it’d help:

R version 4.1.2 (2021-11-01)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Big Sur 11.1

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/lib/libRlapack.dylib

[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] brms_2.17.4    Rcpp_1.0.8.3   lmerTest_3.1-3 lme4_1.1-29    Matrix_1.3-4  

loaded via a namespace (and not attached):
  [1] TH.data_1.1-1        minqa_1.2.4          colorspace_2.0-3     ellipsis_0.3.2      
  [5] modeltools_0.2-23    ggridges_0.5.3       estimability_1.3     markdown_1.1        
  [9] base64enc_0.1-3      rstudioapi_0.13      farver_2.1.0         rstan_2.26.13       
 [13] DT_0.23              fansi_1.0.3          mvtnorm_1.1-3        lubridate_1.8.0     
 [17] diffobj_0.3.5        coin_1.4-2           bridgesampling_1.1-2 codetools_0.2-18    
 [21] splines_4.1.2        knitr_1.39           libcoin_1.0-9        shinythemes_1.2.0   
 [25] bayesplot_1.9.0      jsonlite_1.8.0       nloptr_2.0.2         shiny_1.7.1         
 [29] compiler_4.1.2       emmeans_1.7.4-1      backports_1.4.1      assertthat_0.2.1    
 [33] fastmap_1.1.0        strucchange_1.5-2    cli_3.3.0            later_1.3.0         
 [37] htmltools_0.5.2      prettyunits_1.1.1    tools_4.1.2          igraph_1.3.2        
 [41] coda_0.19-4          gtable_0.3.0         glue_1.6.2           reshape2_1.4.4      
 [45] dplyr_1.0.9          posterior_1.2.2      V8_4.2.0             vctrs_0.4.1         
 [49] nlme_3.1-153         crosstalk_1.2.0      tensorA_0.36.2       xfun_0.31           
 [53] stringr_1.4.0        ps_1.7.1             mime_0.12            miniUI_0.1.1.1      
 [57] lifecycle_1.0.1      gtools_3.9.2.2       MASS_7.3-54          zoo_1.8-10          
 [61] scales_1.2.0         rstanarm_2.21.3      colourpicker_1.1.1   promises_1.2.0.1    
 [65] Brobdingnag_1.2-7    parallel_4.1.2       sandwich_3.0-1       inline_0.3.19       
 [69] shinystan_2.6.0      yaml_2.3.5           curl_4.3.2           gridExtra_2.3       
 [73] ggplot2_3.3.6        loo_2.5.1            StanHeaders_2.26.13  stringi_1.7.6       
 [77] dygraphs_1.1.1.6     checkmate_2.1.0      boot_1.3-28          pkgbuild_1.3.1      
 [81] rlang_1.0.2          pkgconfig_2.0.3      matrixStats_0.62.0   distributional_0.3.0
 [85] evaluate_0.15        lattice_0.20-45      purrr_0.3.4          rstantools_2.2.0    
 [89] htmlwidgets_1.5.4    processx_3.6.1       tidyselect_1.1.2     plyr_1.8.7          
 [93] magrittr_2.0.3       R6_2.5.1             generics_0.1.2       multcomp_1.4-19     
 [97] DBI_1.1.2            pillar_1.7.0         xts_0.12.1           survival_3.2-13     
[101] abind_1.4-5          tibble_3.1.7         crayon_1.5.1         utf8_1.2.2          
[105] party_1.3-10         rmarkdown_2.14       grid_4.1.2           callr_3.7.0         
[109] threejs_0.3.3        digest_0.6.29        xtable_1.8-4         httpuv_1.6.5        
[113] numDeriv_2016.8-1.1  RcppParallel_5.1.5   stats4_4.1.2         munsell_0.5.0       
[117] shinyjs_2.1.0   

Could anybody please advise what is going on? Am I doing something wrong?

Thanks in advance!

That warning comes from the bridgesampling package, which brms uses to estimate the marginal likelihoods of the models (needed for Bayes factors).

A difficulty with the Bridge Sampling approach to estimating marginal likelihoods is that it (generally) requires a large number of posterior samples for reliable results. That warning message is saying that the estimated marginal likelihood (and resulting Bayes factor) may not be reliable, and that this may be due to not having enough posterior samples.

I’d recommend increasing the number of iterations for the brms models (try at least doubling them) and then seeing whether the warnings remain.
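If it helps, here is a minimal sketch of that suggestion using brms’ update(), which re-fits a model without recompiling the Stan program; the iteration counts below are only illustrative:

```r
# Re-fit both models with more draws; update() reuses the compiled
# Stan program, so only the sampling is repeated.
f0_on_11_10_more   <- update(f0_on_11_10,   iter = 20000, warmup = 2000)
f0_on_1_bo_10_more <- update(f0_on_1_bo_10, iter = 20000, warmup = 2000)

# Then try the Bayes factor again on the longer runs:
bayes_factor(f0_on_11_10_more, f0_on_1_bo_10_more)
```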


Thanks @andrjohns! Will give it a try and see how it goes.

I think it is also because the two marginal likelihoods differ by many orders of magnitude. It is very hard for bridge sampling to estimate a very small or very large marginal likelihood (as opposed to estimating the log marginal likelihood). In this case it is probably enough to know that one of your models is much better than the other?


Bridge sampling estimates the marginal likelihood separately for each model, so the difference in magnitude doesn’t matter. But looking at the models, it’s likely that even running more iterations is not going to remove the variability the diagnostic message is warning about.


As you expected @avehtari, running more iterations (e.g., 25000) doesn’t help either. Does this mean something is wrong with the model that I should pay special attention to?

@yuling I could compare the models indirectly using bayes_R2() as a way around this issue, but I was curious to see why I’m getting this warning message, and whether there is some solution for it.

Thanks all!

Because the mentioned function effectively computes the marginal likelihood for two models, maybe you could print out the log marginal likelihood for each model and see which one causes problems?
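A sketch of what that could look like, assuming the fitted objects from the original post; bridge_sampler() is what bayes_factor() calls under the hood:

```r
# Estimate the log marginal likelihood of each model separately,
# so you can see which one triggers the warning.
logml_1 <- bridge_sampler(f0_on_11_10)
logml_2 <- bridge_sampler(f0_on_1_bo_10)
print(logml_1)
print(logml_2)

# The Bayes factor can then be recovered from the two estimates:
bridgesampling::bf(logml_1, logml_2)
```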

It seems to me that the message is clear that one model is much better than the other: would it make too much difference if a user knows the Bayes factor is 1e5 or 1e7?


This is not an indication of a wrong model. Bridge sampling can fail if you have at least tens of parameters and the posterior is not close to Gaussian. As you have random-effect terms, it’s likely that you have many parameters, but without knowing the number of levels in your grouping factors (including word), it’s not possible to see how many parameters you have in your model.

I also now see that you are using the default priors, but those are uniform for the coefficients, and the marginal likelihood is not defined for improper priors (i.e., priors that don’t have a finite integral). You could try again with proper priors.
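For example (the prior scales below are placeholders, not recommendations; they should be chosen to match the scale of fzero_onset):

```r
# Proper (finite-integral) priors on the population-level coefficients
# and the group-level standard deviations.
priors <- c(
  prior(normal(0, 5), class = "b"),
  prior(student_t(3, 0, 10), class = "sd")
)

# Re-fit with the proper priors; update() will recompile if needed.
f0_on_11_10_proper <- update(f0_on_11_10, prior = priors)
```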

As the warning message is repeated twice, I think the logml estimation fails for both models.


Thanks for this @avehtari! One of the grouping factors has 20 levels, and word has about 54 levels.

Yes, these are high enough that if the posterior is far from normal then bridge sampling can fail. As you have set adapt_delta = 0.999, I guess you were getting divergences, which indicates that your posterior really is far from a normal distribution.

What is the goal of your analysis? Maybe there is another way to get a useful answer (Bayes factors are not necessarily the right choice even if the computation would work)


Weighing in here because I wrestled with a similar problem for a while. As @avehtari pointed out, the bridge sampling approximation can fail spectacularly in high dimensions, and anything involving Bayes factors directly or indirectly requires at least moderately informative priors. You can find a more detailed discussion of these issues and some proposed solutions in Schad et al. (2022) and Wang, Jones, and Meng (2020).

The areas where I’ve encountered problems almost always involve hierarchical models with random intercepts and slopes (e.g., several hundred parameters). How to best tackle it depends on what the concern is and the question to be answered.

If the main concern is that the algorithm may be unstable, then one option for addressing that may be to estimate a distribution of the log marginal likelihood and average over the uncertainty in the bridge sampling approximation by running the algorithm for a reasonably large number of repetitions (>=100 or so). There’s code for how to do this in the context of brms here.
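If I understand the approach correctly, the bridgesampling package exposes this directly via its repetitions argument (the value 100 just mirrors the suggestion above):

```r
# Repeat the bridge sampling step 100 times on the same posterior draws;
# the spread of the logml estimates gives a rough sense of the
# approximation's stability.
logml_reps <- bridge_sampler(f0_on_11_10, repetitions = 100)
summary(logml_reps)
```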

The second consideration is whether a Bayes factor is necessary for answering your question, or whether some other approach that doesn’t require the kinds of extensive robustness checks Bayes factors tend to need might be sufficient.
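For instance, if predictive performance is an acceptable basis for comparison, a PSIS-LOO comparison (supported out of the box by brms) sidesteps marginal likelihoods entirely. Note this assumes both models are fit to the same data, which loo_compare() requires:

```r
# Compare expected out-of-sample predictive performance instead of
# marginal likelihoods.
loo_1 <- loo(f0_on_11_10)
loo_2 <- loo(f0_on_1_bo_10)
loo_compare(loo_1, loo_2)
```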


If by the repetitions you mean the repetitions argument in the bridgesampling package, then it will provide an overoptimistic estimate of the uncertainty. For a better estimate you also need to rerun the posterior sampling. We’re writing a case study with additional diagnostics.


Interesting and good to know. Please do let me know once you have a draft you’d be willing to share, if it’s not too much trouble, as it may be helpful for some parts of my dissertation work.


Thanks for your follow-up @avehtari.

I just need to compare two models: one includes interactions between the predictors and the other doesn’t. Although I have already decided to go with the interaction model, I still want to see how each performs compared to the other.

Not sure if rescaling the response variable would help in this respect…

Thanks for your input @ajnafa !
I’ll go through it.
For the second part, my aim is just to compare the two models, as explained in the reply above. I’m open to any other approaches that fulfill this aim.

Thanks again!

Have you looked at the posterior of the interaction terms? That could tell you whether or not they are small.
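A quick way to check that, assuming the interaction model from the original post (the "^b_.*:" pattern is a guess at the coefficient names; brms joins interaction terms with ":"):

```r
# Posterior summaries of all population-level effects, including the
# interaction terms (rows whose names contain ":"):
fixef(f0_on_1_bo_10)

# Or plot only the interaction coefficients:
mcmc_plot(f0_on_1_bo_10, variable = "^b_.*:", regex = TRUE)
```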

What is the purpose of the models in the end? That is, why is it interesting to know whether there is an interaction or not?