Ebfmi in cmdstanr

jsocolar · May 14, 2021, 7:58pm

Cmdstanr has a check_divergences function and a check_sampler_transitions_treedepth function, but to my knowledge no equivalent check_ebfmi function. Is there any particular reason for this, or would an addition be welcome?

The primary use case is when the R6 object is rebuilt from the CSV files: in that case cmdstan_diagnose() doesn’t work, and as far as I can see there’s no way to check the E-BFMI diagnostic short of writing a custom function.

@jonah @rok_cesnovar

jonah · May 14, 2021, 8:12pm

Yeah I’d be open to adding this if you want to take a crack at implementing it.

I think the only reason it’s missing is that we were hoping we would have convergence diagnostics in the posterior package by now (Add convergence warnings · Issue #77 · stan-dev/posterior · GitHub) and that cmdstanr could just use them. But since there’s still a bunch of work to do to get this stuff into posterior I guess maybe we should go ahead and just add it to cmdstanr in the meantime. We could then deprecate all the cmdstanr implementations when posterior has them and then just use the posterior implementations internally in cmdstanr.

I’d like to eventually have a fit$diagnose() method in cmdstanr that uses posterior in a similar way to how fit$summary() calls posterior::summarize_draws(), but until then I’m fine with adding more individual diagnostic check functions to cmdstanr.

jsocolar · May 14, 2021, 8:14pm

Sounds good. The main reason I thought to include it in cmdstanr rather than posterior is that posterior has broad applicability to models not fit by HMC, and I wasn’t sure you wanted it to carry around a bunch of HMC-specific baggage.

jonah · May 14, 2021, 8:18pm

Yeah that was our initial thinking, but since then we decided that it makes sense to add algorithm-specific stuff to posterior but not Stan-specific stuff. That is, posterior shouldn’t need to know about how any of the Stan R packages work or how Stan does anything in particular, but it would be good if it could handle HMC/NUTS stuff in a way that anyone using those algorithms (regardless of whether they’re using Stan) can use posterior.

jonah · May 14, 2021, 8:28pm

Also, one big reason we want to implement the HMC/NUTS stuff in posterior is that then we can use that stuff in all of our R packages instead of having implementations in each of those packages. This would help unify all the warning messages people see when running Stan from R (right now each package behaves a bit differently in that regard) and allow us to stop maintaining a bunch of different implementations of the diagnostic checks.

mike-lawrence · May 14, 2021, 8:59pm

Ha, I was looking at this just this morning. What’s the formula for ebfmi? I know it includes the variance of the energy__ column, but I couldn’t figure out the other terms last I looked at the paper.

jsocolar · May 14, 2021, 9:01pm

# energy: iterations x chains matrix of energy__ draws; gives one E-BFMI value per chain
ebfmi <- apply(energy, 2, function(x) {
  (sum(diff(x)^2) / length(x)) / var(x)
})

See page 44 here (note that n is zero-indexed there): https://arxiv.org/pdf/1701.02434.pdf
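For reference, the estimator on that page, as I read it, can be written as follows. Here E_0, ..., E_N are the energy draws from a single chain and \bar{E} is their mean; treat this as a transcription to check against the paper rather than an authoritative statement.

```latex
\widehat{\text{E-BFMI}}
  = \frac{\sum_{n=1}^{N} \left( E_n - E_{n-1} \right)^2}
         {\sum_{n=0}^{N} \left( E_n - \bar{E} \right)^2}
```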

I’m just double checking that cmdstan itself (like rstan) calculates ebfmi on a per-chain basis. I’m terrible at reading C++ though.

mike-lawrence · May 15, 2021, 1:05am

Is that /length(x) supposed to be there? It’s not in the paper’s eqn. (thanks for the paper btw, I had been looking at a different one and this one is way clearer)

mike-lawrence · May 15, 2021, 1:06am

Oh, never mind, it’s in your eqn to make the ratio work while using var(x) in the denominator.
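To make the role of /length(x) concrete, here is a minimal sketch comparing the form posted in this thread with a plain ratio of sums of squares (numerator: squared successive differences; denominator: squared deviations from the mean). The energy matrix is simulated and purely illustrative. Because var(x) divides by length(x) - 1, the two forms differ by a factor of (n - 1) / n, which is negligible for typical chain lengths.

```r
# Simulated energy__ draws: 1000 iterations x 4 chains (made-up data).
set.seed(1)
energy <- matrix(rnorm(4000), nrow = 1000, ncol = 4)

# Form used in this thread: mean squared difference over sample variance.
ebfmi_thread <- apply(energy, 2, function(x) {
  (sum(diff(x)^2) / length(x)) / var(x)
})

# Plain ratio of sums of squares, with no /length(x) and no var().
ebfmi_ratio <- apply(energy, 2, function(x) {
  sum(diff(x)^2) / sum((x - mean(x))^2)
})

# var(x) uses a length(x) - 1 denominator, so the forms differ by (n - 1) / n.
n <- nrow(energy)
all.equal(ebfmi_thread, ebfmi_ratio * (n - 1) / n)
```

The final call should return TRUE; the difference between the two forms only matters for very short chains.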

jsocolar · May 15, 2021, 3:02am

Pull request at:

github.com/stan-dev/cmdstanr: "Add check_bfmi function" (stan-dev:master ← jsocolar:master, opened 02:56AM - 15 May 21 UTC by jsocolar, +14 -0)

#### Submission Checklist

- [ ] Run unit tests
- [x] Declare copyright holder and agree to license (see below)

#### Summary

Added function `check_bfmi`, which takes as input the output of `sampler_diagnostics`, computes the estimated Bayesian fraction of missing information (E-BFMI) for each chain, and prints a message if any chain has E-BFMI less than 0.3. Note that this uses the threshold of 0.3, consistent with cmdstan's [diagnose.cpp](https://github.com/stan-dev/cmdstan/blob/4f569096ab15676c8050b9cff83de6b17541b609/src/cmdstan/diagnose.cpp) (see line 154), but different from the 0.2 threshold used in [rstan's `check_energy` function](https://github.com/stan-dev/rstan/blob/da2fc9c079534a82d3d26adda51ad17bf22f5e2b/rstan/rstan/R/check_hmc_diagnostics.R) (see line 259).

#### Copyright and Licensing

Please list the copyright holder for the work you are submitting (this will be you or your assignee, such as a university or company): Jacob B. Socolar

By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:

- Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)

This is just for a check_bfmi function. A logical next step is to add a check_sampler_diagnostics function that can be called on the R6 object to check treedepth, divergences, E-BFMI.
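As a rough illustration of what such a combined check might look like, here is a sketch. The function name check_sampler_diagnostics_sketch is hypothetical and not part of cmdstanr. It assumes the diagnostics arrive as a plain iterations x chains x variables array with variables named divergent__, treedepth__ and energy__, that a transition counts as hitting the limit when treedepth__ >= max_treedepth, and it reuses the 0.3 E-BFMI threshold from the pull request. The demo data are simulated.

```r
# Hypothetical helper, NOT part of cmdstanr: reports three HMC diagnostics.
# `diag` is assumed to be an iterations x chains x variables array with
# variables "divergent__", "treedepth__" and "energy__".
check_sampler_diagnostics_sketch <- function(diag, max_treedepth = 10,
                                             ebfmi_threshold = 0.3) {
  n_divergent <- sum(diag[, , "divergent__"])
  # Assumption: hitting the limit means treedepth__ >= max_treedepth.
  n_max_treedepth <- sum(diag[, , "treedepth__"] >= max_treedepth)

  # Reshape so a single chain is still a one-column matrix.
  energy <- matrix(diag[, , "energy__"], nrow = dim(diag)[1])
  # Per-chain E-BFMI, using the form posted earlier in this thread.
  ebfmi <- apply(energy, 2, function(x) {
    (sum(diff(x)^2) / length(x)) / var(x)
  })
  low <- which(ebfmi < ebfmi_threshold)

  if (n_divergent > 0) {
    message(n_divergent, " transition(s) ended with a divergence.")
  }
  if (n_max_treedepth > 0) {
    message(n_max_treedepth, " transition(s) hit the maximum treedepth of ",
            max_treedepth, ".")
  }
  if (length(low) > 0) {
    message("E-BFMI below ", ebfmi_threshold, " in chain(s): ",
            paste(low, collapse = ", "))
  }
  invisible(list(n_divergent = n_divergent,
                 n_max_treedepth = n_max_treedepth,
                 ebfmi = ebfmi))
}

# Simulated diagnostics: 500 iterations x 4 chains (made-up data).
set.seed(2)
iters <- 500
chains <- 4
diag <- array(0, dim = c(iters, chains, 3),
              dimnames = list(NULL, NULL,
                              c("divergent__", "treedepth__", "energy__")))
diag[, , "treedepth__"] <- sample(3:6, iters * chains, replace = TRUE)
diag[, , "energy__"] <- rnorm(iters * chains)
# Give chain 2 a slowly drifting energy so its E-BFMI comes out low.
diag[, 2, "energy__"] <- cumsum(rnorm(iters, sd = 0.05))
# Mark a single divergent transition in chain 1.
diag[3, 1, "divergent__"] <- 1

result <- check_sampler_diagnostics_sketch(diag)
```

Returning the per-chain values invisibly, in addition to printing messages, leaves the caller free to decide how to combine them across chains.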

mike-lawrence · May 15, 2021, 1:36pm

What’s standard for combining max_treedepth and ebfmi from multiple chains? Max and min respectively?
