E-BFMI in cmdstanr

cmdstanr has a check_divergences function and a check_sampler_transitions_treedepth function but, to my knowledge, no equivalent check_ebfmi function. Is there any particular reason for this, or would an addition be welcome?

The primary use case is when the R6 object is rebuilt from the CSV files: cmdstan_diagnose() doesn’t work in that case, and as far as I can see there’s no way to check the E-BFMI diagnostic short of writing a custom function.

@jonah @rok_cesnovar

Yeah I’d be open to adding this if you want to take a crack at implementing it.

I think the only reason it’s missing is that we were hoping to have convergence diagnostics in the posterior package by now (see “Add convergence warnings”, issue #77 in stan-dev/posterior on GitHub) so that cmdstanr could just use them. But since there’s still a bunch of work to do to get this into posterior, I guess we should go ahead and add it to cmdstanr in the meantime. We could then deprecate the cmdstanr implementations once posterior has them and use the posterior implementations internally in cmdstanr.

I’d like to eventually have a fit$diagnose() method in cmdstanr that uses posterior in a similar way to how fit$summary() calls posterior::summarize_draws(), but until then I’m fine with adding more individual diagnostic check functions to cmdstanr.

Sounds good. The main reason I thought to include in cmdstanr rather than posterior is because posterior has broad applicability to models not fit by HMC and I wasn’t sure you wanted it to carry around a bunch of HMC-specific baggage.

Yeah that was our initial thinking, but since then we decided that it makes sense to add algorithm-specific stuff to posterior but not Stan-specific stuff. That is, posterior shouldn’t need to know about how any of the Stan R packages work or how Stan does anything in particular, but it would be good if it could handle HMC/NUTS stuff in a way that anyone using those algorithms (regardless of whether they’re using Stan) can use posterior.

Also, one big reason we want to implement the HMC/NUTS stuff in posterior is that then we can use that stuff in all of our R packages instead of having implementations in each of those packages. This would help unify all the warning messages people see when running Stan from R (right now each package behaves a bit differently in that regard) and allow us to stop maintaining a bunch of different implementations of the diagnostic checks.

Ha, I was looking at this just this morning. What’s the formula for E-BFMI? I know it includes the variance of the energy__ column, but I couldn’t figure out the other terms last time I looked at the paper.

# Per-chain E-BFMI; energy is an iterations x chains matrix of energy__ values
ebfmi <- apply(energy, 2, function(x) {
  (sum(diff(x)^2) / length(x)) / var(x)
})

See page 44 of https://arxiv.org/pdf/1701.02434.pdf (note that n is zero-indexed there).
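For anyone who wants to try the snippet above without a fitted model, here is a minimal self-contained sketch on simulated energies. The helper name ebfmi_per_chain and the simulated data are made up for illustration and are not part of cmdstanr.

```r
# Hypothetical helper (not part of cmdstanr): per-chain E-BFMI from an
# iterations x chains matrix of energy__ values.
ebfmi_per_chain <- function(energy) {
  energy <- as.matrix(energy)
  apply(energy, 2, function(x) {
    # Mean squared one-step change in energy, relative to the energy variance
    (sum(diff(x)^2) / length(x)) / var(x)
  })
}

set.seed(1)
# Simulated energies (assumption, for illustration only):
# chain1 is independent noise, chain2 is a slowly moving random walk.
energy <- cbind(
  chain1 = rnorm(1000),
  chain2 = cumsum(rnorm(1000, sd = 0.05))
)
ebfmi <- ebfmi_per_chain(energy)
print(ebfmi)
```

Independent draws should give a value near 2, because the expected squared difference of two independent draws is twice the variance; the slowly moving chain should come out far lower, which is the pattern the diagnostic is meant to flag.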

I’m just double-checking that CmdStan itself (like rstan) calculates E-BFMI on a per-chain basis. I’m terrible at reading C++ though.

Is that /length(x) supposed to be there? It’s not in the paper’s eqn. (thanks for the paper btw, I had been looking at a different one and this one is way clearer)

Oh, never mind, it’s in your eqn to make the ratio work while using var(x) in the denominator.
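To spell out that step, here is how I read the paper’s definition next to the R expression, writing L = N + 1 for length(x). This is a sketch of the algebra, not something taken from the Stan source:

```latex
\widehat{\text{E-BFMI}}
  = \frac{\sum_{n=1}^{N} (E_n - E_{n-1})^2}{\sum_{n=0}^{N} (E_n - \bar{E})^2},
\qquad
\operatorname{var}(x) = \frac{1}{L-1} \sum_{n=0}^{N} (E_n - \bar{E})^2,
\qquad
\frac{\tfrac{1}{L} \sum_{n=1}^{N} (E_n - E_{n-1})^2}{\operatorname{var}(x)}
  = \frac{L-1}{L}\, \widehat{\text{E-BFMI}}
```

So the R version differs from the paper’s ratio only by the factor (L - 1)/L, which is negligible for chains of typical length.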

Pull request at:

This is just for a check_bfmi function. A logical next step is to add a check_sampler_diagnostics function that can be called on the R6 object to check treedepth, divergences, E-BFMI.
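For readers following along, a check of this kind could look roughly like the sketch below. The function name, signature, warning text, and the default threshold of 0.2 are all assumptions for illustration and are not taken from the pull request.

```r
# Hypothetical sketch, not the cmdstanr implementation: warn when any chain's
# E-BFMI falls below a threshold. The default of 0.2 is an assumed value.
check_ebfmi_sketch <- function(energy, threshold = 0.2) {
  energy <- as.matrix(energy)  # iterations x chains matrix of energy__ values
  ebfmi <- apply(energy, 2, function(x) {
    (sum(diff(x)^2) / length(x)) / var(x)
  })
  low <- which(ebfmi < threshold)
  if (length(low) > 0) {
    warning(
      length(low), " of ", length(ebfmi), " chains had an E-BFMI below ",
      threshold, " (chains: ", paste(low, collapse = ", "), ").",
      call. = FALSE
    )
  }
  invisible(ebfmi)  # return the per-chain values for further inspection
}
```

With a fitted model, the energy matrix could presumably be pulled out with something like posterior::extract_variable_matrix(fit$sampler_diagnostics(), "energy__"); I haven’t verified that against a fit rebuilt from CSV files.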

What’s standard for combining max_treedepth and E-BFMI from multiple chains? Max and min respectively?