Convergence of variational inference

I am running a Stan model under cmdstanr using variational inference. I set tol_rel_obj to 0.0002 as one way to detect convergence, with a maximum of 50,000 iterations. Sometimes I get a log like the one in the figure below.

According to the log, the run stopped before reaching the maximum number of iterations, so it should have converged. But the log also says "May be diverging … ".

My question: has this model converged or not? What does “May be diverging” mean?

Sorry for letting your question fall through.
I fear the message says it as it is: you would need a somewhat deeper inspection to check convergence. I know the ADVI diagnostics were upgraded quite recently, but I don’t use ADVI regularly, so I am not sure where good documentation can be found. Unfortunately, the ADVI mode in Stan is also quite fragile and breaks for a lot of models, so you should always check the results very carefully.

Tagging @bbbales2 for potential further insights.

Thank you so much for your reply. When you say ADVI in Stan is quite fragile and breaks for a lot of models, do you mean that ADVI itself may fail for many models, or that the implementation in Stan might have bugs or numerical issues?

Thanks for the ping @martinmodrak .

I had a look at the code and found this:

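if (delta_elbo_ave < tol_rel_obj) {
  // Mean relative ELBO change is below the tolerance: stop iterating.
  do_more_iterations = false;
}

if (delta_elbo_med < tol_rel_obj) {
  // Median relative ELBO change is below the tolerance: stop iterating.
  do_more_iterations = false;
}

if (iter_counter > 10 * eval_elbo_) {
  if (delta_elbo_med > 0.5 || delta_elbo_ave > 0.5) {
    // Body elided in this excerpt; presumably where "May be diverging" is reported.
  }
}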

Source is here.

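It looks like it’s possible for the median of the delta_elbo (which looks like a relative difference in subsequent ELBO values) to get small while the mean is still large. In that case the median check would stop the run as converged while the mean is still large enough to trigger the "May be diverging" warning.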
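To make that concrete, here is a minimal sketch of how the two checks can disagree. It is not Stan's actual implementation: the helper names (`rel_difference`, `window_mean`, `window_median`, `check_window`), the `ElboStatus` struct, and the formula for the relative difference are all assumptions for illustration, and the `iter_counter > 10 * eval_elbo_` guard from the excerpt is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Assumed definition: relative change between two successive ELBO estimates.
double rel_difference(double prev, double curr) {
  return std::fabs((curr - prev) / prev);
}

// Mean of a non-empty window of relative changes.
double window_mean(const std::vector<double>& w) {
  assert(!w.empty());
  return std::accumulate(w.begin(), w.end(), 0.0) / w.size();
}

// Median of a non-empty window (taken by value so the caller's order is kept).
double window_median(std::vector<double> w) {
  assert(!w.empty());
  std::sort(w.begin(), w.end());
  const std::size_t n = w.size();
  return n % 2 == 1 ? w[n / 2] : 0.5 * (w[n / 2 - 1] + w[n / 2]);
}

// Hypothetical result type: both flags can be true at the same time.
struct ElboStatus {
  bool converged;
  bool may_be_diverging;
};

// Applies the same threshold comparisons as the excerpt to one window.
ElboStatus check_window(const std::vector<double>& rel_changes,
                        double tol_rel_obj) {
  const double ave = window_mean(rel_changes);
  const double med = window_median(rel_changes);
  ElboStatus s;
  s.converged = (ave < tol_rel_obj) || (med < tol_rel_obj);
  s.may_be_diverging = (med > 0.5) || (ave > 0.5);
  return s;
}
```

For example, a window of relative changes {1e-5, 1e-5, 1e-5, 1e-5, 3.0} with tol_rel_obj = 0.0002 has a median of 1e-5 and a mean of about 0.6, so `check_window` reports both `converged` and `may_be_diverging`. A single noisy ELBO estimate is enough to produce that combination.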

I’m not sure what is happening, but it does seem suspicious. A distribution can have a mean and no variance, but those are tricky distributions, and if the differences between subsequent ELBO estimates start behaving like this while a model is being fit, that seems suspicious to me.

Probably the thing to do in this case is to look at diagnostic plots for the fits that are doing this. Make predictions and see if everything is doing what you expect, or maybe run the sampler and compare the results you get there.
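One rough way to do the comparison against the sampler, sketched under assumptions: export the draws for a single parameter from both the ADVI fit and the MCMC fit, then look at how far the ADVI mean sits from the MCMC mean (in units of the MCMC standard deviation) and at the ratio of the two standard deviations. The names `draws_mean`, `draws_sd`, `compare_draws` and `DrawComparison` are hypothetical, and these two summaries are only a crude first check, not a replacement for proper diagnostics.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Sample mean of a non-empty vector of draws.
double draws_mean(const std::vector<double>& x) {
  assert(!x.empty());
  return std::accumulate(x.begin(), x.end(), 0.0) / x.size();
}

// Sample standard deviation (n - 1 denominator); needs at least two draws.
double draws_sd(const std::vector<double>& x) {
  assert(x.size() > 1);
  const double m = draws_mean(x);
  double ss = 0.0;
  for (double v : x) ss += (v - m) * (v - m);
  return std::sqrt(ss / (x.size() - 1));
}

// Hypothetical summary of how the ADVI draws differ from the MCMC draws.
struct DrawComparison {
  double mean_shift;  // (mean_advi - mean_mcmc) / sd_mcmc
  double sd_ratio;    // sd_advi / sd_mcmc
};

// Treats the MCMC draws as the reference for one scalar parameter.
DrawComparison compare_draws(const std::vector<double>& advi,
                             const std::vector<double>& mcmc) {
  const double sd_ref = draws_sd(mcmc);
  assert(sd_ref > 0.0);
  return {(draws_mean(advi) - draws_mean(mcmc)) / sd_ref,
          draws_sd(advi) / sd_ref};
}
```

A `mean_shift` far from 0 or an `sd_ratio` far from 1 for important parameters would suggest the ADVI approximation should not be trusted for this model, whatever the convergence message says.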

@avehtari do you recognize this behavior?


Regarding this, this paper describes some problems specifically with how ADVI convergence is handled in Stan and how we can do better:

Here’s another paper evaluating fitting models with ADVI:

To the extent there are numerical bugs, we like to fix them, but these things definitely come up, so they can’t be ruled out. I’d start by making predictions and comparing against sampling, though, to figure out what is going on.