Convergence of variational inference

songpeng · November 21, 2020, 4:45am

I ran a Stan model in cmdstanr using variational inference. I set tol_rel_obj to 0.0002 as a way to detect convergence, and the maximum number of iterations to 50,000. Sometimes I get a log like the one in the figure below.

Based on the log, the run did not reach the maximum number of iterations, so it should have converged. But the log says “May be diverging…”.

My question is whether this model has converged or not. What is the meaning of “May be diverging”?

martinmodrak · December 5, 2020, 7:42pm

Sorry for letting your question fall through.
I fear the message means what it says: you would need a deeper inspection to check convergence. I know there was a fairly recent upgrade of the ADVI diagnostics, but I don’t use ADVI regularly, so I am not sure where good documentation can be found. Also, unfortunately, the ADVI mode in Stan is quite fragile and breaks for a lot of models, so you should always check the results very carefully.

Tagging @bbbales2 for potential further insights.

songpeng · December 5, 2020, 8:32pm

@martinmodrak
Thank you so much for your reply. When you say ADVI in Stan is quite fragile and breaks for a lot of models, do you mean that ADVI itself may fail for many models, and that the implementation in Stan might have bugs or numerical issues?

bbbales2 · December 5, 2020, 8:58pm

Thanks for the ping @martinmodrak .

I had a look at the code and found this:

if (delta_elbo_ave < tol_rel_obj) {
  ss << "   MEAN ELBO CONVERGED";
  do_more_iterations = false;
}

if (delta_elbo_med < tol_rel_obj) {
  ss << "   MEDIAN ELBO CONVERGED";
  do_more_iterations = false;
}

if (iter_counter > 10 * eval_elbo_) {
  if (delta_elbo_med > 0.5 || delta_elbo_ave > 0.5) {
    ss << "   MAY BE DIVERGING... INSPECT ELBO";
  }
}

Source is here.

It looks like it’s possible for the median of delta_elbo (which appears to be the relative difference between successive ELBO estimates: https://github.com/stan-dev/stan/blob/develop/src/stan/variational/advi.hpp#L378) to become small while the mean is still large. In that case the median check stops the run and the divergence warning fires at the same time.
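Written out, the quoted checks amount to the following. This assumes, from the linked line, that the relative difference is taken between successive ELBO estimates, with ELBO_k the estimate at the k-th evaluation, and that the mean and median run over the m most recent differences; m is notation introduced here, not a Stan parameter name.

$$\Delta_k = \left| \frac{\mathrm{ELBO}_k - \mathrm{ELBO}_{k-1}}{\mathrm{ELBO}_{k-1}} \right|, \qquad \bar{\Delta} = \frac{1}{m} \sum_{j=k-m+1}^{k} \Delta_j, \qquad \tilde{\Delta} = \operatorname{median}(\Delta_{k-m+1}, \dots, \Delta_k)$$

$$\text{stop if } \bar{\Delta} < \texttt{tol\_rel\_obj} \ \text{ or } \ \tilde{\Delta} < \texttt{tol\_rel\_obj}, \qquad \text{warn if } \bar{\Delta} > 0.5 \ \text{ or } \ \tilde{\Delta} > 0.5$$

(the warning only once iter_counter > 10 * eval_elbo_). A single spike Delta_j = D raises the mean by roughly D/m but changes only one value in the median, so any spike with D > 0.5 m triggers the warning even while the median can stay below tol_rel_obj. Because ELBO_{k-1} sits in the denominator, such spikes can also appear when the ELBO estimate passes close to zero.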

I’m not sure what is happening, but it does seem suspicious. A distribution can, for instance, have a mean but no finite variance, but such distributions are tricky, and if the differences between successive ELBO estimates start behaving like that while fitting a model, that seems suspicious to me.

Probably the thing to do in this case is to look at diagnostic plots for the fits that do this. Make predictions and check that everything behaves as you expect, or run the sampler and compare the results you get there.

@avehtari do you recognize this behavior?

bbbales2 · December 5, 2020, 9:06pm

Regarding your question, this paper describes some problems specific to how ADVI convergence is handled in Stan, and how we can do better: [2009.00666] Robust, Accurate Stochastic Optimization for Variational Inference

Here’s another paper evaluating fitting models with ADVI: https://arxiv.org/abs/1802.02538

To the extent there are numerical bugs, we like to fix them, but such things definitely come up, so they can’t be ruled out. I’d start, though, by making predictions and comparing against sampling to figure out what is going on.
