New theoretical analysis of ADVI

Hi all,

We’ve recently uploaded to arXiv a theoretical analysis of ADVI (although the title says black-box VI), which I believe is the first formal, full convergence proof for any black-box VI-type algorithm that uses SGD:

[2305.15349] Black-Box Variational Inference Converges (arxiv.org).

Interestingly, the analysis reveals some unexpected properties of the covariance parameterizations we use in practice. In particular, if we use a nonlinear transformation of the diagonal elements (as done for the mean-field parameterization in Stan), such as

L_{ii} = \exp(\ell_{i}),

where \mathbf{L} = \mathrm{diag}\left(L_{11}, \ldots, L_{dd}\right) is the Cholesky factor of the variational approximation, we provably lose speed: one could have achieved a \mathcal{O}\left(1/T\right) convergence rate for nice posteriors, but only gets \mathcal{O}\left(1/\sqrt{T}\right) instead. And if one applies the same transformation to the full-rank parameterization, as is done in PyMC3 (but not in Stan), the ELBO may fail to be convex even when the posterior is log-concave!
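
To make the two parameterizations concrete, here is a minimal numpy sketch in my own notation (not code from the paper, nor from Stan or PyMC3) of how a reparameterized sample z = mu + L eps is formed in each case:

```python
# Minimal sketch of the scale parameterizations discussed above (my notation,
# not the paper's or Stan/PyMC3's code). Sample: z = mu + L @ eps, eps ~ N(0, I).
import numpy as np

d = 4
rng = np.random.default_rng(0)
eps = rng.standard_normal(d)
mu = np.zeros(d)

# Mean-field, nonlinear scale: optimize ell and set L = diag(exp(ell)).
# Positivity of the scale is automatic, but this is the transformation that
# costs the O(1/T) rate.
ell = np.zeros(d)
L_meanfield = np.diag(np.exp(ell))
z_mf = mu + L_meanfield @ eps

# Full-rank: optimize a lower-triangular factor and (optionally) apply the
# same exp to its diagonal; with the exp, the ELBO can fail to be convex
# even for a log-concave posterior.
C = np.tril(rng.standard_normal((d, d)), k=-1)        # strictly lower part
L_fullrank = C + np.diag(np.exp(rng.standard_normal(d)))
z_fr = mu + L_fullrank @ eps
```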

In our experiments, we indeed observe that the mean-field parameterization without any exp or softplus transformation for enforcing positivity of the scale converges the fastest. To me, this is one of those rare occasions where optimization theory tells you precisely what happens in practice.
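
For readers who want to play with the "linear vs. exp scale" distinction, here is a toy sketch (one dimension, standard-normal target, made-up step size and iteration count; this is not our experimental setup, and a toy like this does not by itself demonstrate the asymptotic rates):

```python
# Toy comparison of an exp-transformed scale vs. a directly optimized scale
# for q(z) = N(mu, s^2) fit to p = N(0, 1) with single-sample reparameterization
# gradients. All constants below are arbitrary choices for illustration.
import numpy as np

def fit(nonlinear: bool, T: int = 5000, step: float = 0.01, seed: int = 1) -> float:
    rng = np.random.default_rng(seed)
    mu = 2.0
    theta = np.log(3.0) if nonlinear else 3.0   # same initial scale s = 3
    for _ in range(T):
        s = np.exp(theta) if nonlinear else theta
        eps = rng.standard_normal()
        z = mu + s * eps                         # reparameterization trick
        # Single-sample gradient of the negative ELBO for p = N(0, 1):
        #   -ELBO(mu, s) = E[0.5 * z^2] - log|s| + const
        g_mu = z
        g_s = z * eps - 1.0 / s
        mu -= step * g_mu
        theta -= step * (g_s * s if nonlinear else g_s)  # chain rule through exp
    s = np.exp(theta) if nonlinear else theta
    return abs(mu) + abs(abs(s) - 1.0)           # distance to the optimum (0, 1)

print("exp scale   :", fit(nonlinear=True))
print("linear scale:", fit(nonlinear=False))
```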

Please let me know if you have any comments or questions.
