Stan-relevant paper: Fast Black-box Variational Inference through Stochastic Trust-Region Optimization

ariddell · November 30, 2017, 12:25pm

Perhaps this is of interest. There’s a detailed comparison to ADVI. Claims to be ~10x faster.

Link: [1706.02375v2] Fast Black-box Variational Inference through Stochastic Trust-Region Optimization Here are the details:

Fast Black-box Variational Inference through Stochastic Trust-Region Optimization
Jeffrey Regier, Michael I. Jordan, Jon McAuliffe
(Submitted on 7 Jun 2017 (v1), last revised 5 Nov 2017 (this version, v2))

We introduce TrustVI, a fast second-order algorithm for black-box variational inference based on trust-region optimization and the reparameterization trick. At each iteration, TrustVI proposes and assesses a step based on minibatches of draws from the variational distribution. The algorithm provably converges to a stationary point. We implemented TrustVI in the Stan framework and compared it to two alternatives: Automatic Differentiation Variational Inference (ADVI) and Hessian-free Stochastic Gradient Variational Inference (HFSGVI). The former is based on stochastic first-order optimization. The latter uses second-order information, but lacks convergence guarantees. TrustVI typically converged at least one order of magnitude faster than ADVI, demonstrating the value of stochastic second-order information. TrustVI often found substantially better variational distributions than HFSGVI, demonstrating that our convergence theory can matter in practice.

wds15 · November 30, 2017, 12:42pm

more interesting than faster than ADVI would be if they can deal with hierarchical models well. From my tests ADVI is amazingly fast, but it is not sure what you get for the models I played with.

Andre_Pfeuffer · November 30, 2017, 1:47pm

Quality of optimal points. For 126 of the 183 models (69%), on sets of five runs, the median optimal
values found by ADVI and TrustVI did not differ substantively. For 51 models (28%), TrustVI found
better optimal values than ADVI. For 6 models (3%), ADVI found better optimal values than TrustVI.

Now we know it or not.

avehtari · November 30, 2017, 8:04pm

I thought optimal means best.

Bob_Carpenter · December 21, 2017, 11:45pm

Hard to tell if that was sarcastic or not, so I’ll provide a straight reply. If ADVI and TrustVI have different objectives, they’ll reach different optima, one of which may be a better estimate than the other.

Dan Simpson and Michael Betancourt had some comments on this when it first came out, but they were in a private email thread the summary of which is that this isn’t too surprising. Dan pointed out trust region methods will help condition Newton steps, so it’s not too surprising from an ill-conditioned optimization perspective. The real problem is what we do with VI estimates given that they’re so far off in most cases where the (unconstrained) posterior isn’t approximately Gaussian. Michael also points out that all the proofs rely on Gaussian i.i.d. assumptions, which won’t necessarily be satisfied in the kinds of smaller data problems we deal with regularly—that’s the theory behind what we saw in practice. Michael also pointed out that his SoftAbs RHMC (see the arXiv paper) is itself like a trust-region method.

andre.pfeuffer · December 22, 2017, 4:23am

If we have different objectives, we have to define the means of comparison between them. I cannot find this in the paper.

Topic		Replies	Views
Correlated 2D Gaussian breaks ADVI Modeling fitting-issues	23	3473	July 12, 2018
Why does ADVI use stochastic gradient ascent not LBFGS Algorithms	10	1707	July 22, 2018
Speed up adaptation with Variational Approximation? Modeling	22	2105	May 16, 2017
New Theoretical analysis for ADVI Algorithms variational-bayes , advi	0	470	May 28, 2023
ADVI / Stochastic quasi-Newton methods Algorithms variational-bayes	5	1353	September 22, 2017

Stan-relevant paper: Fast Black-box Variational Inference through Stochastic Trust-Region Optimization

Related topics