What are the differences between NUTS and ADVI?

jroberayalas · April 22, 2020, 6:18am

Hi! I’m new to the Bayesian / Stan world, and I keep coming across the NUTS and ADVI methods for doing inference. My understanding is that you should avoid the optimizing function if you want uncertainty intervals, and use one of these two instead. Can anyone give an overall summary of the differences between them? Is it fair to say that ADVI is faster (but less accurate) than NUTS? Any help (and useful references) is highly appreciated!

emiruz · April 22, 2020, 9:18am

NUTS is an MCMC method, so it is supposed to be exact in the limit of infinitely many draws. ADVI, meanwhile, is supposed to take a probability model and automatically derive a variational inference algorithm for it. Variational inference approximates the real posterior, usually by assuming things like independence between parameters that are in fact jointly distributed and/or by replacing your distributions with simpler ones.
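To make the independence assumption concrete, here is a minimal sketch of a textbook case rather than anything Stan-specific. For a two-dimensional Gaussian target, the fully factorized Gaussian that minimizes KL(q || p) has variances equal to the reciprocals of the diagonal of the precision matrix, so it is too narrow whenever the parameters are correlated. The function name and the example numbers are made up for illustration.

```python
def mean_field_variances(var_x, var_y, rho):
    """Variances of the factorized Gaussian q minimizing KL(q || p)
    when p is a 2D Gaussian with the given marginal variances and correlation."""
    cov_xy = rho * (var_x * var_y) ** 0.5
    det = var_x * var_y - cov_xy ** 2
    # Diagonal entries of the precision matrix (inverse of the 2x2 covariance).
    prec_xx = var_y / det
    prec_yy = var_x / det
    # The optimal factorized q uses 1 / precision as each variance.
    return 1.0 / prec_xx, 1.0 / prec_yy


# Illustrative target: unit marginal variances, correlation 0.9.
vx, vy = mean_field_variances(1.0, 1.0, 0.9)
print(f"true marginal variance: 1.0, mean-field variance: {vx:.2f}")
```

With unit variances the factorized variance works out to 1 - rho^2, so the stronger the correlation, the more the approximation understates the uncertainty.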

Others have more extensive experience and are more qualified to comment, but I’ve personally found ADVI to be unreliable even for relatively straightforward models, and I’ve never made it work for anything a little bit fancy. This was a problem for me in situations where I couldn’t verify by other means whether the results were correct, because my prior is that the procedure fails so often.

IMHO, use NUTS always, and you won’t be disappointed most of the time. There are lots of tricks and ways to make NUTS work for moderately sized data, and when that stops working you may be better off using a different method (e.g. INLA) or coding your own problem-specific sampler, variational algorithm, etc.

Stan is pretty awesome for a broad class of problems. And even for quite big data it’s at least a great way to work out what you’re trying to do in principle.

stemangiola · April 22, 2020, 9:29am

I have a preprint that shows an example of Variational Bayes giving results consistent with HMC.

So sometimes it is worth it, especially if you build third-party software and execution time is an expensive currency.

emiruz · April 22, 2020, 10:37am

Kucukelbir et al. 2015 (the ADVI paper) illustrates a swath of working models, doing all sorts of things, which utilise ADVI. That’s great, and obviously these models exist. Trouble is that no-one is publishing models that don’t work :-) and I think these may be in abundance. If you can independently verify by some other method that your ADVI solution works on a variety of data, and ADVI as an implementation gives you other benefits relevant to your problem, then it sounds like you have a winner. But if you want a method you can more or less trust out of the box, it’s NUTS. IMHO.
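One rough way to do that kind of independent check is to compare, parameter by parameter, the ADVI draws against NUTS draws from a run you trust. The sketch below is only an illustration: `compare_draws` is a hypothetical helper, and the two lists are simulated stand-ins for real sampler output, not results from any model.

```python
import random
import statistics


def compare_draws(reference, approx):
    """Compare approximate draws to reference draws for a single parameter.

    Returns the mean shift in units of the reference standard deviation,
    and the ratio of the approximate to the reference standard deviation.
    """
    ref_mean = statistics.fmean(reference)
    ref_sd = statistics.stdev(reference)
    mean_shift = (statistics.fmean(approx) - ref_mean) / ref_sd
    sd_ratio = statistics.stdev(approx) / ref_sd
    return mean_shift, sd_ratio


rng = random.Random(0)
# Stand-ins for real output: in practice these would be the NUTS and ADVI
# draws for the same parameter of the same model.
nuts_draws = [rng.gauss(0.0, 1.0) for _ in range(4000)]
advi_draws = [rng.gauss(0.0, 0.5) for _ in range(4000)]

mean_shift, sd_ratio = compare_draws(nuts_draws, advi_draws)
print(f"mean shift: {mean_shift:+.2f} sd, sd ratio: {sd_ratio:.2f}")
```

An sd ratio well below 1 would suggest the approximation is overconfident for that parameter. Matching means and standard deviations is a necessary check rather than a sufficient one, since it says nothing about correlations or tails.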

betanalpha · April 23, 2020, 12:26am

See https://betanalpha.github.io/assets/case_studies/probabilistic_computation.html for a discussion of probabilistic computation and the conceptual differences between Variational Bayes (of which ADVI is an example) and Markov chain Monte Carlo (of which dynamic HMC is an example – keep in mind that Stan no longer uses the NUTS algorithm but rather an improved dynamic HMC algorithm).
