What are the differences between NUTS and ADVI?

Hi! I’m new to the Bayesian / Stan world, and I keep coming across the NUTS and ADVI methods for doing inference. My understanding is that you should avoid using the optimizing function if you want uncertainty intervals, and instead use one of these two. Can anyone provide an overall summary of the differences between them? Is it fair to say that ADVI is faster (but less accurate) than NUTS? Any help (and useful references) would be highly appreciated!

NUTS is an MCMC method and is supposed to be exact in the limit: the longer you run the chains, the closer the draws get to the true posterior. ADVI, meanwhile, is supposed to take a probability model and automatically derive a variational inference algorithm for it. Variational inference approximates the true posterior, usually by assuming independence between parameters that are in fact jointly distributed and/or by replacing your distributions with simpler ones.
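To make the independence assumption concrete, here is a minimal sketch in Python (NumPy only, not Stan). It assumes an idealised case: the true posterior is a correlated 2D Gaussian, and the approximation is the fully factorised Gaussian that minimises KL(q || p). For that case the optimal factorised variances are known in closed form, 1 / diag(inverse covariance). The function name `mean_field_variances` and the value `rho = 0.9` are made up for illustration, and this is not what Stan’s ADVI would literally output on a real model.

```python
import numpy as np

def mean_field_variances(cov):
    """Variances of the KL(q || p)-optimal fully factorised Gaussian q
    when the target p is a Gaussian with covariance `cov`.
    Closed form: 1 / diag(precision), with precision = inverse of cov."""
    precision = np.linalg.inv(cov)
    return 1.0 / np.diag(precision)

# Hypothetical "true posterior": two strongly correlated parameters.
rho = 0.9
cov = np.array([[1.0, rho],
                [rho, 1.0]])

true_var = np.diag(cov)               # exact marginal variances
mf_var = mean_field_variances(cov)    # what the factorised approximation reports

print("true marginal variances:", true_var)
print("mean-field variances:   ", mf_var)
```

In this toy case the factorised approximation gets the means right but reports variances of 1 - rho^2 instead of 1, so the stronger the posterior correlation, the more it understates uncertainty. A sampler that draws from the joint posterior does not have this particular failure mode.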

Others have more extensive experience and are more qualified to comment, but I’ve personally found ADVI to be unreliable even for relatively straightforward models, and I’ve never made it work for anything a little bit fancy. This was a problem for me in situations where I couldn’t verify by other means whether the results were correct, because my prior is that the procedure fails so often.

IMHO, use NUTS always, and you won’t be disappointed most of the time. There are lots of tricks and ways to make NUTS work for moderately sized data, and when that stops working you may be better off using a different method (e.g. INLA) or coding your own problem-specific sampler, variational algorithm, etc.

Stan is pretty awesome for a broad class of problems. And even for quite big data it’s at least a great way to work out what you’re trying to do in principle.


I have a preprint that shows an example of Variational Bayes providing consistent results with HMC

So sometimes it is worth it, especially if you build third-party software and execution time is at a premium.


Kucukelbir et al. 2015 (the ADVI paper) illustrates a swath of working models that use ADVI for all sorts of things. That’s great, and obviously these models exist. The trouble is that no one publishes models that don’t work :-) and I think there may be plenty of them. If you can independently verify, by some other method, that your ADVI solution works on a variety of data, and ADVI as an implementation provides you with other benefits relevant to your problem, then it sounds like you have a winner. But if you want a method you can more or less trust out of the box, it’s NUTS. IMHO.
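One way to do that kind of independent check, as a rough sketch only: run both methods on the same model and data, then compare the posterior draws parameter by parameter. The helper below is hypothetical (`compare_draws` is not a Stan or CmdStan function), and the two arrays are synthetic stand-ins generated with NumPy rather than real NUTS or ADVI output. In practice you would pass in the draws extracted from your own fits.

```python
import numpy as np

def compare_draws(reference, approx):
    """Compare two sets of posterior draws, each of shape (n_draws, n_params).

    Returns, per parameter:
      mean_shift: difference in means, in units of the reference sd
      sd_ratio:   approx sd divided by reference sd (1.0 means same spread)
    """
    reference = np.asarray(reference, dtype=float)
    approx = np.asarray(approx, dtype=float)
    ref_sd = reference.std(axis=0, ddof=1)
    mean_shift = (approx.mean(axis=0) - reference.mean(axis=0)) / ref_sd
    sd_ratio = approx.std(axis=0, ddof=1) / ref_sd
    return mean_shift, sd_ratio

# Synthetic stand-ins (NOT real Stan output): the "ADVI-like" draws
# deliberately understate the spread of the first parameter.
rng = np.random.default_rng(0)
nuts_like = rng.normal(loc=[0.0, 1.0], scale=[1.0, 2.0], size=(4000, 2))
advi_like = rng.normal(loc=[0.1, 1.0], scale=[0.5, 2.0], size=(4000, 2))

mean_shift, sd_ratio = compare_draws(nuts_like, advi_like)
print("mean shift (in reference sd):", np.round(mean_shift, 2))
print("sd ratio (approx / reference):", np.round(sd_ratio, 2))
```

An sd ratio well below 1 for some parameter would suggest the approximation is overconfident there. Matching means and sds is a necessary check rather than a sufficient one, since two distributions can agree on both and still differ in shape or correlation.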


See https://betanalpha.github.io/assets/case_studies/probabilistic_computation.html for a discussion of probabilistic computation and the conceptual differences between Variational Bayes (of which ADVI is an example) and Markov chain Monte Carlo (of which dynamic HMC is an example; keep in mind that Stan no longer uses the NUTS algorithm but rather an improved dynamic HMC algorithm).