I have not yet seen an example where there would be an advantage to using ADVI if we want the same accuracy as we can get with MCMC.
- Small data and a small model: MCMC is fast and there is no need for Laplace (optimizing) or ADVI.
- Big data and a model such that the posterior is close to Gaussian: MCMC is slow, and Laplace (optimizing) is much faster than both MCMC and ADVI. ADVI could produce a good approximation, but is much slower than Laplace (optimizing).
- Big data and a model such that the posterior is not close to Gaussian: MCMC is slow, and Laplace (optimizing) and ADVI produce bad approximations. ADVI may produce a better approximation than Laplace, but is not able to match the accuracy of MCMC.
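For readers unfamiliar with the term, "Laplace (optimizing)" here means: find the posterior mode by optimization and approximate the posterior with a Gaussian whose variance is the inverse Hessian of the negative log posterior at the mode. A minimal sketch (my own toy example, not Stan code; the target is itself Gaussian, so the approximation is exact in this case):

```python
import numpy as np
from scipy.optimize import minimize

# Toy target: negative log density of N(3, 2^2) (up to a constant).
def neg_log_post(theta):
    return 0.5 * (theta[0] - 3.0) ** 2 / 4.0

# Step 1: find the mode by optimization.
res = minimize(neg_log_post, x0=np.array([0.0]))
mode = res.x[0]

# Step 2: curvature at the mode via a central finite difference;
# the Laplace approximation is N(mode, 1/hess).
h = 1e-4
hess = (neg_log_post([mode + h]) - 2.0 * neg_log_post([mode])
        + neg_log_post([mode - h])) / h ** 2
sd = 1.0 / np.sqrt(hess)

print(mode, sd)  # close to mean 3 and sd 2 of the target
```

When the true posterior is skewed or heavy-tailed, the same two steps go through, but the resulting Gaussian can be arbitrarily far from the posterior, which is the failure mode in the last bullet above.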
I’m happy if someone can show an example where Laplace (optimizing) is not able to achieve the accuracy of MCMC, but ADVI achieves the accuracy of MCMC with less computation time than MCMC.
I know that some people have used ADVI to obtain a posterior approximation which is far from the true posterior, but which is faster to compute and sufficient for making useful predictions in a specific application. This is a different task from what I consider above. In these cases, e.g., cross-validation can be used to check whether the predictions are useful, but the only way to check whether we could get better predictions with MCMC is to run MCMC. (And if it wasn’t clear yet: in the above I consider cases where the importance sampling idea can be used to check whether the approximation is close enough to the true posterior, so that we don’t need to run MCMC.)
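To illustrate that importance-sampling check, here is a minimal sketch (my own toy example, not Stan code): draw from the approximation q, compute importance weights p/q, and look at the effective sample size of the self-normalized weights. In practice one would use the Pareto-k̂ diagnostic (as in Pareto-smoothed importance sampling), for which this crude ESS is only a proxy.

```python
import numpy as np

def importance_ess(log_p, log_q):
    """Effective sample size of self-normalized importance weights.

    If the approximation q is close to the posterior p, the ESS stays
    near the number of draws; if q misses the tails, the ESS collapses.
    """
    log_w = log_p - log_q
    log_w -= log_w.max()          # stabilize before exponentiating
    w = np.exp(log_w)
    return w.sum() ** 2 / (w ** 2).sum()

def normal_logpdf(x, mu, sigma):
    return (-0.5 * ((x - mu) / sigma) ** 2
            - np.log(sigma) - 0.5 * np.log(2.0 * np.pi))

rng = np.random.default_rng(0)
n = 10_000

# Exact approximation q == p: all weights are equal, so ESS == n.
x = rng.standard_normal(n)
ess_exact = importance_ess(normal_logpdf(x, 0.0, 1.0),
                           normal_logpdf(x, 0.0, 1.0))

# Too-narrow approximation q = N(0, 0.75^2) of p = N(0, 1):
# the tails of p are underrepresented and the ESS drops.
y = rng.normal(0.0, 0.75, n)
ess_narrow = importance_ess(normal_logpdf(y, 0.0, 1.0),
                            normal_logpdf(y, 0.0, 0.75))

print(ess_exact, ess_narrow)
```

If the ESS (or Pareto k̂) indicates the weights are well behaved, the importance-weighted draws from the approximation can substitute for MCMC draws; if not, that is exactly the signal that we cannot trust the approximation without running MCMC.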