The rstantools release introduces a new and improved process for developing R packages with pre-compiled Stan models (thanks largely to contributions from @mlysy). Release notes are available on the rstantools website.
Also, there was no rstanarm 2.19.1: CRAN rejected it due to relative paths in the vignettes, which were removed in 2.19.2. There were also several PRs from @avehtari that were merged but omitted from the NEWS; they pertain to using importance sampling to reweight draws from a multivariate normal approximation centered at the posterior mode when the algorithm is not MCMC sampling.
Using PSIS and importance resampling also means that we can use PSIS-LOO when algorithm="optimizing" (I recommend increasing the number of draws from the default value, and for bigger n you may need the latest loo from GitHub, which loosens one over-strict check).
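A minimal sketch of what that looks like in rstanarm (the model and data are purely illustrative, and the `draws` and `importance_resampling` arguments are assumed from the 2.19.x interfaces):

```r
library(rstanarm)

# Laplace approximation at the mode, reweighted by importance sampling;
# ask for more draws than the default so PSIS has enough to work with
fit <- stan_glm(mpg ~ wt + hp, data = mtcars,
                algorithm = "optimizing",
                importance_resampling = TRUE,
                draws = 4000)

loo(fit)  # PSIS-LOO computed from the (resampled) approximate draws
```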
In addition to optimizing, these work for the meanfield and fullrank ADVI algorithms, but so far we have not seen any example where these would be better than optimizing or MCMC.
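The corresponding ADVI calls would look like this (same illustrative model as above; `meanfield` and `fullrank` are the values rstanarm's `algorithm` argument accepts):

```r
# Same reweighting idea applied to the two ADVI approximations
fit_mf <- stan_glm(mpg ~ wt + hp, data = mtcars, algorithm = "meanfield")
fit_fr <- update(fit_mf, algorithm = "fullrank")
```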
There is also a 4x speedup for GLMs and GAMs (with all inference algorithms) for the normal (when n <= p; the OLS trick was already used for n > p), bernoulli, poisson, and neg_binomial_2 families, using the compound GLM functions previously implemented in Stan math by Matthijs VĆ”kĆ”r.
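For context, here is a minimal sketch of the bernoulli case of those compound GLM functions, with the Stan model embedded as a string (X and y, an illustrative predictor matrix and 0/1 outcome vector, are assumed to exist):

```r
library(rstan)

code <- "
data {
  int<lower=0> N;
  int<lower=0> K;
  matrix[N, K] x;
  int<lower=0, upper=1> y[N];
}
parameters {
  real alpha;
  vector[K] beta;
}
model {
  // one fused call instead of y ~ bernoulli_logit(alpha + x * beta)
  y ~ bernoulli_logit_glm(x, alpha, beta);
}
"

fit <- stan(model_code = code,
            data = list(N = nrow(X), K = ncol(X), x = X, y = y))
```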
> In addition to optimizing, these work for the meanfield and fullrank ADVI algorithms, but so far we have not seen any example where these would be better than optimizing or MCMC.
What exactly do you mean here? Are you saying that there is no advantage to using ADVI vs. optimization, or that there isn't any advantage to using PSIS with ADVI sampling?
I have not yet seen an example where there would be an advantage to using ADVI, if we want the same accuracy as what we can get with MCMC.
Small data and a small model: MCMC is fast, and there is no need for Laplace (optimizing) or ADVI.
Big data, and the model is such that the posterior is close to Gaussian: MCMC is slow; Laplace (optimizing) is much faster than both MCMC and ADVI. ADVI could produce a good approximation, but is much slower than Laplace (optimizing).
Big data, and the model is such that the posterior is not close to Gaussian: MCMC is slow, and Laplace (optimizing) and ADVI produce bad approximations. ADVI may produce a better approximation than Laplace, but is not able to reach the accuracy of MCMC.
I'm happy if someone can show an example where Laplace (optimizing) is not able to achieve the accuracy of MCMC but ADVI achieves the accuracy of MCMC with less computation time than MCMC.
I know that some people have used ADVI to obtain a posterior approximation which is far from the true posterior, but which is faster to compute and sufficient for making useful predictions in a specific application. This is a different task from what I consider above. In those cases, e.g., cross-validation can be used to check whether predictions are useful, but the only way to check whether we could get better predictions with MCMC is to run MCMC. (And if it wasn't clear yet, above I consider cases where the importance sampling idea can be used to check whether the approximation is close enough to the true posterior, so that we don't need to run MCMC.)
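As a sketch of that check with the loo package, assuming `log_p` and `log_q` hold the log densities of the (unnormalized) true posterior and of the approximation, both evaluated at draws from the approximation:

```r
library(loo)

# log importance ratios: target posterior over its approximation
log_ratios <- log_p - log_q
psis_fit <- psis(log_ratios)

# Pareto k below about 0.7 indicates the approximation is close enough
# that importance sampling can stand in for running MCMC
pareto_k_values(psis_fit)
```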
I just wrote a wrapper for ADVI in CmdStanPy precisely because some people are using ADVI, but I agree completely with Aki (and Bob) that this is not a good thing.
I don't know about an official roadmap, so this is just my personal opinion based on some discussion with others. ADVI is labeled as experimental. It would take some effort to remove it, so it's likely it will stay there for a long time. There have been a lot of questions about the performance of ADVI, and I made the effort to add diagnostics (coming to CmdStan, too), so that at least it's easier to see that it's not competing with MCMC. Having more and better diagnostics for all things in Stan seems to be generally accepted as a good idea.
There are some things in ADVI which can be improved for more speed, stability, and accuracy, which might make ADVI competitive for a small set of models. There might be more elaborate VI methods which go beyond Gaussian approximations and extend that set of models. It's unlikely that these VI methods could be much faster than MCMC and still obtain the same accuracy in the general case, and thus it's unlikely that VI will have a big role in Stan in the near future. Some people are using ADVI in cases where they don't care that it's far from MCMC accuracy, but that path is unlikely to be on the Stan roadmap. I think it's good to improve how easy it is to use Stan to provide the log density, gradients, etc., and let interested people experiment with external VI algorithms; maybe someone will come up with something useful.
This is also my experience: extensive tuning was needed to make ADVI "work", if at all, for some of my models. Even then the accuracy is nowhere near sampling. What feels off to me is that among the three methods Stan offers, ADVI is far less reliable than the optimizer and the sampler, and when I want to speed up a model with a limited number of hyperparameters, INLA comes to mind rather than anything in Stan.
Let's hope nested LA will speed up models that fit the bill.
Historically it may help to know that the addition of ADVI was opposed by many developers at the time, not just because of the lack of validation of the algorithm but also because the code itself wasn't the best quality. Unfortunately there were external factors and conflicts of interest that pressured its inclusion. We have learned from this, however, and are slowly but surely setting up procedures to avoid it in the future.