New releases of rstanarm and rstantools R packages

New versions of our R packages rstanarm (v2.19.2) and rstantools (v2.0.0) have been released and can be installed from CRAN via install.packages.
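For example:

```r
# Install (or update to) the latest CRAN releases of both packages
install.packages(c("rstanarm", "rstantools"))
```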

The rstanarm release includes a few bug fixes and some speedups. The release notes are available on the rstanarm website:

The rstantools release introduces a new and improved process for developing R packages with pre-compiled Stan models (thanks largely to contributions from @mlysy). Release notes are available on the rstantools website:

5 Likes

Also, there was no rstanarm 2.19.1: CRAN rejected it because the vignettes contained relative paths, which were removed in 2.19.2. There were also several PRs from @avehtari that were merged but omitted from the NEWS; they pertain to using importance sampling to reweight draws from a multivariate normal distribution evaluated at the posterior mode when the algorithm is not MCMC sampling.
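The idea, as a minimal self-contained sketch (a toy one-dimensional version for illustration, not the rstanarm internals; Pareto smoothing comes from the loo package's psis()):

```r
library(loo)
set.seed(1)
# Toy target: a heavier-tailed "posterior" (Student-t); the approximation is
# a normal centered at its mode, deliberately a bit over-dispersed.
log_p <- function(x) dt(x, df = 7, log = TRUE)
mu <- 0; sigma <- 2
theta <- rnorm(4000, mu, sigma)                           # draws from the approximation
lw <- log_p(theta) - dnorm(theta, mu, sigma, log = TRUE)  # log importance ratios
ps <- psis(lw, r_eff = NA)                                # Pareto smoothed weights
ps$diagnostics$pareto_k                 # k > 0.7 flags an unreliable approximation
theta_rs <- sample(theta, 2000, replace = TRUE,
                   prob = weights(ps, log = FALSE))       # importance resampling
```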

1 Like
  • The Pareto diagnostic and Pareto smoothed importance sampling (https://arxiv.org/abs/1507.02646) are used to diagnose and improve inference when using stan_glm with algorithm='optimizing'. A demonstration of timings, with n=100,000 and p=100 for Gaussian and logistic regression, is at https://avehtari.github.io/RAOS-Examples/BigData/bigdata.html
  • Using PSIS and importance resampling also means that we can use PSIS-LOO when algorithm='optimizing' (I recommend increasing the number of draws from the default value, and for bigger n you may need the latest loo from GitHub, which has one over-strict check loosened); see the sketch after this list.
  • In addition to optimizing, these work for meanfield and 'fullrank' ADVI, but so far we have not seen any example where these would be better than optimizing or MCMC.
  • There is also a 4x speedup for GLMs and GAMs (with all inference algorithms) with normal (when n<=p; an OLS trick was already used for n>p), bernoulli, poisson, and neg_binomial_2 families, using the compound GLM functions previously implemented in Stan math by Matthijs Vákár.
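A minimal sketch of that workflow, assuming rstanarm 2.19.2 (the formula and `mydata` are placeholders; I am assuming the `draws` argument controls how many draws are taken from the approximate posterior):

```r
library(rstanarm)
# `mydata` and the formula are placeholders for your own data and model.
fit <- stan_glm(y ~ x1 + x2, data = mydata, family = gaussian(),
                algorithm = "optimizing",
                draws = 4000)  # more draws than the default, as recommended above
loo(fit)                       # PSIS-LOO also works with algorithm = "optimizing"
```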
4 Likes

Thanks for the reminder about the importance sampling diagnostics. I should update the news on the website to include the items we forgot.

Edit: and we also missed some fixes to bayes_R2 and loo_R2 from @mcol.

Ok, I've added the missing items to the news page on the website.

1 Like

In addition to optimizing, these work for meanfield and 'fullrank' ADVI, but so far we have not seen any example where these would be better than optimizing or MCMC.

What exactly do you mean here? Are you saying that there is no advantage to using ADVI vs. optimization, or that there isn't any advantage to using PSIS with ADVI sampling?

1 Like

I have not yet seen an example where there would be an advantage to using ADVI, if we want the same accuracy as we can get with MCMC.

  • Small data and a small model: MCMC is fast and there is no need for Laplace (optimizing) or ADVI.
  • Big data and a model whose posterior is close to Gaussian: MCMC is slow; Laplace (optimizing) is much faster than both MCMC and ADVI. ADVI could produce a good approximation, but it is much slower than Laplace (optimizing).
  • Big data and a model whose posterior is not close to Gaussian: MCMC is slow; Laplace (optimizing) and ADVI produce bad approximations. ADVI may produce a better approximation than Laplace, but it is not able to reach the accuracy of MCMC.

I'm happy if someone can show an example where Laplace (optimizing) is not able to achieve the accuracy of MCMC but ADVI achieves it with less computation time than MCMC.
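For anyone who wants to try, a hedged sketch of how such a comparison could be run in rstanarm (the formula and `mydata` are placeholders; the accuracy of each fit would still have to be checked against the MCMC result):

```r
library(rstanarm)
# Placeholder model; compare wall-clock time of the three inference algorithms.
t_mcmc <- system.time(fit_mcmc <- stan_glm(y ~ ., data = mydata,
                                           algorithm = "sampling"))
t_opt  <- system.time(fit_opt  <- stan_glm(y ~ ., data = mydata,
                                           algorithm = "optimizing"))
t_advi <- system.time(fit_advi <- stan_glm(y ~ ., data = mydata,
                                           algorithm = "meanfield"))
rbind(mcmc = t_mcmc, optimizing = t_opt, advi = t_advi)
# Accuracy could then be compared, e.g., posterior summaries against fit_mcmc.
```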

I know that some people have used ADVI to obtain a posterior approximation which is far from the true posterior, but which is faster to compute and sufficient for making useful predictions in a specific application. That is a different task from the one I consider above. In those cases, e.g., cross-validation can be used to check whether the predictions are useful, but the only way to check whether we could get better predictions with MCMC is to run MCMC (and if it wasn't clear yet, above I consider cases where the importance sampling idea can be used to check if the approximation is close enough to the true posterior, so that we don't need to run MCMC).

6 Likes

So in terms of Stan's roadmap, where does VI in general fit in? Or does it have a place in the future?

I just wrote a wrapper for ADVI in CmdStanPy precisely because some people are using ADVI, but I agree completely with Aki (and Bob) that this is not a good thing.

2 Likes

I don't know about an official roadmap, so this is just my personal opinion based on some discussions with others. ADVI is labeled as experimental. There has been some effort to remove it, but it's likely it will stay there for a long time. There have been a lot of questions about the performance of ADVI, and I made the effort to add diagnostics (coming to CmdStan, too), so that at least it's easier to see that it's not competing with MCMC. Having more and better diagnostics for all things in Stan seems to be generally accepted as a good idea.

There are some things in ADVI that can be improved to get more speed, stability, and accuracy, which might make ADVI competitive for a small set of models. There might be more elaborate VI methods that go beyond Gaussian approximations and extend that set of models. It's unlikely that these VI methods could be much faster than MCMC and still obtain the same accuracy in the general case, and thus it's unlikely that VI will have a big role in Stan in the near future. Some people are using ADVI in cases where they don't care that it's far from MCMC accuracy, but that path is unlikely to be on Stan's roadmap. I think it's good to improve how easy it is to use Stan to provide log densities, gradients, etc., and let interested people experiment with external VI algorithms; maybe someone can come up with something useful.

4 Likes

This is also my experience: extensive tuning was needed to make ADVI "work", if at all, for some of my models. Even then the accuracy is nowhere near sampling. What feels off to me is that among the three methods Stan offers, ADVI is far less reliable than the optimizer and the sampler, and when I want to speed up a model with a limited number of hyperparameters, INLA comes to mind instead of anything in Stan.

Let's hope nested LA will speed up models that fit the bill.

1 Like

Historically it may help to know that the addition of ADVI was opposed by many developers at the time, not just because of the lack of validation of the algorithm but also because the code itself wasn't the best quality. Unfortunately there were external factors and conflicts of interest that pressured its inclusion. We have learned from this, however, and are slowly but surely setting up procedures to avoid it in the future.

1 Like

As of this afternoon it's in CmdStanR too, but I agree with all the caveats mentioned about using it.
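For completeness, a minimal sketch of calling it there (the model file and `data_list` are placeholders for your own model and data):

```r
library(cmdstanr)
# `model.stan` and `data_list` are placeholders; $variational() runs ADVI.
mod <- cmdstan_model("model.stan")
fit_vb <- mod$variational(data = data_list, algorithm = "meanfield")
fit_vb$summary()  # remember the caveats above before trusting the result
```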