Additional support for Pathfinder (e.g., helper functions to convert results into MCMC inits) will follow in the next release.

Thank you to everyone who contributed!

Installation

# Install from the Stan R package repository
# We recommend doing this in a fresh R session
install.packages(
  "cmdstanr",
  repos = c("https://mc-stan.org/r-packages/", getOption("repos"))
)

We use L-BFGS optimization, so any convergence guarantee would have to be of L-BFGS to the mode. Not every distribution even has a mode (e.g., $\textrm{beta}(0.5, 0.5)$ or any hierarchical model), so this first step can fail.
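To make the "no mode" case concrete, here is a small base-R sketch that evaluates the beta(0.5, 0.5) density on its original (0, 1) scale. The evaluation points are arbitrary and chosen only for illustration; this is not what Stan runs internally.

```r
# beta(0.5, 0.5) density on (0, 1): it grows without bound toward 0 and 1
x <- c(1e-2, 1e-4, 1e-6, 1e-8)  # illustrative points approaching 0
dens <- dbeta(x, shape1 = 0.5, shape2 = 0.5)

# Each step closer to the boundary gives a larger density value
increasing <- all(diff(dens) > 0)

# The only interior stationary point, x = 0.5, is a minimum, not a mode
dens_mid <- dbeta(0.5, shape1 = 0.5, shape2 = 0.5)

print(dens)
print(dens_mid)
```

An optimizer started anywhere in the interior will keep moving toward a boundary, because the density keeps increasing there, so there is no point to center a normal approximation on.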

As for the approximation, it’s a second-order Taylor expansion of the log density around the mode, which produces a normal distribution. I’m pretty sure this approximation can be arbitrarily bad. For example, we might have a Cauchy target distribution where the normal is a very bad approximation. Or we might be approximating a skewed distribution with a symmetric one.
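The skewed case can be sketched in a few lines of base R. This is a one-dimensional toy under stated assumptions, not Stan's implementation: the Gamma(3, 1) target, the starting value, and the use of `optim()` with its finite-difference Hessian are all choices made here for illustration, whereas Stan works on the unconstrained scale with its own L-BFGS.

```r
# Toy target: Gamma(shape = 3, rate = 1), a right-skewed density on x > 0
neg_log_density <- function(x) -dgamma(x, shape = 3, rate = 1, log = TRUE)

# Step 1: find the mode with L-BFGS (box-constrained to keep x positive)
opt <- optim(
  par = 1,
  fn = neg_log_density,
  method = "L-BFGS-B",
  lower = 1e-6,
  hessian = TRUE
)
mode_hat <- opt$par

# Step 2: the Hessian of the negative log density at the mode is the
# precision (inverse variance) of the normal approximation
laplace_sd <- sqrt(1 / opt$hessian[1, 1])

# True moments of the gamma target, for comparison
true_mean <- 3 / 1
true_sd <- sqrt(3) / 1

print(c(mode = mode_hat, laplace_sd = laplace_sd,
        true_mean = true_mean, true_sd = true_sd))
```

Analytically, the mode is 2 and the curvature gives a standard deviation of sqrt(2), while the target has mean 3 and standard deviation sqrt(3). The normal approximation is therefore centered too low and is too narrow, and it also puts mass on negative values that the target excludes.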

OK, but at least the approximating distribution is known, whereas with variational inference the approximating distribution can only be said to minimize some measure of entropy.

In Stan’s VI (both Pathfinder and ADVI), we also use normal approximations on the unconstrained scale. For ADVI, it can be diagonal or dense covariance and for Pathfinder it’s low-rank plus diagonal.

ADVI minimizes (or I should say attempts to minimize) KL-divergence from the approximating distribution to the true distribution, which equally balances maximizing entropy of the approximating distribution and minimizing cross-entropy from approximating to the true distribution. ADVI applies more generally to models that don’t have modes, but if the density has a mode, it’s going to give a very similar result to Laplace approximation and will give exactly the same answer for a normal target distribution.
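The balance between entropy and cross-entropy described above can be written out explicitly. The notation is introduced here for illustration and is not from the original post: $q$ is the approximating distribution, $p(\theta \mid y)$ is the true posterior, and $H$ denotes entropy or cross-entropy.

```latex
\begin{align*}
\mathrm{KL}[q \,\|\, p]
  &= \mathbb{E}_{q}\!\left[\log q(\theta) - \log p(\theta \mid y)\right] \\
  &= \underbrace{-\mathbb{E}_{q}\!\left[\log p(\theta \mid y)\right]}_{\text{cross-entropy } H[q,\, p]}
     \;-\; \underbrace{\left(-\mathbb{E}_{q}\!\left[\log q(\theta)\right]\right)}_{\text{entropy } H[q]}
\end{align*}
```

Minimizing the left-hand side therefore means making the cross-entropy term small while keeping the entropy of $q$ large, with both terms entering at equal weight.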

Without knowing the form of the target density, it’s hard to say much more. Even when we do know the form of the target density, I haven’t seen a lot of theory around either Laplace or VI, but that may just be ignorance on my part.

This looks really great, thank you for all your work! A couple of questions: are there limits on the types of model we can fit with Laplace? What kind of speed improvements can we expect?

Fantastic work, really keen to give Pathfinder a go - looking forward to the option to use it for inits, too!

Question: are there plans to put CmdStanR on CRAN? The Stan repo works well, but it can be a little fiddly to get it set up and working when aiming for reproducibility, e.g. when working with renv.

It’s been really nice not having it on CRAN (especially given the experience we’ve had with RStan updates getting rejected) but there has been a lot of interest in getting it on CRAN so we’ll reconsider. I’ll make sure we discuss it at one of the next Stan development meetings (probably in January at this point).

What would be nice is to have some kind of stub version that just checks whether CmdStan is installed and does nothing otherwise. That would make downstream packages easier to use because it would require one less step from the user (i.e., they would need to install CmdStan, but not first CmdStanR from a separate repo and then CmdStan).

I know, though, that CRAN is always a pain with C/C++ packages, so it can’t ever be that easy…