CmdStan & Stan 2.37 release candidate

Edit (June 12):

A second release candidate has been published to address several issues with the embedded Laplace approximation found in the first release candidate.


I am happy to announce that the latest release candidates of CmdStan and Stan are now available on Github!

This release cycle brings the embedded Laplace approximation, a sum-to-zero matrix type, new functions, memory usage optimizations, and many other improvements.

You can find the release candidate for CmdStan here. Instructions for installing are given at the bottom of this post.

Please test the release candidate with your models and report if you experience any problems. We also kindly invite you to test the new features and provide feedback. If you feel some of the new features could be improved or should be changed before the release, please do not hesitate to comment.

The Stan development team appreciates your time and help in making Stan better.

If everything goes according to plan, version 2.37 will be released in two weeks’ time.

Below are some of the highlights of the new release.

The embedded Laplace approximation

We’re releasing a suite of functions to perform an embedded Laplace approximation, in a similar flavor to what is done in the INLA and TMB packages. These functions approximate the marginal likelihoods and conditional posteriors that arise in latent Gaussian models. The idea is to integrate out the latent Gaussian variables with a Laplace approximation and then perform standard inference on the hyperparameters. These functions give users a lot of flexibility when specifying a prior covariance matrix and a likelihood function, although the approximation is not guaranteed to work well for an arbitrary likelihood.
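As a rough sketch of how a model might use this, here is a hypothetical likelihood and covariance function pair; the exact call signature of laplace_marginal was still being finalized during the release-candidate period, so treat the layout below as an assumption and consult the preliminary docs:

functions {
  // Hypothetical user-supplied log likelihood. As discussed later in this
  // thread, the latent Gaussian vector theta must be the first argument;
  // everything after it is data.
  real ll_fn(vector theta, array[] int y) {
    return poisson_log_lpmf(y | theta);
  }
  // Hypothetical prior covariance function for the latent Gaussian field.
  matrix cov_fn(array[] real x, real sigma, real rho) {
    return add_diag(gp_exp_quad_cov(x, sigma, rho), 1e-8);
  }
}

These two functions, together with an initial guess for the latent vector, would then be handed to laplace_marginal.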

We’d like for people to fit their favorite latent Gaussian models with the embedded Laplace approximation, either using some of the built-in likelihoods or by specifying their own likelihood. We want to make sure the documentation provides enough guidance for users to write down their model.

Preliminary documentation for the embedded Laplace approximation is available here. These docs will continue to be updated during the release cycle, so check back!

Some additional materials are available here, though please note that the syntax has changed since these older materials were written.

Changes for constrained types

This release includes a new constrained type, sum_to_zero_matrix. This is similar to the existing sum_to_zero_vector type but the matrix will sum to zero across each row and column. See the preliminary documentation.
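For example, a declaration might look like this (a minimal sketch; K and J are hypothetical dimensions declared in the data block):

parameters {
  // Every one of the K rows and J columns of M sums to zero,
  // which leaves (K - 1) * (J - 1) degrees of freedom.
  sum_to_zero_matrix[K, J] M;
}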

Additionally, the existing simplex and stochastic matrix types have been updated to use a new transformation under the hood, which should allow for more efficient sampling in some cases. See the preliminary documentation.

Built-in constraints exposed as functions

Building on the 2.36 release, which added the jacobian += statement and _jacobian functions, this release adds functions that expose the implementations of Stan’s built-in constraints to the user.

Each of the built-in transforms now has three corresponding functions:

  • <transform>_constrain - applies the inverse (constraining) transform

  • <transform>_jacobian - applies the inverse (constraining) transform and increments the Jacobian adjustment term accordingly

  • <transform>_unconstrain - applies the (unconstraining) transform

The documentation for these new functions can be previewed here.
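For instance, assuming the simplex transform follows this naming pattern (a sketch under that assumption, not copied from the docs):

data {
  int<lower=1> K;
}
parameters {
  vector[K - 1] s_raw;  // unconstrained representation
}
transformed parameters {
  // Apply the constraining transform and increment the Jacobian
  // adjustment, mirroring what a simplex[K] parameter declaration does.
  simplex[K] s = simplex_jacobian(s_raw);
}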

Other changes

The default number of significant figures saved by CmdStan has been increased from 6 to 8.

CmdStan now allows the user to specify filenames with a comma-separated list when requesting multiple chains.

The hypergeometric functions (hypergeometric_pFq, as well as specializations for 1F0, 2F1, and 3F2) are now available in the language. Preliminary documentation is available here.
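As a quick illustration (only hypergeometric_pFq is named above; the argument layout here is an assumption, so check the docs):

// 2F1(1.5, 2.0; 3.0; 0.4) written via the generalized pFq function,
// with numerator and denominator parameters passed as vectors:
real h = hypergeometric_pFq([1.5, 2.0]', [3.0]', 0.4);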

This release fixed an issue where the laplace_sample functionality was using more memory than necessary, and implemented an improvement to the memory usage of the Pathfinder algorithm when PSIS sampling is not requested.

More details on all of the above and more are available in the preliminary release notes.

How to install?

Download the tar.gz file from the link above, extract it, and use it the way you use any CmdStan release. We also have an online guide available at the CmdStan User’s Guide.

If you are using cmdstanpy, you can install the release candidate using


cmdstanpy.install_cmdstan(version='2.37.0-rc2')

With CmdStanR you can install the release candidate using


cmdstanr::install_cmdstan(version = "2.37.0-rc2", cores = 4)

Note: for the best experience, you may want to (re-)install your copy of cmdstanr/py from the latest development branches.


I for one welcome our new laplace and constraint overlords…er I mean functions



Thanks for this great release everyone! Your hard work is much appreciated. The new simplex transform reminded me of the log Dirichlet distribution that @spinkney posted here once. Are there any plans to include that in Stan at some point? I am working on some models where I never have to leave the log scale for the simplex.

Hi!

I have played a bit with the Laplace approximation, which looks like a huge feature. There is one thing I ran into which I find odd: there are specialized functions like

laplace_marginal_poisson_log_lupmf and the like.

However, with the general interface laplace_marginal it is not possible to pass a likelihood_function which itself calls a _lupmf function (i.e. unnormalized pmfs). The reason is that the very first argument of the likelihood_function must be a real vector representing the parameters we want to integrate out. But if the likelihood_function is an unnormalized thing, then the parser enforces that it be declared as an _lpmf function… and those functions are only allowed to have an int as their first argument!

In short, the unnormalized densities for discrete distributions cannot be used inside the likelihood_function… but calculating normalization constants is really costly.

Maybe I overlooked something?

Wouldn’t it make more sense for the vector of parameters to integrate out to be the last argument? Then one could still make the data the first argument of the likelihood function, as usual…

Best,
Sebastian

This is not really the issue, because you can call a _lupmf from inside an _lpdf, so the vector as the first argument is not what is preventing you. But if you try this, you’ll find that the parser will instead give you an error like

Function 'll_function_lupdf' does not have a valid signature for use in 'laplace_marginal_tol':
Expected a pure function but got a probability density function.

On a technical level, we could relax this restriction (we’ve already worked out how, in the reduce_sum code). On a mathematical one, I’m not sure if the approximation assumes the density is normalized or not, which would be a question for @charlesm93.
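To make the failure mode concrete, here is a sketch of the pattern being described (hypothetical names):

functions {
  // Legal on its own: an _lpdf whose first argument is the latent vector,
  // and which calls an unnormalized _lupmf internally.
  real ll_function_lpdf(vector theta, array[] int y) {
    return poisson_log_lupmf(y | theta);
  }
}
// Passing the generated ll_function_lupdf to laplace_marginal_tol is then
// rejected with the error quoted above, because the suffix marks it as a
// density function rather than a pure one.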

Release Candidate 2

Please note that earlier this morning we released an rc2, which resolves several compilation issues and a major runtime issue with the embedded Laplace approximation. If you have been trying out rc1, we highly encourage you to update and try rc2!


Yes, I am also not sure whether the Laplace approximation requires a normalised density (I’d hope not, and from the presence of the laplace _lupmf densities I figured that it does not).

Still, what I must say is bothering me somewhat is the position of the parameters to marginalize out. Putting these first in the likelihood function’s argument list violates the convention in Stan of putting data first and parameters last for likelihood functions (aligned with common Bayesian notation). Is this a compromise to get this to work with C++?

Also, wouldn’t it be an option to have theta_0 be a tuple, so that a tuple rather than a plain vector is passed to the likelihood function? (Ideally the likelihood function would then receive the contents of the tuple as its first arguments rather than the tuple itself.) This would allow us to maintain structure in the way we write likelihood functions, rather than having to do the (un)packing business with vectors, which is super tedious.

Something like

laplace_marginal(data y, ll_fun, (more args....), (real param1, real param2), ...)

and then the ll_fun is allowed to be

ll_fun(data y, more args, ..., real param1, real param2)

Apologies if this is super late in the discussion of this design, which I obviously overlooked. This is probably a huge headache to code in C++… sure… but can we at least make sure that such a design is possible in the future?

Best,
Sebastian

When using the Laplace method, most of the time theta is really just one vector parameter. The cases where you would have two conceptually different parameters with a joint normal prior are less common and very likely to have a non-log-concave likelihood, which makes the Laplace method less likely to work. Also, using a tuple would not avoid the need to provide the indices of which observations are associated with which parameters, and you would still need to do some extra work similar to packing/unpacking.

I’m curious: what is an example of a likelihood you would like to use for which a tuple would be useful, and which would still be almost log-concave so that the Laplace method would work?

Yes, you can drop the normalizing constant in the log likelihood with respect to the latent Gaussian \theta.

One slightly confusing point is that often “normalizing constant” is defined wrt the observations, rather than the model parameters. Take the Poisson distribution:

p(y \mid \lambda) = e^{-\lambda} \lambda^y / y!.

With a log link, we’d have \lambda = \exp(\theta). The typical normalizing constant is e^{-\lambda} and clearly we can’t drop it. On the other hand, we can drop 1 / y!. Naturally, in Stan, the unnormalized density is defined as dropping the constant terms wrt the model parameters, not wrt the observations.
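Restating that in Stan terms (just bookkeeping, not new functionality):

// For a Poisson likelihood with log link, lambda = exp(theta):
//   poisson_log_lpmf(y | theta)  == y * theta - exp(theta) - lgamma(y + 1)
//   poisson_log_lupmf(y | theta) drops only lgamma(y + 1), the term that is
//   constant with respect to the parameters; the -exp(theta) term must stay.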

For the embedded Laplace, we can go a step further and also drop terms that only depend on the hyperparameters \phi. This is something we should do for the wrappers with specialized likelihoods.


Hmmmm… It would be feasible to replace the theta vector argument with a tuple argument, although this would require some refactoring in C++. @stevebronder can comment on how serious this refactoring would be.

Another challenge is that the code needs to figure out the sparsity structure of the log likelihood’s Hessian. Right now, we assume the Hessian is block-diagonal and the user is responsible for providing the size of each block. We can probably work this out if the user provides the indices of which observations are associated with which parameters, as suggested by Aki.

Of course, I would prefer to give the user more flexibility but this might require quite a bit of work.

@avehtari I’m sure you’ve thought about this, but the one example that comes to mind here is the GP motorcycle model.

I think it would require enough work that it definitely won’t happen this release. But it could be an additional signature down the road — especially because we probably wouldn’t want to force you to use a tuple if there is just a single latent normal prior


Yes, I was thinking about that. See also Laplace approximation and natural gradient for Gaussian process regression with heteroscedastic student-t model | Statistics and Computing

To keep everyone up to date, there have been a few issues brought up, both in design and implementation, that will need to be addressed before this release is officially published. Thank you to everyone who tried out the current RCs and found/suggested these!

This may take some time, as several members of the Stan team are taking personal vacations at the same time, so we appreciate everyone’s patience! I will try to post updates here, but interested users may also want to subscribe to the Github tracking issue for the release.


Hmmm… thinking about this more, I agree with you that most of the time a vector of parameters is good enough. I have population PK models in mind, which seem not to work so well with Laplace - at least this is what I got from the prototype @charlesm93 tried. Still, I was hoping that models like this would eventually work well with Laplace. For those models, having more structure available (like with tuples) would be great.

Are there detailed case studies out there already which discuss a GP? I would expect that for a GP the Laplace approximation is a decent thing to do.

However, what I am still struggling with is the convention of putting the parameters as the first argument of the user’s likelihood function. Would it be difficult to make them the last arguments? Certainly this Laplace work should not get delayed too much, as it looks great.

Best,
Sebastian

Are there detailed case studies out there already which discuss a GP?

Yes. There is this old case study, which I’ll update once we finalize the embedded Laplace’s API. I also edited the section on GPs in the user manual to provide examples of how to use the embedded Laplace approximation.

I agree it would be great to get the embedded Laplace working on PK models. I’m wondering: does software like Monolix do something akin to an embedded Laplace approximation in order to marginalize out patient-level parameters?

However, what I am still struggling with is the convention of putting the parameters as the first argument of the user’s likelihood function. Would it be difficult to make them the last arguments?

I think the challenge here is that the non-parameter arguments are passed as variadic arguments. We could replace those variadic arguments with a tuple, and then have the tuple passed first.

To my knowledge Monolix only implements SAEM, but NONMEM offers the Laplace method; see https://ascpt.onlinelibrary.wiley.com/doi/10.1002/psp4.12422. The Laplace method is not the default method in NONMEM, which to my memory is FOCE, but Laplace is described as very similar to FOCE. FOCE (first order conditional estimation) avoids the need for second order derivatives… so it could possibly also be of interest to Stan, given that FOCE drives many if not the majority of popPK/PD analyses whenever these are NONMEM based. An embedded FOCE method sounds just as cool as Laplace to me. Before jumping at this with Stan, one could first look at comparisons of Laplace and FOCE in NONMEM - I am pretty sure there are many benchmarks out there. People love to compare NONMEM methods and publish that.

To me, a tuple sounds fine to use here if that is what brings us back to the convention that for likelihoods the data arguments come first, then the parameters, and then the rest. Nonetheless, I don’t intend to delay this feature massively, and so far I appear to be the only person raising my voice in this regard.