I have a model that takes the form of a moderately large C++ program (about 10K lines of source code). The model has a number of unknown parameters, and it takes those parameters and produces a bunch of outputs that I can compare to an observational dataset in the usual way. Clearly I’d like to use a Markov Chain Monte Carlo, but I’d like to do something better than using the crusty old Metropolis-Hastings code I wrote in graduate school and have been carting around from job to job for the last 20 years. Can Stan provide a better solution?
My first impression, after skimming the documentation, is that it cannot, because there is no good way to get derivatives of the model outputs with respect to the parameters. (Retrofitting the necessary data structures for Stan’s autodifferentiation into 10K lines of code seems like an invitation to madness.)
However, while I was poking around learning about Stan I came across this talk:
At about 3:26 in the talk the speaker has a slide titled “Who is using Stan?”, and on that slide he says
Stan is used for fitting climate models, …
That’s exactly what I’m trying to do (the model I described above is in fact a simple climate model). So, my real question is, does anybody know what work the speaker is referring to? Who is using Stan to fit climate models, and are there any examples I could use as a jumping off point to figure out how my colleagues and I might be able to use Stan to fit our model (and finally retire the 20-year old Metropolis code!)
Can’t say exactly what the company is or does, but hopefully it is within the realm of possibility for you to modify your C++ program to add template parameters (if you don’t have them already) so that your posterior kernel evaluation function can be called with Stan’s types. Then, Stan will be able to use autodiff with your modified C++ code and a little bit of documented hacks.
Does “the company” refer to the people who are using Stan with the climate models? I.e., it’s not NCAR or someplace like that?
Anyhow, what it sounds like you’re saying is that retrofitting Stan’s data structures into the model source is pretty much the only way to make it work. Is that an accurate summary?
There may be plenty of people using Stan for climate models that I don’t know of (I’m not in that field) but there is a company using Stan for such things that I do know of but can’t talk about.
Anyway, it may be less work to just write everything in the Stan language (and that usually takes less time than people fear), but I was saying the opposite: that it may be less work to just add
to a bunch of your existing C++ functions, change the signatures of those functions to use
T rather than
double, and utilize our mechanism to find and use external C++ code from Stan programs. I think some people in astronomy go this route.
How complicated are we talking here? You must already have a function that evaluates the logarithm of the posterior kernel in order to do M-H. And that function is a method of some ugly C++ class that holds a bunch of data? Or that function calls a lot of custom numerical libraries to do stuff that cannot be done analytically?
Here’s the repository for the model I’m talking about, if anyone reading the discussion is curious (or if they’re looking for a good climate model).
The process of computing the likelihood a little complicated because the model was designed as a stand-alone, and the MC was grafted on as an afterthought. The simplified version is that the core of the model is made up of a bunch of components that do things like tracking the movement of carbon between various pools, simulating diffusion in the ocean, tracking the concentrations of non-CO2 GHGs, and so on. As you surmise, there are a lot of numerical solvers and whatnot involved in that process. At the end, what you have is a bunch of vectors of output data, and the function that calculates the likelihood harvests these and compares them to the observational data.
One further wrinkle is that in order to keep the model modular, each component runs independently at each time step, storing its results in vectors that it owns. If a component needs a value from another component (e.g., at each time step the ocean component will need the CO2 concentration from the atmosphere), then it sends a message to the coupler, which routes that message to the appropriate component, which in turn responds with the requested value. I worry that such indirection will confuse the autodifferentiation scheme. Can it handle the case where a function picks up a stored value that was computed in a different function (typically not even in the same translation unit)?
That’s often the case in a lot of big scientific models.
Stan can use such a thing standalone with a little work on the C++ side if you can also compute derivatives. Otherwise, it has to be rewritten to use Stan’s math library so it can do autodiff.
We have solvers for ODEs, etc., but they may be computationally challenging to compute and/or differentiate at large scales.
Yes, as long as it’s stored as it’s proper autodiff type.
I see. I suspect that our ODEs don’t qualify as “large scale”. We’re not spatially resolved like an Earth System Model, so we’re really only tracking a dozen or so variables in the largest systems of equations, and only at annual resolution.
Thanks for all the information. I think I have enough to float the idea of converting to the rest of the development team. If I can get them to go along with it, I’ll be sure to let you know how it turns out.
You might want to have a look at the BayesianTools package in R which provide various MCMC-like samplers for complex models where the likelihood cannot be written down in a simple function.
So you mean something like ABC where you assume you can sample, but assume you can’t evaluate the log density?
If you have the likelihood, you can use Metropolis or an ensemble method and there are programs that let you plug those in. emcee in Python implements an ensemble method. The problem with both Metropolis and the ensemble methods is that they don’t scale well with dimension.
I am also very interested in this concept. We have a C++ program that estimates parameters. Currently we also have a 20 year old metropolis Hastings algorithm. This program I refer to can return the gradient of the likelihood surface for a given value of the parameters. Is there any examples or documentation that could help a developer call STAN routines in our model to use the NUTS and HMC algorithms?
At best, you need to do it the other way around: Use call your likelihood function and gradient from within a Stan program.
@Cyril Like @bgoodri said you’ll want to move the probabilistic parts of what you’re doing to a Stan. I doubt that’s a limiting factor though. There’s some external C++ stuff you can hook up in Rstan that makes it not too difficult to do weird stuff like this (here’s an example of me doin’ it: https://cdn.rawgit.com/stan-dev/stancon_talks/927ac4ad/2018/Contributed-Talks/09_bales/rus.html )
It’s not too bad to fork your own version of Cmdstan and hack in whatever software stuff you need. You’re really hacking stuff into the Math library which is probly the most accessible part of the Stan ecosystem. So to do that you’d want to first start by reading the Stan Math paper (https://arxiv.org/abs/1509.07164) and experimenting with the math library to see if you understand how things work (https://github.com/stan-dev/math).
forward_pde PR is for this kind of task
assuming your model solver can do sensitivity analysis, as @bgoodri and @Bob_Carpenter pointed out. Otherwise you’ll have to insert such a functionality in your solver. Other than autodiff, there are four common approaches to get the sensitivity for your model:
- forward sensitivity
- adjoint sensitivity
- finite difference
- complex step derivative approximation
Depends on model size, parameter size, implementation, etc. I’ll demo these things in stancon(hopefully).
@rpl, Hector is actually pretty light-weight in terms of computing, as the whole point of Hector is to be reasonably small and efficient, right? To adapt it to Stan most of the work I see is to modify
runge_kutta_dopri5 function in solvers such as carbon cycle solver to take in Stan’s variable instead of double, and propagate type changes accordingly. It will not be an impossible task: the bless is you’re already using boost. One of Stan’s ODE solvers uses
Since you’re already getting gradient, maybe it’s easier to move entirely into Stan, instead of only adopting NUTS. This involves use your solver as an external function, and let Stan parser know the signature of the function. The
forward_pde PR above is to facilitate this kind of task.
@seantalts and I are going to be putting together a half-day tutorial on how to do this for StanCon, so expect to see the case study on how to do it by the beginning of September at the latest.
I look forward to seeing this, thanks.