Fitting external models with Stan

Hey,
I was wondering whether it is possible to use external models with Stan. I have set up a conceptual groundwater flow model in python and would like to calibrate the parameters. On github I noticed some MCMC packages that implement NUTS requiring the user to only define a function to calculate the log posterior and the gradients.

However, I as I understand the No-U-Turn sampler implemented in Stan has received many improvements since the paper in 2014 by Hoffman and Gelman. So, it would be great if I can do something similar in Stan.
Thanks, Douwe

You could imagine doing something like @andrjohns’s StanEstimators package for R, but it would be tricky to write and probably quite slow.

For the majority of models, log density and gradient evaluations are the dominating part of the computation. If those are in Python, it probably wouldn’t hurt you that badly to use a sampler in Python on top. You could look into nutpie or blackjax

Technically, the answer is ‘yes’, but practically speaking, I’d say ‘no’. To get it to work, you have to write a C++ definition of the model and include that as a new density for Stan for you to be able to use Stan’s algorithms.

I’ve had very good luck with nutpie, which @WardBrian linked, so I’d recommend that. The PyMC Labs devs are using it in production for clients, from what I understand.

I read in the documentation that PyMC transcodes models to C and compiles them as machine code to enhance efficiency. I think this may cause problems in my case.

I (foolishly) did not mention this before, as I did not want to bore you with a long post, but my groundwater flow model is created using FloPy, a python package that works as a sort of interface for MODFLOW6. MODFLOW6 is a program written in Fortran that allows you to create groundwater flow models and perform simulations with them. FloPy allows me to set-up, run and post-process the results of MODFLOW models. In my case I post-process the results into log-likelihood values for calibrating the MODFLOW model.

So although nutpie looks very promising, I’m afraid it may not be applicable for me.

Compiling to C is not necessary (though clearly would be faster).

I think fortran should also be no problem (as long as you can differentiate the log probability function). You will need to use the under-documented nutpie.compiled_pyfunc.from_pyfunc function, however.

Here is an example where I implement the likelihood in C (just a 1-d standard normal):

@aseyboldt may be able to provide better guidance than what I hacked together in that example. You may also want to ask on the PyMC forums

The example for from_pyfunc looks good to me. That function isn’t really public at this point, and there might be some very minor changes in the next release of nutpie, where it should finally get a bit of documentation and a stable API.

If you have a C function, there are also ways of passing those to nutpie directly, to avoid python overhead for the logp function evaluations. This is used for pymc models with the numba backend for instance. But unfortunately that is not public API at this point either…

But from what I know about MODFLOW I don’t think you have to worry about a little bit of python overhead. I don’t think it can natively compute gradients? Maybe there is a way to use enzyme to get those, but that sounds like a big and uncertain project to me. So as far as I’m aware the only way would be to use finite differences to get at the gradient. So if we optimistically assume 10000 gradient evaluations to run nuts, a model evaluation time of 10s and 20 parameters, that would get us a runtime of about 20 days. I hope some of my assumptions are incorrect…
Maybe you could parallelize the different finite differences, but still…

1 Like

Most of your assumptions are correct. I do indeed intend to use finite differences to compute the gradients and I will try to parallelize this to speed it up. Thankfully, I have a relatively small MODFLOW model that only takes approximately 0.1 sec to run, so the run time should be a lot shorter than 20 days. I will give nutpie a crack :)

Thanks for al the help!

1 Like