Defining a target distribution using Python model output

I’ve written a biochemistry model in Python, and I’d like to perform a Monte Carlo experiment using HMC to constrain the parameters of that model to those that produce realistic-looking model output. We’ve figured out how to assign a value from 0 to 1 to every point in parameter space based on how “realistic” the corresponding model output looks, but calculating that value requires running the model’s Python code (which in turn numerically integrates a complicated system of ODEs).

After searching the forum, reading the user’s guide, and watching this video, I’ve concluded that it isn’t possible to do this with Stan. Am I mistaken? Is there a different software package that you think would work? This is my first time working with MCMC methods, and any advice would be greatly appreciated.

Yeah, you’d have to rewrite your simulations in C++ to use Stan. If you want to reuse your Python code, I’d look at PyMC3 and possibly TensorFlow. If you look at the latter, report back on whether it is possible to call non-TF Python code; I think TensorFlow’s Python-API-atop-C++-core architecture might leave you in a similar place as Stan with respect to using non-TF Python code (or, @charlesm93: do you know off-hand whether TF can use non-TF Python in the target?)


When I was looking at the other options out there, TensorFlow seemed like it might work. The `target_log_prob_fn` argument of tfp.mcmc.HamiltonianMonteCarlo, which describes the target distribution, is specified in the API documentation as a “Python callable”.
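To see what that callable is for, here’s a deliberately minimal pure-Python sketch of HMC on a toy 1-D standard normal target. The function `hmc_sample` and its arguments are illustrative names of my own, not TFP’s actual API (a real run would go through something like `tfp.mcmc.sample_chain`), but it shows the role the target log-density and its gradient play:

```python
import math
import random

def hmc_sample(log_prob, grad_log_prob, x0, n_samples=500,
               step_size=0.1, n_leapfrog=20, seed=0):
    """Toy 1-D Hamiltonian Monte Carlo; illustrative only."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_samples):
        p = rng.gauss(0.0, 1.0)          # resample momentum
        x_new, p_new = x, p
        # leapfrog integration of the Hamiltonian dynamics
        p_new += 0.5 * step_size * grad_log_prob(x_new)
        for i in range(n_leapfrog):
            x_new += step_size * p_new
            if i != n_leapfrog - 1:
                p_new += step_size * grad_log_prob(x_new)
        p_new += 0.5 * step_size * grad_log_prob(x_new)
        # Metropolis accept/reject on the Hamiltonian H = -log p + p^2/2
        h_old = -log_prob(x) + 0.5 * p * p
        h_new = -log_prob(x_new) + 0.5 * p_new * p_new
        if math.log(rng.random()) < h_old - h_new:
            x = x_new
        samples.append(x)
    return samples

# Target: standard normal, log p(x) = -x^2/2 up to a constant.
samples = hmc_sample(lambda x: -0.5 * x * x, lambda x: -x, x0=0.0)
```

Note that the sampler needs `grad_log_prob` as well as `log_prob`; this is exactly where the differentiability requirement discussed below comes from.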

I think there exist tools that will compile Python code to C++. I don’t know much about them, and I don’t know if they produce great C++ code, but if your goal is just to have a C++ implementation that Stan can work with (and not necessarily a well-optimized C++ implementation), there might be some possibility of doing it automatically.


TensorFlow Probability would be a good option, although the interface experience isn’t the same as with Stan. It definitely took a bit of getting used to. I’m not saying it’s better or worse, just different. With TFP, you need to make sure your Python code is differentiable if you use a gradient-based algorithm. So something like vanilla NumPy would not work; on the other hand JAX, which seamlessly supports many NumPy functions and has autodiff, would. You can write your function using TensorFlow, but I find JAX easier to use.
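To make the NumPy-vs-JAX point concrete, here’s a minimal sketch (the log-density is a toy standard normal of my own, purely illustrative). Because the function is written with `jax.numpy`, `jax.grad` hands you the exact gradient a gradient-based sampler needs; the same function written with vanilla NumPy offers no such hook:

```python
import jax
import jax.numpy as jnp

# A log-density written with jax.numpy instead of vanilla numpy:
def log_prob(theta):
    return -0.5 * jnp.sum(theta ** 2)

# Autodiff gives the exact gradient; for this target it is simply -theta.
grad_log_prob = jax.grad(log_prob)
g = grad_log_prob(jnp.array([1.0, -2.0]))
```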

Big caveat: JAX’s ODE support is somewhat limited. You only have access to a non-stiff integrator with adjoint differentiation. Stan supports stiff and non-stiff solvers, and both forward and adjoint differentiation. There’s a PR for a stiff solver, which I expect will get in any time now (it might already be available).
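For reference, the non-stiff integrator in question is `jax.experimental.ode.odeint`. A toy sketch on the linear decay ODE dy/dt = -k·y (my own example, not from the thread), whose exact solution is y(t) = exp(-k·t):

```python
import jax.numpy as jnp
from jax.experimental.ode import odeint

def dydt(y, t, k):
    # Right-hand side of dy/dt = -k * y
    return -k * y

ts = jnp.linspace(0.0, 1.0, 11)
# Integrate from y(0) = 1 with k = 2; ys[i] approximates exp(-2 * ts[i]).
ys = odeint(dydt, jnp.array(1.0), ts, 2.0)
```

Because `odeint` is differentiable (via the adjoint method), a function like this can sit inside the target log-density of a gradient-based sampler.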

Bigger caveat: JAX’s `odeint` doesn’t return an error message when it fails. See No error message when `odeint` fails · Issue #7326 · google/jax · GitHub.

I haven’t tried PyMC3. I think it allows you to specify your code in Python, but you probably need to use their autodiff library (Theano). I’m not sure what their ODE support looks like.