I just wanted to point people to this super useful notebook and gist from Ricardo Vieira, one of the PyMC developers, which outlines the relationship between how PyMC and Stan specify models, including a lot of tricks for getting Stan-like flexibility into the graphical-modeling-centric world of PyMC.
It spun out of a discussion on the PyMC discourse:
The idea of keeping the language but swapping backends is cool. The GPU support in JAX is great and enables huge programs. Variational inference and normalizing flows can really expand Bayesian modeling to big data.
When I look at what I want today for any size of data, I think it’s useful to look at the strengths and issues we currently have with Stan and other PPLs. One of these days I’ll wire-frame what my current ideal PPL would look like. Probably something closer to SlicStan, but with a few additions:
Stuff that is probably doable today:
Turning AD on/off for some things, user-defined derivatives, and higher-order AD.
Allow the user to apply AD to their own functions to get estimates. When using an extended Kalman filter, the Jacobian is needed; instead of hand-supplying it, just call a jacobian function. The same goes for “score” models (see Informant (statistics) - Wikipedia) that take the derivative of the log-likelihood into account. This is similar to what our ODE solver does: pausing AD and then using that info again in the tape.
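As a toy illustration of the “just call a jacobian function” idea, here is a minimal forward-mode AD sketch (dual numbers) applied to a made-up EKF state transition. All names are hypothetical, not any PPL’s actual API:

```python
# Minimal forward-mode AD via dual numbers, used to get an EKF Jacobian
# without hand-deriving it. Everything here is illustrative only.
import math

class Dual:
    """Forward-mode AD value: a number plus its derivative (tangent)."""
    def __init__(self, val, tan=0.0):
        self.val, self.tan = val, tan

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.tan + other.tan)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val * other.val,
                    self.tan * other.val + self.val * other.tan)
    __rmul__ = __mul__

def sin(x):
    return Dual(math.sin(x.val), math.cos(x.val) * x.tan)

def jacobian(f, x):
    """Jacobian of f: R^n -> R^n via one forward pass per input."""
    n = len(x)
    cols = []
    for i in range(n):
        seed = [Dual(xj, 1.0 if j == i else 0.0) for j, xj in enumerate(x)]
        cols.append([out.tan for out in f(seed)])
    # Transpose columns into rows: J[r][c] = d f_r / d x_c.
    return [[cols[c][r] for c in range(n)] for r in range(len(cols[0]))]

# Hypothetical EKF state transition: x' = x + dt*v, v' = v + dt*sin(x).
def transition(state, dt=0.1):
    x, v = state
    return [x + dt * v, v + dt * sin(x)]

J = jacobian(transition, [0.0, 1.0])
print(J)  # [[1.0, 0.1], [0.1, 1.0]] since cos(0) = 1
```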
Easy automatic optimizations like what PyTensor and Aesara do: turning log(1 + x) into log1p(x), etc.
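The log1p rewrite matters because the naive form loses all precision for tiny x; a small self-contained demo:

```python
# Why an optimizer would rewrite log(1 + x) as log1p(x): for tiny x the
# naive form loses everything because 1 + x rounds back to 1.0 in doubles.
import math

x = 1e-18
naive = math.log(1 + x)    # 1 + 1e-18 == 1.0 in double precision
stable = math.log1p(x)     # computed without ever forming 1 + x

print(naive)   # 0.0 -- the small term was rounded away
print(stable)  # ~1e-18, correct to machine precision
```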
Stuff that needs more research:
Automatic reparameterization
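The classic case such a tool would handle is rewriting a centered hierarchical parameter into its non-centered form. A sketch of the rewrite, with made-up names:

```python
# What an automatic reparameterizer might do for a hierarchical model:
# the centered form theta ~ Normal(mu, tau) becomes
#   theta_raw ~ Normal(0, 1);  theta = mu + tau * theta_raw,
# which has much friendlier geometry for HMC when tau is small.
# All names here are illustrative only.
import random

random.seed(1)
mu, tau = 2.0, 0.01

# Centered: sample theta directly (hard geometry as tau -> 0).
theta_centered = random.gauss(mu, tau)

# Non-centered: sample a standardized variable, then transform.
theta_raw = random.gauss(0.0, 1.0)
theta_noncentered = mu + tau * theta_raw

# Both are draws from Normal(mu, tau); only the parameterization differs.
print(theta_centered, theta_noncentered)
```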
Much, much, much faster usage of posterior draws: approximations that save only a fraction of the data but are about 95% accurate. The hard problem here is the joint distribution of the parameters. Univariate summaries are easy, but condensing the joint movements of the variables is hard. If it can be done, though, you could then tell the program, in one line, to recompute using the full draws. The idea is to iterate super fast.
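A rough sketch of what a compressed representation could look like, under the (big) assumption that the joint is summarized well by its first two moments; everything here is hypothetical:

```python
# "Compressed" posterior draws: keep only the mean vector and covariance
# matrix of the joint instead of all draws. A sketch only -- a real tool
# would need something richer than a Gaussian summary.
import random

random.seed(0)

# Fake 2-d correlated "posterior draws": b tracks a, so the joint matters.
draws = []
for _ in range(10_000):
    a = random.gauss(0.0, 1.0)
    b = a + random.gauss(0.0, 0.5)
    draws.append((a, b))

def compress(draws):
    """Reduce draws to first and second moments of the joint."""
    n = len(draws)
    mean = [sum(col) / n for col in zip(*draws)]
    cov = [[sum((d[i] - mean[i]) * (d[j] - mean[j]) for d in draws) / n
            for j in range(2)] for i in range(2)]
    return {"mean": mean, "cov": cov}

summary = compress(draws)
# The off-diagonal captures the joint movement: cov(a, b) is about 1.0
# here because b = a + noise.
print(summary["cov"][0][1])
```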
A Stan-like syntax to construct programs/models over the draws. Right now I hate that it is super awkward to do matrix math and stuff over the draws. Make this easy!
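One way the “write it once, run it over the draws” idea could look, sketched with a hypothetical `over_draws` helper:

```python
# Lift an ordinary computation so it runs per posterior draw, instead of
# hand-writing loops over the draws. `over_draws` is a made-up name.
def over_draws(fn):
    """Lift fn(draw, ...) -> value to act on a list of draws."""
    def lifted(draws, *args, **kwargs):
        return [fn(draw, *args, **kwargs) for draw in draws]
    return lifted

# Ordinary math, written once, for a single draw {"alpha": ., "beta": .}.
def predict(draw, x):
    return draw["alpha"] + draw["beta"] * x

# Fake draws from a regression posterior.
draws = [{"alpha": 1.0, "beta": 2.0},
         {"alpha": 1.0, "beta": 2.5},
         {"alpha": 2.0, "beta": 1.5}]

posterior_predict = over_draws(predict)
print(posterior_predict(draws, x=2.0))  # [5.0, 6.0, 5.0]
```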
Composing Stan models. Using posterior fits as priors. Maybe using message passing like RxInfer.jl does, but combining it with some HMC inference? I don’t know how to do this, but it would be cool to have.
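The “posterior fits as priors” piece can at least be crudely faked today by moment matching the marginal; a sketch (the real problem, as above, is the joint, not the marginals):

```python
# Summarize model 1's posterior draws for a parameter by moment matching
# to a normal, then use that as the prior in model 2. A crude sketch only.
import math
import random

random.seed(0)

# Pretend these are posterior draws of `beta` from model 1.
beta_draws = [random.gauss(0.3, 0.1) for _ in range(5_000)]

n = len(beta_draws)
mean = sum(beta_draws) / n
sd = math.sqrt(sum((b - mean) ** 2 for b in beta_draws) / (n - 1))

# Model 2 would then declare, in Stan-like pseudocode:
#   beta ~ normal(mean, sd);
print(round(mean, 2), round(sd, 2))
```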