Implementing and evaluating a new inference algorithm

Hey Stan devs,

My collaborators and I have developed and written up some theory for a new family of approximate-inference algorithms. All of our testing so far has been in Python on toy problems, and now we’re looking for the best way to benchmark it on more realistic problems and (hopefully) share it with the community. Adding our algorithm to Stan seems like a great way to do both, but after a few days of looking at the Stan source, I have to admit that I’m pretty lost. Part of the problem is that none of us has recent or extensive C++ experience, and an initial test suggests that PyStan’s latency makes a Python wrapper impracticable.

I see from this thread that some big refactoring of the core inference code may be on the way. I’m looking for some guidance from those of you more in the know: how hard would it be for someone with more Stan experience to add our algorithm? Should we wait until after the refactor, and what is its timeline?

Some important details: our proposed algorithm is essentially a combination of MCMC and ADVI (and thus doesn’t neatly fit inside any of the existing class hierarchies, as far as I can tell). We use mostly out-of-the-box HMC or NUTS, but apply it to the variational parameters and treat the resulting samples as a nonparametric mixture of variational components. We have a preprint here describing the idea, if you’re curious. For now, we just want to start a conversation and pick your collective brains about how best to proceed.
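To make the idea concrete, here is a toy sketch (all names here are made up for illustration; this is not our actual implementation): each HMC draw of the variational parameters defines one mean-field normal component, and the approximation is the uniform mixture over draws.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// log(2*pi), hard-coded for portability.
constexpr double kLog2Pi = 1.8378770664093453;

// A toy mean-field normal component: one (mu, sigma) pair per dimension.
struct NormalComponent {
  std::vector<double> mu;
  std::vector<double> sigma;

  // Log density of x under this component.
  double log_density(const std::vector<double>& x) const {
    double lp = 0.0;
    for (std::size_t d = 0; d < x.size(); ++d) {
      const double z = (x[d] - mu[d]) / sigma[d];
      lp += -0.5 * z * z - std::log(sigma[d]) - 0.5 * kLog2Pi;
    }
    return lp;
  }
};

// Each HMC draw of the variational parameters defines one component;
// the approximate posterior is the uniform mixture over all draws.
double mixture_density(const std::vector<NormalComponent>& draws,
                       const std::vector<double>& x) {
  double p = 0.0;
  for (const auto& c : draws) p += std::exp(c.log_density(x));
  return p / static_cast<double>(draws.size());
}
```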

Looking forward to any insights you all are able to share! I would also be happy to chat privately with anyone who is interested and able to help out.


I’m going to ping @stevebronder on this, who has recently been working on adding Pathfinder to Stan. The diff of that may be useful to take a look at (I think this is the most recent branch, but I could be wrong). He may be able to give you some pointers.

I believe “all” you need to do is implement the algorithm in stan::services, assuming you are able to build off of the existing model class used by the other Stan algorithms. To actually call it anywhere, you’ll need a version of CmdStan which calls that service. This recent blog series by @jtimonen may be helpful for understanding more of that: Understanding the Stan codebase - Part 1: Finding an entry point | Juho Timonen
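Roughly, a service is a free function template in stan::services that takes the model plus a set of callbacks and returns an integer error code (stan::services::error_codes::OK is 0). Here is a heavily simplified, self-contained sketch of that shape — the types and names below are hypothetical stand-ins, not the real Stan headers:

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

// --- Hypothetical stand-ins for Stan's callback types (not the real API). ---
struct logger {
  void info(const std::string& msg) { std::cout << msg << '\n'; }
};
struct writer {
  std::vector<std::vector<double>> rows;
  void operator()(const std::vector<double>& row) { rows.push_back(row); }
};

namespace stan_sketch { namespace services {

constexpr int OK = 0;  // stan::services::error_codes::OK is 0 in real Stan.

// Shape of a new service: a free function template over the model type,
// taking callbacks and returning an error code. The body here is a stub.
template <class Model>
int my_mixture_vi(Model& model, unsigned int seed, int num_draws,
                  logger& log, writer& draw_writer) {
  log.info("my_mixture_vi: starting (stub)");
  for (int i = 0; i < num_draws; ++i) {
    // A real implementation would run HMC over the variational
    // parameters here; this stub just emits zero draws.
    draw_writer(std::vector<double>(model.num_params_r(), 0.0));
  }
  return OK;
}

}}  // namespace stan_sketch::services
```

An interface (e.g. a fork of CmdStan) would then add a subcommand that parses arguments and dispatches to this function with the instantiated model.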

If you’re just interested in implementing your algorithm for more testing, you can fork those repositories and get started. If you’d like to add the algorithm to Stan in a more “official” or permanent way, it’s good to start by submitting a design doc (think of these like RFCs).


+1 to what @WardBrian said.

You could add a new algorithm in stan::services (on a branch), link it into an interface, and you’ll be up and running.

For testing, search these forums for the work on PosteriorDB. That will point you to an effort to build a library of models and reference inferences to compare results against.


Thanks @WardBrian and @syclik! The blog post and the pointer to stan::services helped get me out of one rabbit hole, but into another.

assuming you are able to build off of the existing model class used by the other Stan algorithms.

There appears to be pretty tight integration between HMC and the model parameters, so getting HMC to run in a different space will require some hacking. The most natural place to hook our algorithm into the existing code, then, seems to be subclassing stan::model::model_base, tricking HMC into treating the variational parameters as if they were the model parameters, and providing a custom set of log_prob functions… unless this would break things like autodiff? There are a lot of functions in model_base that need overriding, and some of the comments are pretty cryptic to me (e.g. would my mock parameter space be ‘constrained’ or ‘unconstrained’?).
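For reference, the wrapping pattern I have in mind looks roughly like this — a self-contained sketch with a mock base class standing in for stan::model::model_base, ignoring autodiff and the constrained/unconstrained question entirely, and with a placeholder objective rather than our real one:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Minimal stand-in for the interface a sampler needs from a model
// (stan::model::model_base has many more virtuals; this is a sketch).
struct model_iface {
  virtual ~model_iface() = default;
  virtual std::size_t num_params_r() const = 0;
  // Unconstrained log density (real Stan also has autodiff overloads).
  virtual double log_prob(const std::vector<double>& params_r) const = 0;
};

// Expose the *variational* parameters (here: a mean-field mu and
// log_sigma per dimension) as if they were model parameters, so HMC
// samples in variational-parameter space.
class variational_space_model : public model_iface {
 public:
  explicit variational_space_model(const model_iface& inner)
      : inner_(inner) {}

  // Two variational parameters (mu, log_sigma) per model dimension.
  std::size_t num_params_r() const override {
    return 2 * inner_.num_params_r();
  }

  // Placeholder objective: a real implementation would return (an
  // estimate of) the target density over variational parameters;
  // here we just evaluate the inner model at mu for illustration.
  double log_prob(const std::vector<double>& params_r) const override {
    const std::size_t n = inner_.num_params_r();
    std::vector<double> mu(params_r.begin(), params_r.begin() + n);
    return inner_.log_prob(mu);  // placeholder, not the real objective
  }

 private:
  const model_iface& inner_;
};
```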

Would anybody be willing to talk through some of these Qs in a short call?

Yup. DM me.


The best suggestion I have is to evaluate behavior on the examples with reference posterior draws in posteriordb. There’s a fair bit of variety in the models, but they’re all still relatively low dimensional. You can see how we did that for our Pathfinder VI algorithm in this preprint, which also includes a few of the higher-dimensional models we evaluated. Large-scale evals are a lot of work because the results are stochastic (due to random seeds), and we’re still not sure how much more the other devs will push back on evals before letting us merge to the develop branch. You might also find our approach interesting because the multi-path version uses a mixture of multivariate normals as the approximating family.
