I thought it might be interesting to draw your attention to some work we’ve done to integrate NUTS inside a Sequential Monte Carlo (SMC) sampler. See here:
We are working to integrate this into a future version of Stan. Comments welcome!
Cheers
Simon
PS We see this paper as a key component of our ongoing work to improve the ability to use SMC (as an alternative to MCMC) inside Stan. We hope and anticipate that this broader work will culminate in a version of Stan that, by using SMC, can fully exploit modern computing resources, resulting in significant speed-ups and thereby increasing the community’s ambition with respect to the complexity of models that Stan can consider.
While brushing up on SMC in anticipation of using your work when it’s available, I came across this talk, which at the end suggests that Junpeng Lao added a NUTS proposal to the SMC sampler in TFP; happen to have taken a look at that? If so, how would you characterize any differences from your approach?
Not sure. We need to be careful not to confuse sequential inference (as tackled by particle filters), batch inference (as tackled by Stan), families of algorithms (eg SMC methods) and specific algorithms (eg SMC samplers and particle filters): it looks like your link points to a particle filter. We’re writing another paper on decent particle filters (eg to underpin a future “Streaming-Stan”) that use NUTS, but that’s not quite finished yet.
Ah! I didn’t realize that SMC and “particle filter” weren’t synonyms
It’s confusing and slightly unfortunate that multiple techniques have very similar names.
We are working both on adding SMC samplers to Stan (with the aim of making Stan run faster on the problems it currently tackles) and on developing a version of Stan, Streaming-Stan, that uses particle filters (with the aim of enabling Stan to be applied to sequential inference when processing never-ending streams of data). The work on NUTS in the arXiv paper is described in the context of an SMC sampler, but we are also developing a high-performance particle filter that uses NUTS (for Streaming-Stan). Hope that makes sense!
It is confusing. SMC-NUTS, as described in the paper, is focused on inference of constant parameters (eg the MCMC bit of pMCMC, assuming you can address the non-deterministic nature of the likelihood evaluations that come out of a particle filter).
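To make that concrete, here is a minimal sketch of that “MCMC bit” as a pseudo-marginal Metropolis-Hastings loop in Python. The Gaussian random-walk proposal and the names log_prior and estimate_log_lik (the log of an unbiased, particle-filter-style likelihood estimate) are illustrative assumptions for the sketch rather than our implementation:

import numpy as np

def pseudo_marginal_mh(y, log_prior, estimate_log_lik, theta0,
                       n_iters=1000, step=0.1, seed=0):
    # Pseudo-marginal Metropolis-Hastings over constant parameters theta.
    # estimate_log_lik(theta, y) is assumed to return the log of a noisy but
    # unbiased likelihood estimate (eg from a particle filter). The estimate
    # at the current theta is stored and reused rather than recomputed, which
    # is what keeps the noisy evaluations valid within Metropolis-Hastings.
    rng = np.random.default_rng(seed)
    theta = np.atleast_1d(np.asarray(theta0, dtype=float))
    log_post = log_prior(theta) + estimate_log_lik(theta, y)
    samples = []
    for _ in range(n_iters):
        proposal = theta + step * rng.standard_normal(theta.shape)   # random-walk proposal (illustrative)
        log_post_prop = log_prior(proposal) + estimate_log_lik(proposal, y)
        if np.log(rng.uniform()) < log_post_prop - log_post:         # accept/reject on estimated posteriors
            theta, log_post = proposal, log_post_prop
        samples.append(theta.copy())
    return np.asarray(samples)

The random-walk proposal line is the part that gradient-based moves such as NUTS aim to improve on.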
We have also been working on NUTS for particle filters (aka sequential importance sampling for time-varying variables) and on pMCMC, but that will be described in other papers that we are currently working up.
For the state-space particle filtering, we are developing Streaming-Stan (see here for an articulation of our now-historic progress: StanCon 2020. Talk 5: Simon Maskell. Towards an Interface for Streaming Stan - YouTube) with a view to it taking two Stan files as input: one describes p(x_1,y_1) with x_1 as a parameter and y_1 as data; the other describes p(x_k,y_k|x_{k-1}) with x_k as a parameter and y_k and x_{k-1} as data. You then repeatedly call what we call Streaming-Stan with pseudocode that looks something like this (where x and w are respectively the samples and their associated importance weights):

Model.Initializemodel("oneStanFile.stan", "anotherStanFile.stan")
y = getdata()
[x w] = Model.SampleInitial(y)
While(1)
    y = getdata()
    [x w] = Model.SampleNext(x, w, y)
End While
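For intuition about what SampleInitial and SampleNext are doing conceptually, here is a self-contained bootstrap-style sequential importance resampling sketch in Python; the densities, transition and resampling scheme are toy stand-ins for the two Stan files, not the Streaming-Stan implementation:

import numpy as np

def normalise(logw):
    # Turn unnormalised log-weights into normalised importance weights.
    w = np.exp(logw - logw.max())
    return w / w.sum()

def sample_initial(y, n, rng):
    # Stand-in for the first Stan file: draw x_1, then weight by p(y_1 | x_1).
    x = rng.standard_normal(n)                        # illustrative draw of x_1
    return x, normalise(-0.5 * (y - x) ** 2)          # illustrative log p(y_1 | x_1)

def sample_next(x, w, y, rng):
    # Stand-in for the second Stan file: resample, draw x_k | x_{k-1}, weight by p(y_k | x_k).
    idx = rng.choice(len(x), size=len(x), p=w)        # multinomial resampling
    x = x[idx] + 0.1 * rng.standard_normal(len(x))    # illustrative transition draw
    return x, normalise(-0.5 * (y - x) ** 2)          # illustrative log p(y_k | x_k)

# Driver mirroring the pseudocode above, with a short list standing in for getdata().
rng = np.random.default_rng(0)
stream = iter([0.3, 0.1, -0.2])
x, w = sample_initial(next(stream), n=1000, rng=rng)
for y in stream:
    x, w = sample_next(x, w, y, rng)

In the real thing the two Stan files would supply the densities; here everything is hard-coded purely for illustration.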
The implementation we have developed builds on the recent arXiv paper and is in the process of being tested. We do plan to release it to the world, but need to finalise the testing and the write-up of some of the innovative components first.
Thank you so much for the explanation. It’s much clearer now. And I have to say, I’m quite interested in this research. At the moment, I’m using the frequentist counterpart of pMCMC, iterated filtering. I’d like to see the random-walk behaviour eliminated from those methods.
You will probably be keen to know that we want to have pMCMC working with Stan describing p(\theta) and then Streaming-Stan running with NUTS and describing p(x_1,y_1|\theta) and p(x_k,y_k|x_{k-1}). That requires Stan to call Stan, which is currently proving tricky. We will get there!
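To illustrate the nesting we are aiming for (hypothetical glue only, with toy densities standing in for p(x_1,y_1|\theta) and p(x_k,y_k|x_{k-1})), the inner streaming run would be asked for a log marginal-likelihood estimate at whatever \theta the outer sampler is currently considering, along these lines:

import numpy as np

def log_marginal_estimate(theta, ys, n, rng):
    # Inner "Streaming-Stan" role: a bootstrap filter conditioned on theta that
    # returns the log of an unbiased estimate of p(y_1, ..., y_K | theta).
    # The densities below are toy stand-ins for the two Stan model files.
    x = theta + rng.standard_normal(n)                  # illustrative draw of x_1 | theta
    total, w = 0.0, None
    for k, y in enumerate(ys):
        if k > 0:
            idx = rng.choice(n, size=n, p=w)             # resample
            x = x[idx] + 0.1 * rng.standard_normal(n)    # illustrative draw of x_k | x_{k-1}
        logw = -0.5 * (y - x) ** 2                       # illustrative log p(y_k | x_k)
        total += np.log(np.mean(np.exp(logw)))           # accumulate the likelihood estimate
        w = np.exp(logw - logw.max())
        w /= w.sum()
    return total

# One evaluation at a fixed theta; the outer sampler over theta (the p(\theta) Stan file)
# would call this every time it needs a likelihood estimate.
rng = np.random.default_rng(0)
print(log_marginal_estimate(theta=0.5, ys=[0.3, 0.1, -0.2], n=1000, rng=rng))

Plugging a function like this in as the likelihood estimate inside the pseudo-marginal sketch earlier in the thread is exactly the pMCMC nesting, which is why Stan needs to be able to call Stan.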