Hi all,
I am new to Stan and have a basic question.
I want to revise a Metropolis-within-Gibbs sampler into something like a NUTS-within-Gibbs sampler using RStan. Basically, we have a big Gibbs sampler that samples each parameter block on each iteration. For a few parameters whose priors are not conjugate, we originally applied Metropolis, so we implemented a Metropolis-within-Gibbs sampler in R; now we want to replace those Metropolis steps with NUTS using Stan. I just want to ask whether Stan can return one posterior sample for those parameters on each iteration and pass it back to R. Or, to use Stan, must we implement the whole algorithm in RStan and give up the Gibbs sampler? Thanks so much for any suggestions!
Once every year or so, I think about this. I've never done anything serious about it, though. Suppose \theta are the parameters for NUTS and \phi are those for RWMH/Gibbs.

No, Stan can't draw one or more proposals from P(\theta | \phi, X), pass them to an external program while holding its internal state in stasis, and then wait for proposals from P(\phi | \theta, X) to be passed back. And no, you don't have to give up. The impression you get from very clever people is that you should code your own bespoke sampler as a fun side hustle, preferably in some esoteric language, like, you know, Assembly. Or Scratch. But I think it is possible to get it running, as long as you can code up P(\theta | \phi, X) in Stan and P(\phi | \theta, X) in something else (JAGS?), and accept two big compromises:

1. Each Stan step will involve re-adapting, warming up, and running a good few thousand iterations, not just one draw at a time (neither program will wait for the other, and this likely means you would be quicker to just go back to JAGS :-( ).
2. The code on each side will have to be flexible enough to accommodate changing likelihoods, priors, and initial values as you move through conditional slices of the joint posterior, and as you change your mind about the shape of that joint.

Finally, as far as I know there is really no theory for interpreting the Frankenstein posterior sample that would result. It might all hang together OK, but then again, it might not. You would certainly lose accurate information on the correlations between elements of \theta and elements of \phi, along with good diagnostics and LOO-CV, because the joint posterior is only as good as the number of times you flipped between NUTS and RWMH/Gibbs. It is an interesting thought experiment, but it probably stops there.
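To make compromise 1 concrete, here is a rough, untested sketch of what that outer loop might look like with rstan. Everything in it is a placeholder I made up for illustration: the toy normal model, the parameter names theta and phi, and the `update_phi()` stand-in for your existing conjugate/Metropolis step.

```r
## Untested sketch of a "NUTS-within-Gibbs" outer loop in R using rstan.
## The model, data, and update_phi() are all placeholders for illustration.
library(rstan)

# Stan program for the conditional P(theta | phi, X): phi enters as data.
stan_code <- "
data {
  int<lower=1> N;
  vector[N] y;
  real phi;            // held fixed within this conditional update
}
parameters {
  real theta;
}
model {
  theta ~ normal(0, 5);
  y ~ normal(theta + phi, 1);
}
"
cond_model <- stan_model(model_code = stan_code)  # compile once, reuse in the loop

update_phi <- function(theta, y) {
  # placeholder for your existing conjugate/Metropolis update of phi | theta, y
  rnorm(1, mean = mean(y) - theta, sd = 0.1)
}

set.seed(1)
y <- rnorm(50, mean = 1.5)
theta <- 0
phi <- 0
n_outer <- 200
draws <- matrix(NA_real_, n_outer, 2, dimnames = list(NULL, c("theta", "phi")))

for (s in seq_len(n_outer)) {
  # Every call re-runs warmup/adaptation from scratch -- this is the big cost
  # described above; Stan cannot freeze its adapted state between calls.
  fit <- sampling(cond_model,
                  data = list(N = length(y), y = y, phi = phi),
                  chains = 1, iter = 600, warmup = 500, refresh = 0)
  th_draws <- rstan::extract(fit, pars = "theta")$theta
  theta <- th_draws[length(th_draws)]   # keep one draw as the "Gibbs" update
  phi <- update_phi(theta, y)
  draws[s, ] <- c(theta, phi)
}
```

Note that `cond_model` only compiles once, but every pass through the loop still pays for a fresh warmup just to keep a single draw, which is where most of the wasted effort goes.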
I’d be interested to hear if I’m talking absolute nonsense in others’ opinions. Like I say, I’ve never tried it.
You might want to look into this thread, by @s.maskell.
You may want to check out Turing, which is written in a high-level programming language (Julia) and would be much more amenable to this. It already has Gibbs sampling, and adding NUTS-within-Gibbs shouldn’t be all that hard if you want to make a pull request.