Hello. I am working on a hierarchical Bayesian model in a spatial mixed-effects setting. A couple of process steps in the hierarchy involve distributions that I know will be computationally challenging (read: non-conjugate), and working around them with data-augmentation steps would be computationally expensive. I know Stan is well suited to handle that part of the hierarchy, so I was wondering whether it is statistically valid to call a Stan sampler update within a Gibbs sampler. My idea is that, much like a Metropolis-Hastings step within Gibbs, Stan would update the parameters in the corresponding process step, but I am not sure whether this holds statistically.
Any input on whether this is a valid approach and, if so, what considerations I need to be aware of? Thank you for your input and time.
You can do this, insofar as any “Metropolis-within-Gibbs” sampler is possible. The validity argument is the same: each component update is a Markov kernel that leaves the corresponding full conditional invariant, so their composition leaves the joint posterior invariant. But there are some things to be aware of:
You have to pay the warm-up cost at each iteration of the Gibbs sampler. Assuming the warm-up is sufficient (checking this is important), any of the post-warm-up draws is a valid draw from the target conditional. In practice this means you might set, for each iteration of the Gibbs sampler, iter_warmup = 1000 and iter_sampling = 1. This can be very slow (but not always!).
The temptation to reuse the HMC adaptation info (step size, mass matrix) from the previous Gibbs iteration is high, but the target conditionals can change substantially depending on the values of the other parameters. It is safest to rerun the adaptation for each Gibbs iteration, but this too can be very slow.
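To make the structure concrete, here is a minimal toy sketch of the scheme (an assumption for illustration, not your spatial model, and using random-walk Metropolis in place of Stan's HMC): a bivariate normal target where x is updated by an exact conditional draw (the "conjugate Gibbs" role) and y is updated by running several Metropolis steps targeting p(y | x) and keeping only the final state, mimicking "warm up, then take one draw" with an inner sampler.

```python
import math
import random

random.seed(1)

# Toy target: bivariate normal, zero means, unit variances, correlation RHO.
# Full conditionals: x | y ~ N(RHO*y, 1 - RHO^2) and symmetrically for y | x.
RHO = 0.5
COND_SD = math.sqrt(1.0 - RHO ** 2)

def log_cond_y(y, x):
    """Log density of y | x, up to an additive constant."""
    return -0.5 * ((y - RHO * x) / COND_SD) ** 2

def mh_update_y(y, x, n_steps=20, prop_sd=1.0):
    """Random-walk Metropolis on p(y | x); keep only the final state.

    Starting the inner chain from the previous y is still valid: each MH
    step leaves the conditional invariant, so the composed kernel does too.
    """
    for _ in range(n_steps):
        prop = y + random.gauss(0.0, prop_sd)
        if math.log(random.random()) < log_cond_y(prop, x) - log_cond_y(y, x):
            y = prop
    return y

def gibbs(n_iter=20000):
    x, y = 0.0, 0.0
    draws = []
    for _ in range(n_iter):
        x = random.gauss(RHO * y, COND_SD)  # exact conditional draw
        y = mh_update_y(y, x)               # "sampler-within-Gibbs" step
        draws.append((x, y))
    return draws

draws = gibbs()
xs = [d[0] for d in draws]
ys = [d[1] for d in draws]
mx = sum(xs) / len(xs)
my = sum(ys) / len(ys)
sx = math.sqrt(sum((a - mx) ** 2 for a in xs) / len(xs))
sy = math.sqrt(sum((b - my) ** 2 for b in ys) / len(ys))
corr = sum((a - mx) * (b - my) for a, b in draws) / (len(draws) * sx * sy)
print(mx, corr)  # should be near 0 and near RHO, respectively
```

In your setting, the `mh_update_y` call would be replaced by a Stan fit of the conditional model (e.g. one warm-up phase followed by a single sampling draw per Gibbs iteration, as described above), but the validity of the overall scheme rests on exactly this structure.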