Combining HMC with Gibbs Sampling in Hierarchical Models

This discussion is motivated by Bayesian Data Analysis (BDA), a conversation I had with Trivellore Raghunathan (Raghu), and an April 2013 thread on the Stan Users Google group titled “Reasons to switch from BUGS to Stan”.
From BDA:

There are two ways in which ideas of the Gibbs sampler fit into Hamiltonian Monte Carlo.
First, it can make sense to partition variables into blocks, either to simplify computation or to speed convergence. Consider a hierarchical model with J groups, …. In this case, … it may be more effective—in computation speed or convergence—to cycle through J + 1 updating steps, altering each η(j) and then φ during each cycle. Parameter expansion can be used to facilitate quicker mixing through the joint distribution.
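To make the J + 1 step cycle concrete, here is a minimal sketch in Python. It is only an illustration of the blocking idea in the passage, not code from BDA or from Stan. The model is an assumption chosen because every conditional is available in closed form: y_j ~ Normal(η_j, σ_j) with known σ_j, η_j ~ Normal(μ, τ), a flat prior on φ = (μ, τ), and J > 2. The function name and the simulated data are hypothetical.

```python
import numpy as np

def gibbs_hierarchical(y, sigma, n_iter=2000, seed=0):
    """Blocked Gibbs for an assumed normal hierarchical model.

    Each cycle has J + 1 steps: one draw per eta_j, then one joint draw
    of phi = (mu, tau) given all of eta. Flat prior on (mu, tau).
    """
    y = np.asarray(y, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    J = len(y)
    if J <= 2:
        raise ValueError("this sketch needs J > 2 for a proper tau update")
    rng = np.random.default_rng(seed)
    eta = y.copy()
    mu, tau = eta.mean(), eta.std() + 1e-3
    draws = np.empty((n_iter, J + 2))  # columns: eta_1..eta_J, mu, tau
    for t in range(n_iter):
        # Steps 1..J: update each eta_j from its normal full conditional.
        for j in range(J):
            prec = 1.0 / sigma[j] ** 2 + 1.0 / tau ** 2
            mean = (y[j] / sigma[j] ** 2 + mu / tau ** 2) / prec
            eta[j] = rng.normal(mean, np.sqrt(1.0 / prec))
        # Step J + 1: joint draw of phi. First tau^2 | eta (mu integrated
        # out), which is S / chi2(J - 2), then mu | tau, eta.
        eta_bar = eta.mean()
        S = np.sum((eta - eta_bar) ** 2)
        tau = np.sqrt(S / rng.chisquare(J - 2))
        mu = rng.normal(eta_bar, tau / np.sqrt(J))
        draws[t, :J] = eta
        draws[t, J] = mu
        draws[t, J + 1] = tau
    return draws

# Usage on simulated (hypothetical) data.
data_rng = np.random.default_rng(1)
sigma_obs = np.full(8, 10.0)
y_obs = data_rng.normal(5.0, 12.0, size=8)
gibbs_draws = gibbs_hierarchical(y_obs, sigma_obs, n_iter=500, seed=2)
print("posterior mean of mu:", gibbs_draws[:, 8].mean())
```

Each η_j draw conditions on the current φ, and the φ draw conditions on the current η, so the chain moves one block at a time rather than along the joint distribution.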

BDA then goes on to say that the Gibbs sampler can also be used with HMC for updating of discrete variables.

My conversation with Raghu suggested that Stan actually uses partitioning of variables into blocks for fitting complicated models.

My first question: does this partitioning actually occur in Stan, or does Stan update the entire parameter vector at once?

My second question: is what BDA suggests even possible in Stan?

Finally, as BDA suggests, parameter expansion/marginal augmentation has been used in hierarchical models to speed convergence. BDA suggests that Gibbs sampling with parameter expansion works fine for these models. In the “Reasons to switch from BUGS to Stan” thread, Bob Carpenter stated that a student had been recruited to do comparisons. Are there any results from these comparisons? I have not been able to find them.


At once

Not currently

Not that I know of.

We did some preliminary work on those tests, but they quickly became infeasible: after comparing to Stan we realized that BUGS wasn’t giving the right answer most of the time. Without the right answer, speed is irrelevant. We have since learned a tremendous amount about fitting hierarchical models and the potential dangers therein, but instead of trying to rewrite all of the BUGS models to be fit-able, and hence comparable to Stan, we have just moved on to focusing entirely on Stan. There are references in the manual.

Any kind of blocking will compromise performance and hence is not recommended. Just build your model in Stan jointly and let it do its thing.
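For contrast with blocked updates, here is a rough Python sketch of a single HMC transition that moves the whole parameter vector at once. It is not Stan's sampler: Stan uses an adaptive NUTS variant with automatic differentiation, while this is plain fixed-step HMC with a hand-coded gradient. The model is an assumption for illustration: y_j ~ Normal(θ_j, σ_j) with known σ_j, θ_j ~ Normal(μ, τ), a flat prior on (μ, τ), sampled on the unconstrained vector q = (θ_1, ..., θ_J, μ, log τ). All function names, the step size, and the simulated data are hypothetical.

```python
import numpy as np

def log_post_and_grad(q, y, sigma):
    """Log posterior (up to a constant) and its gradient in q."""
    J = len(y)
    theta, mu, log_tau = q[:J], q[J], q[J + 1]
    inv_tau2 = np.exp(-2.0 * log_tau)
    r = theta - mu
    # The -(J - 1) * log_tau term is -J * log_tau from the normal densities
    # plus +log_tau from the Jacobian of tau = exp(log_tau).
    lp = (-0.5 * np.sum(((y - theta) / sigma) ** 2)
          - 0.5 * inv_tau2 * np.sum(r ** 2)
          - (J - 1) * log_tau)
    grad = np.empty_like(q)
    grad[:J] = (y - theta) / sigma ** 2 - r * inv_tau2
    grad[J] = inv_tau2 * np.sum(r)
    grad[J + 1] = inv_tau2 * np.sum(r ** 2) - (J - 1)
    return lp, grad

def hmc_step(q, y, sigma, rng, eps=0.05, n_leapfrog=20):
    """One HMC transition; every coordinate of q moves together."""
    p = rng.standard_normal(q.shape)
    lp, g = log_post_and_grad(q, y, sigma)
    h0 = -lp + 0.5 * (p @ p)
    q_new, p_new = q.copy(), p + 0.5 * eps * g  # initial half step
    for i in range(n_leapfrog):
        q_new = q_new + eps * p_new
        lp_new, g = log_post_and_grad(q_new, y, sigma)
        # Full momentum steps inside the trajectory, half step at the end.
        p_new = p_new + (eps if i < n_leapfrog - 1 else 0.5 * eps) * g
    h1 = -lp_new + 0.5 * (p_new @ p_new)
    # Metropolis correction; a NaN energy compares False and is rejected.
    if np.log(rng.uniform()) < h0 - h1:
        return q_new, True
    return q, False

# Usage on simulated (hypothetical) data.
rng = np.random.default_rng(0)
sigma_obs = np.full(8, 10.0)
y_obs = rng.normal(5.0, 12.0, size=8)
q = np.concatenate([y_obs, [y_obs.mean(), np.log(y_obs.std())]])
hmc_draws = np.empty((300, q.size))
for t in range(300):
    q, accepted = hmc_step(q, y_obs, sigma_obs, rng)
    hmc_draws[t] = q
print("mean of mu draws:", hmc_draws[:, 8].mean())
```

There is no inner loop over groups here: the gradient couples all of θ, μ, and log τ, and one trajectory updates them together.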


Thanks! That’s what I thought from my readings of various papers and the manual. I just got a little confused when I saw that passage in BDA.


We have learned a lot since BDA3 was published, which is testament to the Stan user community pushing us towards interesting yet relevant problems and to the general awesomeness of the Stan development team and our collaborators.