Divergent transitions in hierarchical logit

Unlikely to help. The MAP is not in the typical set and hence you still need to warmup up in order to converge to the typical set. The hope of this approach is that the MAP is much closer to the typical set then an arbitrary starting point, but that’s unlikely to be true in moderate to high dimensions. Regardless, HMC finds the typical set really fast – typically within O(10) iterations. The bulk of warmup is spent performing initial explorations to inform the adaptation algorithms, and a better starting point doesn’t help that process.

4 Likes