The infamous gamma(0.5, 0.5) is commonly recommended as a “non-informative” prior model, but upon closer inspection it is actually pretty nasty.
On the nominal scale the gamma(0.5, 0.5) density function looks something like
Recall that Stan explores an unconstrained space, in this case log(y). The corresponding probability density function for log(y) is (I think I got the Jacobian correction right here…)
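For anyone who wants to double check that Jacobian correction, here is a minimal sketch. It assumes Stan's shape-rate parameterization of the gamma density; the function names, the integration bounds, and the grid size are illustrative choices of mine and not part of any library. If x = log(y) then y = exp(x) and dy/dx = exp(x), so the log density on the unconstrained scale is the nominal log density evaluated at exp(x) plus x. If the correction is right then the resulting density should integrate to one.

```python
import math

def gamma_log_density(y, alpha=0.5, beta=0.5):
    """Log density of gamma(alpha, beta) on the nominal scale, y > 0.

    Assumes the shape-rate parameterization that Stan uses.
    """
    return (alpha * math.log(beta) - math.lgamma(alpha)
            + (alpha - 1.0) * math.log(y) - beta * y)

def unconstrained_log_density(x, alpha=0.5, beta=0.5):
    """Log density of x = log(y): nominal log density at exp(x) plus the
    log Jacobian, log |dy/dx| = x."""
    return gamma_log_density(math.exp(x), alpha, beta) + x

def integrate(f, lo, hi, n=20_000):
    """Midpoint-rule quadrature of f over [lo, hi] (hypothetical helper)."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

# The bounds are an assumption: wide enough that the neglected tails
# contribute a negligible amount for alpha = beta = 0.5.
total = integrate(lambda x: math.exp(unconstrained_log_density(x)), -60.0, 10.0)
print(f"Integral of the unconstrained density: {total:.6f}")
```

Dropping the `+ x` term in `unconstrained_log_density` gives a function that no longer integrates to one, which is a quick way to see that the correction matters.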
Because the peak is so far away from 0, around which Stan’s sampler is initialized by default, it will take a long time for the sampler to get to the peak. If it doesn’t get there fast enough then Stan’s adaptation will be informed by tail behavior and not the behavior within the typical set; once the sampler finally reaches the peak that stale adaptation can be ill-suited to the geometry there and cause divergences. The asymmetric shape of the density function around that peak can also stress the adaptation.
When constructing posterior distributions the realized likelihood function will often regularize much of this behavior. That should be no excuse, however, for such an awkward prior model.

