Sampling from the prior - why am I seeing divergent transitions?

betanalpha · December 6, 2021, 8:32pm

The infamous gamma(0.5, 0.5) is commonly recommended as a “non-informative” prior model, but upon closer inspection it is actually pretty nasty.

On the nominal scale the gamma(0.5, 0.5) density function looks something like this:

[Figure: gamma(0.5, 0.5) density function on the nominal scale]
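For reference, the density can also be written out explicitly. Assuming Stan's shape-rate convention for gamma(alpha, beta), substituting alpha = beta = 1/2 and using Gamma(1/2) = sqrt(pi) gives:

$$
p(y \mid \alpha, \beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)} \, y^{\alpha - 1} e^{-\beta y},
\qquad
p\!\left(y \mid \tfrac{1}{2}, \tfrac{1}{2}\right) = \frac{(1/2)^{1/2}}{\sqrt{\pi}} \, y^{-1/2} e^{-y/2} = \frac{1}{\sqrt{2 \pi y}} \, e^{-y/2},
\qquad y > 0.
$$

The y^{-1/2} factor makes the density diverge as y → 0, so it piles up against zero; this is the chi-square density with one degree of freedom.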

Recall that Stan explores an unconstrained space, in this case log(y). The corresponding probability density function for log(y) looks like this (I think I got the Jacobian correction right here…):

[Figure: density function of log(y) implied by the gamma(0.5, 0.5) prior model]
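One way to check the Jacobian correction: with x = log(y), the density of x is the nominal density evaluated at y = exp(x) times |dy/dx| = exp(x), so on the log scale you add x to the nominal log density. The minimal sketch below, in Python with only the standard library, confirms numerically that this corrected density integrates to one. The function names and the integration range [-60, 10] are illustrative choices and do not come from Stan. It also integrates the uncorrected version for contrast.

```python
import math

def gamma_logpdf(y, alpha, beta):
    # Log density of gamma(alpha, beta) with shape alpha and rate beta (Stan's convention).
    return (alpha * math.log(beta) - math.lgamma(alpha)
            + (alpha - 1.0) * math.log(y) - beta * y)

def log_scale_logpdf(x, alpha, beta):
    # Log density of x = log(y): nominal log density at y = exp(x)
    # plus the log Jacobian log|dy/dx| = log(exp(x)) = x.
    return gamma_logpdf(math.exp(x), alpha, beta) + x

def trapezoid(f, lo, hi, n=100_000):
    # Composite trapezoidal rule on [lo, hi] with n intervals.
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, n):
        total += f(lo + i * h)
    return total * h

alpha, beta = 0.5, 0.5
# Illustrative range (an assumption): wide enough that the truncated tails are negligible here.
lo, hi = -60.0, 10.0

# With the Jacobian: a proper density on the log scale, so this should be (very nearly) one.
mass = trapezoid(lambda x: math.exp(log_scale_logpdf(x, alpha, beta)), lo, hi)
# Without the Jacobian: not a density on the log scale; the integral is enormous.
naive = trapezoid(lambda x: math.exp(gamma_logpdf(math.exp(x), alpha, beta)), lo, hi)

print(f"with Jacobian:    {mass:.8f}")
print(f"without Jacobian: {naive:.3e}")
```

With the Jacobian the first number should come out at essentially 1. The second keeps growing as the lower limit is pushed further left, because without the correction the integral equals the expectation of 1/y, which is infinite for this prior.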

Because the peak is so far away from 0, around which Stan’s sampler is initialized by default (initial values are drawn uniformly from (-2, 2) on the unconstrained scale), it will take a long time for the sampler to reach the peak. If it doesn’t get there quickly enough, Stan’s adaptation will be informed by the tail behavior rather than by the behavior within the typical set. Once the sampler finally reaches the peak, that stale adaptation can be ill-suited to the geometry there and cause divergences. The asymmetric shape of the density function around the peak can also stress the adaptation.

When constructing posterior distributions the realized likelihood function will often regularize much of this behavior. That should be no excuse, however, for such an awkward prior model.
