Breaking gradients with fabs and the "double monomial"

betanalpha · July 23, 2021, 8:01pm

It can be both, depending on the severity of the discontinuity and how the parameters configure the shape of the density function.

Even though the discontinuity appears to be one-dimensional in the plots of the observational density, it will manifest as a much more complicated discontinuity in the likelihood function over the parameter space, where the latent parameters may have to move in correlated ways to cross the peak. Each observation will also introduce its own discontinuity where the peak passes the observation, and so the total posterior will have lots of discontinuities. This can not only limit how quickly the sampler can move from one part of the typical set to another, it can also prevent sufficient exploration at all.
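To make the "one discontinuity per observation" point concrete, here is a minimal sketch. It does not use the density from this thread; as an assumption it substitutes a unit-scale Laplace density, which has the same kind of fabs cusp at its peak, with a single location parameter `mu`. The function names and the toy data are hypothetical. The gradient of the total log likelihood jumps every time `mu` crosses an observation.

```python
import math

def log_lik(mu, ys):
    # Stand-in density (an assumption, not the model from the thread):
    # unit-scale Laplace, log p(y | mu) = -|y - mu| - log(2).
    return sum(-abs(y - mu) - math.log(2.0) for y in ys)

def grad_log_lik(mu, ys):
    # d/dmu of -|y - mu| is sign(y - mu); it is undefined exactly at mu == y.
    return sum((y > mu) - (y < mu) for y in ys)

# Hypothetical toy observations.
ys = [-1.0, 0.5, 2.0]
eps = 1e-6

# Size of the gradient jump as mu crosses each observation from left to right.
jumps = {y: grad_log_lik(y - eps, ys) - grad_log_lik(y + eps, ys) for y in ys}

for y, jump in jumps.items():
    print(f"gradient drops by {jump} as mu crosses y = {y}")
```

With only one parameter the kinks sit at isolated points. With several latent parameters determining the peak location, each observation instead defines a whole surface in parameter space along which the gradient breaks.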

SBC can definitely help, although if the bias is small the SBC deviations might be hard to resolve without lots of simulations.

I tried reasoning through some behaviors that you could check against but I can’t convince myself if they’d actually work or not. Mathematically the discontinuities will always be there, but the posterior distribution might concentrate sufficiently far away from the discontinuities that their effects are negligible (like suppressing undesired modes with a strong prior model: they never actually go away, you just make them small enough to be negligible). The challenge is figuring out where the discontinuities in the parameter space arise and then comparing that to the posterior, which is nontrivial.
