Sqrt(square()) vs. fabs()

tlyim · April 29, 2019, 3:15pm

Just a quick question about which is more efficient for Stan.

Is it right to think that sqrt(square()), as a composite of two smooth functions, should work better in Stan (which is gradient-based) than fabs(), which has a kink at zero?

bgoodri · April 29, 2019, 6:00pm

Neither is differentiable at zero, and both can cause problems for HMC. If $x$ is a real number such that $y = x^2$ and $z = \sqrt{y}$, then $\frac{\partial z}{\partial x} = \frac{\partial z}{\partial y} \times \frac{\partial y}{\partial x} = \frac{1}{2\sqrt{y}} \times 2x = \frac{x}{|x|}$, which is not well-defined when $x = 0 = y$. In other words, $\sqrt{x^2} = |x|$, so sqrt(square(x)) is the same function as fabs(x), kink included.
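To make the chain-rule argument concrete, here is a toy forward-mode autodiff in plain Python (not Stan, which uses its own reverse-mode C++ implementation). The Dual class and the square, sqrt and fabs helpers are hypothetical stand-ins written for illustration; in particular, this sketch does not claim to reproduce what derivative Stan itself reports for fabs at exactly zero.

```python
import math


class Dual:
    """Forward-mode dual number: carries a value and its derivative."""

    def __init__(self, val, der=0.0):
        self.val = val
        self.der = der


def square(x):
    # d/dx x^2 = 2x (chain rule with the incoming derivative x.der)
    return Dual(x.val * x.val, 2.0 * x.val * x.der)


def sqrt(y):
    # d/dy sqrt(y) = 1 / (2 sqrt(y)), which divides by zero at y = 0.
    s = math.sqrt(y.val)
    if s == 0.0:
        # Python raises ZeroDivisionError; mimic IEEE (C++) arithmetic instead:
        # 0/0 is NaN, anything else over 0 is +/-inf.
        if y.der == 0.0:
            return Dual(s, math.nan)
        return Dual(s, math.copysign(math.inf, y.der))
    return Dual(s, y.der / (2.0 * s))


def fabs(x):
    # d/dx |x| = sign(x); at the kink x = 0 this sketch simply picks 0.
    sign = (x.val > 0) - (x.val < 0)
    return Dual(abs(x.val), sign * x.der)


def grad(f, x0):
    """Derivative of f at x0, seeding dx/dx = 1."""
    return f(Dual(x0, 1.0)).der


for x0 in (-2.0, 0.5, 0.0):
    print(x0, grad(lambda x: sqrt(square(x)), x0), grad(fabs, x0))
```

Away from zero both routes give the same derivative, sign(x). At zero, sqrt(square(x)) produces 0/0 (NaN), and fabs only yields a number because the sketch arbitrarily assigns 0 to the kink.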

If you cannot restrict x to be either positive or negative, then sometimes you can get away with fabs(). Other times, you have to use a mixture model with a positive and a negative contribution to x.
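One way to read the mixture suggestion: keep a non-negative magnitude a, and let a two-component mixture carry the sign, with one component centred on +a and one on -a. The sketch is plain Python rather than Stan so it runs standalone; log_mix reproduces the quantity Stan's built-in log_mix(theta, lp1, lp2) computes, while signed_mixture_lpdf, a, sigma and theta are hypothetical names for illustration, not part of the original thread.

```python
import math


def normal_lpdf(y, mu, sigma):
    """Log density of Normal(mu, sigma) evaluated at y."""
    z = (y - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def log_sum_exp(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def log_mix(theta, lp1, lp2):
    """log(theta * exp(lp1) + (1 - theta) * exp(lp2)) for 0 < theta < 1,
    the same quantity Stan's log_mix(theta, lp1, lp2) returns."""
    return log_sum_exp(math.log(theta) + lp1, math.log1p(-theta) + lp2)


def signed_mixture_lpdf(y, a, sigma, theta=0.5):
    """Hypothetical model: y is centred on +a with probability theta and on -a
    otherwise. In Stan, a would be declared real<lower = 0>, so a itself has
    no mirror-image mode; the sign is handled by the mixture."""
    return log_mix(theta,
                   normal_lpdf(y, a, sigma),    # positive contribution
                   normal_lpdf(y, -a, sigma))   # negative contribution


print(signed_mixture_lpdf(1.3, 2.0, 1.0), signed_mixture_lpdf(-1.3, 2.0, 1.0))
```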

tlyim · April 29, 2019, 8:20pm

Thanks for the clear explanation. Much appreciated.

Bob_Carpenter · April 29, 2019, 8:34pm

The usual problem with naive application of abs(x) is that it immediately leads to multimodal posteriors because +a and -a produce the same result.

So you don’t want to do this:

parameters {
  real x;           // unconstrained: x and -x are both allowed
  ...
}
...
  f(abs(x), ...)    // x and -x give the same value, so two modes
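To see the two modes concretely, here is a small check in plain Python (not Stan). log_density_abs is a hypothetical stand-in for f(abs(x), ...), chosen as an unnormalized normal log density on abs(x); scanning it over a grid of unconstrained x values finds one peak for each sign.

```python
def log_density_abs(x, mu=2.0, sigma=0.5):
    """Hypothetical stand-in for f(abs(x), ...): an unnormalized
    Normal(mu, sigma) log density on abs(x), with x itself unconstrained."""
    z = (abs(x) - mu) / sigma
    return -0.5 * z * z


def local_maxima(f, lo, hi, n):
    """Interior grid points in [lo, hi] whose value beats both neighbours."""
    xs = [lo + (hi - lo) * i / n for i in range(n + 1)]
    ys = [f(x) for x in xs]
    return [xs[i] for i in range(1, n)
            if ys[i] > ys[i - 1] and ys[i] > ys[i + 1]]


print(local_maxima(log_density_abs, -5.0, 5.0, 1000))  # [-2.0, 2.0]
```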

Much better to do this:

parameters {
  real<lower = 0> x;  // only the positive branch is parameterized
  ...
}
...
  f(x, ...)
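For the constrained version, Stan does not sample x directly: as its reference manual describes for lower-bounded parameters, it samples an unconstrained u with x = exp(u) and adds the log-Jacobian u to the log density. The plain-Python sketch below applies that transform to a hypothetical stand-in for f(x, ...) and scans the result on a grid of u values; there is only one peak to find.

```python
import math


def log_density_x(x, mu=2.0, sigma=0.5):
    """Hypothetical stand-in for f(x, ...) on the constrained scale x > 0:
    an unnormalized Normal(mu, sigma) log density."""
    z = (x - mu) / sigma
    return -0.5 * z * z


def log_density_unconstrained(u):
    """For real<lower = 0> x, Stan samples u with x = exp(u) and adds the
    log-Jacobian log|dx/du| = u to the target."""
    return log_density_x(math.exp(u)) + u


def local_maxima(f, lo, hi, n):
    """Interior grid points in [lo, hi] whose value beats both neighbours."""
    xs = [lo + (hi - lo) * i / n for i in range(n + 1)]
    ys = [f(x) for x in xs]
    return [xs[i] for i in range(1, n)
            if ys[i] > ys[i - 1] and ys[i] > ys[i + 1]]


peaks = local_maxima(log_density_unconstrained, -5.0, 5.0, 1000)
print(peaks, [math.exp(u) for u in peaks])
```

The single peak sits near x ≈ 2.12 rather than exactly at 2 because the Jacobian term shifts the mode on the unconstrained scale; the distribution of x itself is unaffected.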

If it’s a compound expression and not just a variable, you can often work out constraints on its components.

tlyim · April 29, 2019, 8:47pm

Thanks for the further clarification. I ran into that issue while exploring different ways to formulate my model. I’ve now settled on not using fabs() in the sampling code, but I still use it to generate simulated data (via mean(fabs()) of the latent state, which is real-valued in my case). I think this use for simulation should be fine.

Bob_Carpenter · April 29, 2019, 8:56pm

Simulating from a normal and taking the absolute value gives the same result as simulating from a half-normal directly.
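A quick numerical check of that statement, written in plain Python rather than Stan so it runs on its own: one function takes absolute values of normal draws, the other samples a half-normal directly by inverting its CDF, and both sample means should land near the half-normal mean sigma * sqrt(2 / pi). The seed, sigma and sample size are arbitrary illustrative choices.

```python
import math
import random
import statistics


def abs_normal_draws(n, sigma, rng):
    """Simulate Normal(0, sigma) and take the absolute value."""
    return [abs(rng.gauss(0.0, sigma)) for _ in range(n)]


def half_normal_draws(n, sigma, rng):
    """Simulate a half-normal directly by inverse-CDF sampling. For u in (0, 1],
    Phi^{-1}(u / 2) is a standard normal restricted to the negative half-line,
    so its magnitude times sigma is a half-normal(sigma) draw."""
    std = statistics.NormalDist()
    draws = []
    for _ in range(n):
        u = 1.0 - rng.random()  # in (0, 1], so u / 2 stays inside inv_cdf's domain
        draws.append(sigma * abs(std.inv_cdf(u / 2.0)))
    return draws


rng = random.Random(2019)
sigma, n = 1.5, 100_000
a = abs_normal_draws(n, sigma, rng)
b = half_normal_draws(n, sigma, rng)
expected_mean = sigma * math.sqrt(2.0 / math.pi)  # mean of a half-normal(sigma)
print(statistics.fmean(a), statistics.fmean(b), expected_mean)
```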

Generally, you want to set up simulation code to match the model code so that you’re testing with data generated from the model itself.

Maybe you could say more about why you are using absolute values.

tlyim · May 2, 2019, 1:58am

Thanks for asking. I am building a model in which the extent of misreporting, m, is the product of (i) a base magnitude (assumed to equal the average absolute profit, since profit can be negative) and (ii) the potential room for manipulation, which is constrained by certain governance mechanisms (G) and exploited by the bias effort resulting from the temptation to misreport (X). I outlined the model in this post (full sampling code uploaded there).

The simulation code is here: simu_SSM2.2.stan

If you have time to take a look, I would very much appreciate feedback on any errors you find or advice on improving the model. Thanks.
