Sqrt(square()) vs. fabs()

Just a quick question about what’s more efficient for Stan.

Is it right to think that sqrt(square()) as a composite function of two smooth functions should work better in Stan (which is gradient-based) than fabs(), which has a kink at zero?

Neither is differentiable at zero and both can cause problems for HMC. If x is a real number such that y = x^2 and z = \sqrt{y}, then \frac{\partial z}{\partial x} = \frac{\partial z}{\partial y} \times \frac{\partial y}{\partial x} = \frac{1}{2\sqrt{y}} \times 2x = \frac{x}{|x|}, which is not well-defined when x = 0 = y.
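
To make the kink explicit, here is a short worked version using only standard calculus (nothing Stan-specific). The square root is itself not differentiable at y = 0, so the composite is not a composition of two smooth functions there, and the one-sided slopes disagree exactly as they do for fabs():

```latex
\[
z = \sqrt{x^2} = |x|, \qquad
\frac{dz}{dx} = \frac{x}{\sqrt{x^2}} = \operatorname{sign}(x) \quad (x \neq 0),
\]
\[
\lim_{x \to 0^+} \frac{dz}{dx} = 1, \qquad
\lim_{x \to 0^-} \frac{dz}{dx} = -1 .
\]
```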

If you cannot restrict x to be either positive or negative, then sometimes you can get away with a fabs. Other times, you have to do a mixture model with a positive and a negative contribution to x.

Thanks for the clear explanation. Much appreciated.

The usual problem with naive application of abs(x) is that it immediately leads to multimodal posteriors because +a and -a produce the same result.

So you don’t want to do this:

parameters {
  real x;
...
  f(abs(x),...)  

Much better to do

parameters {
  real<lower = 0> x;
...
  f(x, ...)

If it’s a compound and not just a variable, you can often work out constraints on the components.
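
A minimal sketch of the constrained version as a complete program, assuming a hypothetical setup in which x acts as the scale of a normal likelihood. The data block, the prior, and the use of x as a scale are illustrative assumptions, not taken from this thread:

```stan
data {
  int<lower=0> N;        // number of observations (hypothetical)
  vector[N] y;           // observed values (hypothetical)
}
parameters {
  real<lower=0> x;       // declared positive, so no abs() is needed
}
model {
  x ~ normal(0, 1);      // with the lower bound this acts as a half-normal prior
  y ~ normal(0, x);      // stand-in for f(x, ...): x used directly as a scale
}
```

Because Stan samples log(x) on the unconstrained scale for a lower-bounded parameter, the posterior here has a single mode instead of mirrored modes at +a and -a.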

Thanks for the further clarification. I ran into that consideration when exploring different ways to formulate my model. I’ve now settled on not using fabs() in the sampling code, but I still use it when generating simulated data (by referring to mean(fabs()) of the latent state, which is a real number in my case). This use in simulation should be fine, I think.

You get the same result simulating from a normal and taking the absolute value as simulating from a half-normal directly.
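
As a rough illustration of that equivalence, the sketch below draws one value each way in a generated quantities block. The variable names are made up, and the inverse-CDF construction is just one standard way to draw a half-normal, not something from this thread. With no parameters block, it would be run with the fixed_param algorithm:

```stan
generated quantities {
  // Route 1: draw a standard normal and fold it onto the positive half-line.
  // (Older Stan releases spell the real-valued absolute value fabs().)
  real folded = abs(normal_rng(0, 1));

  // Route 2: draw a half-normal directly by inverse CDF.
  // The half-normal CDF is F(a) = 2 * Phi(a) - 1, so a = inv_Phi((1 + u) / 2).
  real u = uniform_rng(0, 1);
  real direct = inv_Phi((1 + u) / 2);
}
```

Across many draws, folded and direct should follow the same distribution, with mean sqrt(2 / pi), roughly 0.798.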

Generally, you want to set up simulation code to match the model code so that you’re testing code generated from the model.

Maybe you could say more about why you are using absolute values.

Thanks for asking. I am building a model where the extent of misreporting m is a product of (i) a base magnitude (assumed to equal the average of the absolute profit figure, as profit can be negative) and (ii) the potential room for manipulation, constrained by certain governance mechanisms (G) and exploited by the bias effort resulting from the temptation to misreport (X). I outlined the model in this post (full sampling code uploaded there).

The simulation code is here: simu_SSM2.2.stan (1.4 KB)

If you have time to take a look, I would very much appreciate your feedback on any errors you find or advice on improving the model. Thanks.