To follow up on what was said during open discussion today about ordering constraints, I have been playing around with the distribution of the product of two normals that have zero means. The density of the product is infinite at zero

http://mathworld.wolfram.com/NormalProductDistribution.html

and so you can scale it out to get something like a spike-and-slab prior, if you are into that.
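As a quick numerical sanity check of that density (assuming numpy and scipy are available): the product of two independent zero-mean normals has density $K_0(|t|/(\sigma_X\sigma_Y))/(\pi\sigma_X\sigma_Y)$, which blows up logarithmically at zero yet still integrates to 1.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import k0

# Density of the product of two independent zero-mean normals with
# standard deviations sx and sy (MathWorld's normal product distribution):
# f(t) = K0(|t| / (sx * sy)) / (pi * sx * sy)
def product_normal_pdf(t, sx=1.0, sy=1.0):
    s = sx * sy
    return k0(np.abs(t) / s) / (np.pi * s)

# The density blows up (logarithmically) as t -> 0 ...
spike = product_normal_pdf(1e-8) > product_normal_pdf(1e-4) > product_normal_pdf(1.0)

# ... but still integrates to 1 (K0 decays like e^{-t}, so the tail past 50 is negligible)
total = 2 * quad(product_normal_pdf, 0.0, 50.0)[0]

# Monte Carlo check of the implied probability of [-1, 1]
rng = np.random.default_rng(0)
prod = rng.normal(size=200_000) * rng.normal(size=200_000)
mc_frac = np.mean(np.abs(prod) <= 1.0)
exact_frac = 2 * quad(product_normal_pdf, 0.0, 1.0)[0]
```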

Assuming the two scale hyperparameters are equal (I set them to 1), there are various ways to parameterize a Stan program like

```
data {
int<lower=0> N;
real y[N];
real<lower=0> sigma2; // sigma_X * sigma_Y
}
parameters { /* different parameterizations */ }
model {
target += normal_lpdf(y | mu, 1);
// prior according to the parameterization
}
```

1. Make `real mu` a parameter and do `target += log(modified_bessel_second_kind(0, fabs(mu) / sigma2)) - log(pi() * sigma2);` in the model block (the increment has to be the log density, not the density itself). Between the non-differentiability of the absolute value function at zero and the infinite gradient of the log density there, NUTS has some divergences that do not go away with a higher `adapt_delta`.
2. Make `real z[2]` a parameter, `mu = z[1] * z[2] * sigma2` a transformed parameter, and do `target += normal_lpdf(z | 0, sqrt(sigma2));` in the model block. This suffers from label switching, but you can get the divergences to go away with a high enough `adapt_delta`.
   And then the Rhats are close to 1.

3. Like (2) but make `ordered[2] z` a parameter to cut off part of the parameter space. This leads to many more divergences than in (1) regardless of `adapt_delta`, presumably because the modes in (2) are not separated.

4. Like (3) but with a truncated normal prior on `z[2]`. This does not help.
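To see the geometry behind the divergences in (1), here is a quick numerical illustration (scipy assumed) that the gradient of that log prior is unbounded near zero, using $K_0'(x) = -K_1(x)$ and $K_1(x) \sim 1/x$ as $x \to 0$:

```python
import numpy as np
from scipy.special import k0, k1

# Log prior from parameterization (1): log f(mu) = log K0(|mu|/sigma2) - log(pi*sigma2).
# Its gradient is -sign(mu) * K1(|mu|/sigma2) / (K0(|mu|/sigma2) * sigma2), which
# both flips sign discontinuously at mu = 0 and diverges in magnitude there.
def lp_grad(mu, sigma2=1.0):
    x = abs(mu) / sigma2
    return -np.sign(mu) * k1(x) / (k0(x) * sigma2)

grads = [abs(lp_grad(mu)) for mu in (1e-2, 1e-4, 1e-6)]
# gradient magnitude grows without bound as mu -> 0
monotone = grads[0] < grads[1] < grads[2]
```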

So, I think we do not fully understand why (3) fails so badly.
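To make the label-switching symmetry in (2) concrete, here is a small check (numpy/scipy assumed; the single datum `y = 0.7` and the evaluation points are just for illustration) that the joint density is invariant under the two transformations that duplicate modes:

```python
import numpy as np
from scipy.stats import norm

# Joint log density of parameterization (2) with one hypothetical datum y:
# prior z1, z2 ~ normal(0, sqrt(sigma2)), mu = z1 * z2 * sigma2 as in the post,
# likelihood y ~ normal(mu, 1).
def joint_lp(z1, z2, y=0.7, sigma2=1.0):
    mu = z1 * z2 * sigma2
    prior = norm.logpdf(z1, 0.0, np.sqrt(sigma2)) + norm.logpdf(z2, 0.0, np.sqrt(sigma2))
    return prior + norm.logpdf(y, mu, 1.0)

# Swapping z1 and z2 (label switching) and flipping both signs each leave
# the joint density unchanged, so the posterior has symmetric modes.
base = joint_lp(1.3, -0.4)
swap = joint_lp(-0.4, 1.3)
flip = joint_lp(-1.3, 0.4)
```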