# Ordering constraints

To follow up on what was said during open discussion today about ordering constraints, I have been playing around with the distribution of the product of two normals that have zero means. The density of the product is infinite at zero

http://mathworld.wolfram.com/NormalProductDistribution.html

and so you can scale it out to get something like a spike-and-slab prior, if you are into that.
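
Concretely, from the MathWorld page above: if $X \sim N(0, \sigma_X)$ and $Y \sim N(0, \sigma_Y)$, the density of the product $\mu = XY$ is

$$
f(\mu) = \frac{K_0\!\left(|\mu| / (\sigma_X \sigma_Y)\right)}{\pi \, \sigma_X \sigma_Y},
$$

where $K_0$ is the modified Bessel function of the second kind, which diverges (logarithmically) as its argument goes to zero.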

Assuming the two scale hyperparameters are equal (I set them to 1), there are various ways to parameterize a Stan program like

```stan
data {
  int<lower=0> N;
  real y[N];
  real<lower=0> sigma2; // sigma_X * sigma_Y
}
parameters { /* different parameterizations */ }
model {
  target += normal_lpdf(y | mu, 1);
  // prior according to the parameterization
}
```
1. Make `real mu` a parameter and do `target += log(modified_bessel_second_kind(0, fabs(mu) / sigma2)) - log(pi() * sigma2);` (the increment has to be the log density; see the first sketch after this list). Between the non-differentiable kink of the absolute value at zero and the log density’s infinite gradient there, NUTS has some divergences that don’t go away with higher `adapt_delta`.
2. Make `vector[2] z` a parameter, `mu = z[1] * z[2]` a transformed parameter, and do `target += normal_lpdf(z | 0, sqrt(sigma2));` in the model block (see the second sketch after this list). This suffers from label switching between `z[1]` and `z[2]`, but you can get the divergences to go away with a high enough `adapt_delta`, and then the Rhats are close to 1.
3. Like (2) but make `ordered[2] z` a parameter to cut off the redundant half of the parameter space. This yields many more divergences than (1) regardless of `adapt_delta`, presumably because the two mirror-image modes in (2) are not separated, so the ordering constraint’s boundary at `z[1] = z[2]` cuts right through high posterior density.
4. Like (3) but with a truncated normal prior on z. This does not help.
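
For concreteness, here is a minimal sketch of parameterization (1). Note that the target increment has to be on the log scale, so the density above gets a `log()` around the Bessel term:

```stan
data {
  int<lower=0> N;
  real y[N];
  real<lower=0> sigma2; // sigma_X * sigma_Y
}
parameters {
  real mu;
}
model {
  // log of the normal product density:
  // f(mu) = K_0(|mu| / sigma2) / (pi * sigma2),
  // which is unbounded at mu = 0
  target += log(modified_bessel_second_kind(0, fabs(mu) / sigma2))
            - log(pi() * sigma2);
  target += normal_lpdf(y | mu, 1);
}
```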
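
And a sketch of parameterization (2), reading the product as `z[1] * z[2]`; declaring `z` as `ordered[2]` instead of `vector[2]` gives parameterization (3):

```stan
data {
  int<lower=0> N;
  real y[N];
  real<lower=0> sigma2; // sigma_X * sigma_Y
}
parameters {
  vector[2] z; // ordered[2] z for parameterization (3)
}
transformed parameters {
  // product of two zero-mean normals, each with scale sqrt(sigma2),
  // so mu has the normal product distribution above
  real mu = z[1] * z[2];
}
model {
  target += normal_lpdf(z | 0, sqrt(sigma2));
  target += normal_lpdf(y | mu, 1);
}
```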

So, I think we do not fully understand why (3) fails so badly.

I wouldn’t think label switching would matter at all here, because z[1] and z[2] aren’t themselves of interest. But the product parameterization does lead to multiplicative non-identifiability: only the product z[1] * z[2] is informed by the data, so the posterior concentrates along hyperbolas, which is really nasty due to the banana-shaped posteriors.

• Bob