Ordering constraints

To follow up on what was said during open discussion today about ordering constraints, I have been playing around with the distribution of the product of two normals that have zero means. The density of the product is infinite at zero

http://mathworld.wolfram.com/NormalProductDistribution.html

and so you can scale it out to get something like a spike-and-slab prior, if you are into that.
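
For reference, the density from the MathWorld page, for the product mu = X * Y of independent zero-mean normals with scales sigma_X and sigma_Y, is

p(mu) = K_0(|mu| / (sigma_X * sigma_Y)) / (pi * sigma_X * sigma_Y)

where K_0 is the modified Bessel function of the second kind. K_0 diverges logarithmically as mu -> 0, which is what produces the spike.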

Assuming the two scale hyperparameters are equal (I set them to 1), there are various ways to parameterize a Stan program like this:

data {
  int<lower=0> N;
  real y[N];
  real<lower=0> sigma2; // sigma_X * sigma_Y
}
parameters { /* different parameterizations */ }
model {
  target += normal_lpdf(y | mu, 1);
  // prior according to the parameterization
}
  1. Make real mu a parameter and do target += log(modified_bessel_second_kind(0, fabs(mu) / sigma2)) - log(pi() * sigma2); which is the log of the product density above (target += expects a log density). Between the non-differentiability of the absolute value function at zero and the infinite gradient of the log density there, NUTS has some divergences that don’t go away with higher adapt_delta. (See the first sketch after this list.)
  2. Make real z[2] a parameter, mu = z[1] * z[2] a transformed parameter, and do target += normal_lpdf(z | 0, sqrt(sigma2)); in the model block, which implies the same product prior on mu since sigma_X * sigma_Y = sigma2. This suffers from label switching, but you can get the divergences to go away with a high enough adapt_delta, and then the Rhats are close to 1. (See the second sketch after this list.)
  3. Like (2) but make ordered[2] z a parameter to cut off half of the parameter space. This yields many more divergences than in (1), regardless of adapt_delta, presumably because the two symmetric modes in (2) are not separated, so the ordering constraint slices right through a high-density region.
  4. Like (3) but with a truncated normal prior on z[2]. This does not help.
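
To make these concrete, here is a sketch of the full program for parameterization (1), folding the prior into the skeleton above; the prior term is the log of the K_0 density:

data {
  int<lower=0> N;
  real y[N];
  real<lower=0> sigma2; // sigma_X * sigma_Y
}
parameters {
  real mu;
}
model {
  target += normal_lpdf(y | mu, 1);
  // log p(mu) = log K_0(|mu| / sigma2) - log(pi * sigma2)
  target += log(modified_bessel_second_kind(0, fabs(mu) / sigma2))
            - log(pi() * sigma2);
}

And here is a sketch of (2), where the normal priors on z imply the product prior on mu; changing the declaration of z to ordered[2] z gives (3):

data {
  int<lower=0> N;
  real y[N];
  real<lower=0> sigma2; // sigma_X * sigma_Y
}
parameters {
  real z[2]; // for (3): ordered[2] z;
}
transformed parameters {
  // sd(z[i]) = sqrt(sigma2), so sigma_X * sigma_Y = sigma2
  real mu = z[1] * z[2];
}
model {
  target += normal_lpdf(y | mu, 1);
  target += normal_lpdf(z | 0, sqrt(sigma2));
}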

So, I think we do not fully understand why (3) fails so badly.

I wouldn’t think label switching would matter at all here because z[1] and z[2] aren’t themselves of interest. But it does lead to multiplicative non-identifiability, which is really nasty due to banana-shaped posteriors: the likelihood depends on z only through the product z[1] * z[2], so the posterior concentrates along the hyperbolas where z[1] * z[2] is roughly constant.

  • Bob