Ordering constraints

To follow up on what was said during open discussion today about ordering constraints, I have been playing around with the distribution of the product of two normals with zero means. If X and Y are independent normals with zero means and scales sigma_X and sigma_Y, their product Z = X * Y has density

  p(z) = K_0(|z| / (sigma_X * sigma_Y)) / (pi * sigma_X * sigma_Y),

where K_0 is the modified Bessel function of the second kind. The density is infinite at zero, so you can scale it out to get something like a spike-and-slab prior, if you are into that.
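As a quick numerical sanity check on that density (a sketch in Python with NumPy/SciPy; the interval [0.5, 1.5] is an arbitrary choice, not from the original discussion):

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import k0

# Density of Z = X * Y for independent X, Y ~ normal(0, 1):
# p(z) = K_0(|z|) / pi, which diverges (logarithmically) at z = 0.
def product_density(z):
    return k0(abs(z)) / np.pi

# The spike at zero is integrable: the density still integrates to 1.
# Split the integral at 1 so quad handles the endpoint singularity cleanly.
total = 2 * (quad(product_density, 0.0, 1.0)[0]
             + quad(product_density, 1.0, np.inf)[0])

# Monte Carlo check of the density over an arbitrary interval.
rng = np.random.default_rng(0)
z = rng.normal(size=1_000_000) * rng.normal(size=1_000_000)
mc_prob = ((0.5 <= z) & (z <= 1.5)).mean()
exact_prob = quad(product_density, 0.5, 1.5)[0]

print(total)                # close to 1
print(mc_prob, exact_prob)  # agree to Monte Carlo error
```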

Assuming the two scale hyperparameters are equal (I set them both to 1), there are various ways to parameterize a Stan program like

data {
  int<lower=0> N;
  real y[N];
  real<lower=0> sigma2; // sigma_X * sigma_Y
}
parameters { /* different parameterizations */ }
model {
  target += normal_lpdf(y | mu, 1);
  // prior according to the parameterization
}
  1. Make real mu a parameter and do target += log(modified_bessel_second_kind(0, fabs(mu) / sigma2)) - log(pi() * sigma2); (note the log: target += expects a log density). Between the nondifferentiable point of the absolute value function and the infinite gradient at zero, NUTS has some divergences that don’t go away with higher adapt_delta.
  2. Make real z[2] a parameter, mu = z[1] * z[2] * sigma2 a transformed parameter, and do target += normal_lpdf(z | 0, 1); in the model block (with standard normal z, the factor of sigma2 in mu yields the density in (1); equivalently, drop that factor and use normal_lpdf(z | 0, sqrt(sigma2))). This suffers from label switching, but you can get the divergences to go away with a high enough adapt_delta, and then the Rhats are close to 1.
  3. Like (2), but make ordered[2] z a parameter to cut off part of the parameter space. This yields many more divergences than (1), regardless of adapt_delta, presumably because the modes in (2) are not separated.
  4. Like (3) but with a truncated normal prior on z[2]. This does not help.
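To see concretely what (3) is up against, note that the posterior in parameterization (2) is invariant both under swapping z[1] and z[2] and under flipping both signs. A quick numerical illustration (taking sigma2 = 1 and a single made-up observation y = 0.3; nothing here is from a real fit):

```python
Y = 0.3  # a single made-up observation

def log_post(z1, z2):
    """Unnormalized log posterior for parameterization (2) with sigma2 = 1:
    z1, z2 ~ normal(0, 1), mu = z1 * z2, y ~ normal(mu, 1)."""
    mu = z1 * z2
    return -0.5 * (z1**2 + z2**2) - 0.5 * (Y - mu)**2

z1, z2 = 0.8, -1.1
# Swapping labels or flipping both signs leaves mu and the prior unchanged,
# so all four image points have exactly the same log posterior.
images = [(z1, z2), (z2, z1), (-z1, -z2), (-z2, -z1)]
vals = [log_post(a, b) for a, b in images]
```

The ordering constraint removes only the swap symmetry: if z1 < z2, the sign-flipped image (-z2, -z1) still satisfies the ordering, so two equal modes remain in the constrained space, and they touch at z1 = z2 = 0.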

So, I think we do not fully understand why (3) fails so badly.

I wouldn’t think label switching would matter at all here, because z[1] and z[2] aren’t themselves of interest. But it does lead to multiplicative non-identifiability, which is really nasty due to banana-shaped posteriors.
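The geometry behind that non-identifiability is easy to sketch: along any curve z1 * z2 = c the likelihood is exactly flat, and only the prior tapers the ridge, which is what produces the banana-shaped contours. A minimal numerical sketch (c = 0.5 is an arbitrary choice):

```python
import numpy as np

c = 0.5
t = np.linspace(0.25, 4.0, 50)
z1, z2 = t, c / t               # points along the hyperbola z1 * z2 = c

mu = z1 * z2                    # identical everywhere on the curve,
                                # so the likelihood cannot distinguish them
prior = -0.5 * (z1**2 + z2**2)  # only the prior varies along the ridge
```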

  • Bob