To follow up on what was said during open discussion today about ordering constraints, I have been playing around with the distribution of the product of two normals that have zero means. The density of the product is infinite at zero
http://mathworld.wolfram.com/NormalProductDistribution.html
and so you can scale it out to get something like a spike-and-slab prior, if you are into that.
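For reference, the MathWorld page gives the density of the product of independent $X \sim N(0, \sigma_X^2)$ and $Y \sim N(0, \sigma_Y^2)$ as

$$
f(x) = \frac{K_0\!\left(|x| / (\sigma_X \sigma_Y)\right)}{\pi \sigma_X \sigma_Y},
$$

where $K_0$ is the modified Bessel function of the second kind of order zero; $K_0$ diverges as its argument goes to zero, which is where the spike comes from.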
Assuming the two scale hyperparameters are equal (I set them both to 1), there are various ways to parameterize a Stan program like the following:
```stan
data {
  int<lower=0> N;
  real y[N];
  real<lower=0> sigma2; // sigma_X * sigma_Y
}
parameters { /* different parameterizations */ }
model {
  target += normal_lpdf(y | mu, 1);
  // prior according to the parameterization
}
```
1. Make `real mu` a parameter and do `target += log(modified_bessel_second_kind(0, fabs(mu) / sigma2)) - log(pi() * sigma2);`, i.e. the log of the product-normal density (first sketch after this list). Between the non-differentiability of the absolute value function at zero and the infinite gradient of the log density there, NUTS has some divergences that do not go away with a higher `adapt_delta`.
2. Make `real z[2]` a parameter, `mu = z[1] * z[2] * sigma2` a transformed parameter, and do `target += normal_lpdf(z | 0, sqrt(sigma2));` in the model block (second sketch below). This suffers from label switching, but you can get the divergences to go away with a high enough `adapt_delta`, and then the Rhats are close to 1.
3. Like (2), but make `ordered[2] z` a parameter to cut off part of the parameter space. This yields many more divergences than (1), regardless of `adapt_delta`, presumably because the modes in (2) are not separated.
4. Like (3), but with a truncated normal prior on `z[2]`. This does not help.
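Concretely, here is a minimal version of parameterization (1), filling in the skeleton above. Note that `target` has to be incremented by the *log* density, so the Bessel expression is wrapped in `log()`:

```stan
data {
  int<lower=0> N;
  real y[N];
  real<lower=0> sigma2; // sigma_X * sigma_Y
}
parameters {
  real mu;
}
model {
  target += normal_lpdf(y | mu, 1);
  // log product-normal prior: log K_0(|mu| / sigma2) - log(pi * sigma2)
  target += log(modified_bessel_second_kind(0, fabs(mu) / sigma2))
            - log(pi() * sigma2);
}
```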
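And parameterization (2) looks like this; for (3), the only change is declaring `ordered[2] z` instead of `real z[2]`, and (4) additionally truncates the prior on `z[2]`:

```stan
data {
  int<lower=0> N;
  real y[N];
  real<lower=0> sigma2; // sigma_X * sigma_Y
}
parameters {
  real z[2]; // for (3): ordered[2] z;
}
transformed parameters {
  real mu = z[1] * z[2] * sigma2;
}
model {
  target += normal_lpdf(y | mu, 1);
  // independent normal priors on the two factors of mu
  target += normal_lpdf(z | 0, sqrt(sigma2));
}
```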
So, I think we do not fully understand why (3) fails so badly.