Coding invalid parameters

I have a model for which parameters outside a region D ⊂ ℝⁿ are invalid. D is not characterized in closed form, and no amount of change-of-variable acrobatics can transform it to ℝⁿ (actually, I can find some A ⊃ D where A is not much larger than D, but that merely mitigates the problem). I can, however, detect when the parameters are outside D; the question is what to assign to the log likelihood there.

Fortunately, there is no mass at the edges of D, and the log likelihood goes to -∞ quickly at the edges. I looked at the Stan sources and found something like

if (boost::math::isnan(h))
  h = std::numeric_limits<double>::infinity();

in multiple places, e.g. here for NUTS. So is (stylized code)

if (parameter is invalid)
  target += NaN

the right thing to do? Or should I use -Inf? The advantage of NaN is that for some invalid calculations, that’s what I get out of the box.

Some experimentation with toy models suggests that it works as long as I pick the starting point inside D. I guess it would help if I controlled the initial stepsize (before adaptation), but I am not sure how to do that using rstan.

I think both NaN and negative_infinity have the same result: leapfrogging will stop, and the sampler will choose from among its previous leapfrog steps and mark the transition as divergent. There is a stepsize parameter that can be passed via stan or sampling, but it doesn’t do much and possibly is being ignored.
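To illustrate why the two should behave alike, here is a minimal sketch and not Stan's actual code. The function names, the unit-mass energy, and the threshold of 1000 are assumptions made for illustration; only the NaN-to-infinity guard mirrors the snippet quoted from the Stan sources.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper (not part of Stan's API): energy of a state given its
// log density and squared momentum norm, assuming a unit mass matrix.
double hamiltonian(double log_density, double momentum_sq) {
  double h = -log_density + 0.5 * momentum_sq;
  // Same guard as in the quoted Stan source: a NaN energy becomes +infinity.
  if (std::isnan(h))
    h = std::numeric_limits<double>::infinity();
  return h;
}

// Metropolis-style weight of a state with energy h relative to the initial
// energy h0, capped at 1. An infinite h gives exp(-inf) = 0.
double accept_weight(double h0, double h) {
  double w = std::exp(h0 - h);
  return w > 1.0 ? 1.0 : w;
}

// Flags a state whose energy error exceeds a threshold.
// The default of 1000 is an assumption for this sketch.
bool is_divergent(double h0, double h, double max_delta_h = 1000.0) {
  return (h - h0) > max_delta_h;
}
```

With a log density of NaN the guard yields h = +∞, and with a log density of -∞ the energy is +∞ directly, so in both cases the weight is zero and the energy error exceeds any finite threshold.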


Within Stan, the thing to do is use reject(msg). That’ll not only reject, but it’ll give you a message as to why.

The problem you may run into is that hard boundaries like this defeat the ability of the Hamiltonian dynamics to complete, so you may not be able to recover the parameters, or you may be able to do so only slowly because Stan has devolved to a random walk to deal with the boundaries. If the density goes to zero as you approach the boundaries, this may or may not be a problem. Increasing adapt_delta to keep the step size low, as @bgoodri recommends, is a good place to start. You can use stepsize for initialization, but setting adapt_delta near 1 is the way to get low adapted step sizes.
