Throw exception when <multiplier = 0>?

  • The old way of doing non-centering and this are different. There’s a Jacobian adjustment thing going on here. You could just as well do:
parameters {
  real<lower=0> sigma_y;
  real<lower=0> sigma_a;
  vector<multiplier=sigma_y>[J] a;
}
model {
  y ~ normal(a[group], sigma_y);
  a ~ normal(0, sigma_a);
}

Or:

parameters {
  real<lower=0> sigma_y;
  real<lower=0> sigma_a;
  vector<multiplier=sigma_a>[J] a;
}
model {
  y ~ normal(a[group], sigma_y);
  a ~ normal(0, sigma_a);
}

But only the second one is non-centering. The first is still fine statistically – I just suspect it wouldn’t work very well in practice.

  • I’m all for reporting error messages aggressively, but has anyone actually fixed a problem because of this error? I still use the old non-centering approach, so I don’t have experience with this. Setting the multiplier to exactly zero should fail to initialize.

  • Maybe we should both print errors and provide summaries of them. The bad part about printing errors is that maybe an unimportant one prints 100 times and then an important one prints once. The good part is printing errors gives you more info to find problems.

Maybe something like:

> fit
47 warnings in 4000 iterations, check fit$warnings() for more info
variable  mean median   sd  mad    q5   q95 rhat ess_bulk ess_tail
    lp__  -7.25  -6.98 0.69 0.31 -8.70 -6.75 1.00     1818     2210
    theta  0.25   0.24 0.12 0.12  0.08  0.46 1.00     1447     1563
> fit$warnings()
23 normal_id_glm sigma == 0
17 max number of iterations exceeded
7 log of inf

That helped me find the bug, which is on line 27 of (math)/stan/math/prim/prob/lognormal_lpdf.hpp:

check_nonnegative(function, "Random variable", y);

It should be check_positive because 0 is not in the support of a lognormal. Here’s the issue:

If we get rid of the multiplier error (I still don’t know what the consensus is there), then both will throw when they hit the lognormal prior, and neither would throw with a half-normal prior (which is consistent with zero). This whole thing’s a mess when the math says we should be in (0, 1) but the floating-point value underflows to 0 or rounds to 1.

The solution I suspect @andrewgelman is going to want is to just suppress all warnings during warmup.

I’m not sure where that first model came from, but I agree it’s not something we’d want to do.

For the second one, the Jacobian adjustment is

target += J * log(sigma_a);

which should just make things uniform on a > 0, so the result is the same as doing all this manually (though a bit cleaner and a bit less efficient).
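
For comparison, here’s a rough sketch of what “doing all this manually” looks like for that second model (same y, group, and J as above; a_raw is just a name I’m using for the underlying standardized vector):

parameters {
  real<lower=0> sigma_y;
  real<lower=0> sigma_a;
  vector[J] a_raw;                // standardized version of a
}
transformed parameters {
  vector[J] a = sigma_a * a_raw;  // what multiplier=sigma_a does under the hood
}
model {
  y ~ normal(a[group], sigma_y);
  a_raw ~ normal(0, 1);           // implies a ~ normal(0, sigma_a); no Jacobian needed
}

With the multiplier declaration the prior goes on a and the J * log(sigma_a) Jacobian is added automatically; here the prior goes on a_raw directly and no adjustment is needed.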

I see three things going on.

  1. Maybe it makes sense to have a “quiet” setting that switches off warnings during warmup. I’m not sure about this, though, as sometimes a warmup warning can be useful. We could consider turning off warnings just in the first stage of warmup. As usual, this is a naive user vs. super user tradeoff. For the naive user, early warnings can be distracting and can lead to a boy-who-cried-wolf syndrome. For the super user, warnings at any stage can give us information, and more information should be better.

  2. This particular set of warnings still seems unnecessary to me, because sigma enters the target function in two places (the normal log density is written out just after this list), and I don’t think either should cause a problem. First, there’s a log(sigma) in the target, but this should be fine, as Stan is actually working internally with log(sigma), which never itself blows up (it just sits at -300 or -1000 or whatever). Second, there’s a term of -0.5*((y - mu)/sigma)^2, but this could just become -Inf (thus always rejecting the jump) when log(sigma) drops below the underflow threshold.

  3. You (Bob) were saying that the reason this is going on is that the gradient causes the HMC algorithm to overshoot and send log(sigma) down to -1000. If so, another way to go would just be to not let parameters on the log scale go below -300 or whatever. For programming reasons this might be a nonstarter, but this would solve this particular problem.
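
For reference on item 2, those are the two places sigma shows up in the normal log density; up to a constant,

log normal(y | mu, sigma) = -log(sigma) - 0.5 * ((y - mu) / sigma)^2

where the first term is the log(sigma) piece and the second is the quadratic piece that goes to -Inf as sigma underflows toward zero.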

My point is that in some settings the warnings during warmup are telling us real information about the geometry of the problem or whatever. But in this case the warnings are not telling us about any such problem. We can see this because they occur even when I put a strong prior on sigma_a which knocks out that ugly tip of the funnel.

So this does not seem to me to be a legit warning. I’m bothered by this warning; this doesn’t mean that I think all warnings should be shut off.

That’s what I was suggesting—kill the messenger.

I get what you mean, but I don’t like the analogy. This is not a false positive. The system’s crying “underflow” because there’s really underflow.

It absolutely does cause a problem in that a zero multiplier cuts the gradient to the parameters. The only reason the sampler gets over the problem is that it’ll reject and retry with a new random momentum.

That’s only true if you stick to the unconstrained scale in the Stan program. As soon as this shows up in the parameters block

real<lower = 0> sigma;

then the unconstrained form of sigma (i.e., log(sigma)) is exponentiated to produce sigma, which is then the variable available in the rest of the program.

The problem with standard deviation going to zero in a normal distribution is that there are log(sigma) and (y - mu) / sigma terms involved.

The problem with the multiplier going to zero here

vector<multiplier = sigma_a>[N] a;

is that all of the a go to zero and we degenerate from a vector of different variables to a vector of zeros.
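
Concretely, the multiplier declaration amounts to rescaling an underlying unconstrained vector (a_raw here is just an illustrative name for what Stan actually samples):

a = sigma_a * a_raw;  // sigma_a == 0 collapses every a[j] to 0

and since d a[j] / d a_raw[j] = sigma_a, a zero multiplier also zeroes out the gradient flowing back to a_raw, which is the “cuts the gradient” problem mentioned above.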

How do you propose doing that? We can’t just say

if (x < -300) x = -300;

as we get discontinuities and we can’t autodiff through a constant like that.

Oh, I didn’t realize. I thought that when we had log(sigma) in the Stan program, it worked with the internal log_sigma variable (i.e., the parameter on the unconstrained scale).

You have to be more specific. If we write

parameters {
  real<lower = 0> sigma;
}

then the underlying parameter over which Stan samples is log(sigma). But every iteration, the constrained value sigma is calculated using exp(), and sigma is then the only variable exposed. To get back to the unconstrained variable, you’d have to compute log(sigma), but by then it’s already too late: if sigma has underflowed to zero, log(sigma) is -Inf rather than the unconstrained value the sampler is actually working with.

Now if you want to do this yourself, it’s easy:

parameters {
  real log_sigma;
}
transformed parameters {
  real<lower = 0> sigma = exp(log_sigma);
}

If you want to put a distribution on sigma, you also need the Jacobian adjustment

model {
  target += log_sigma;  // Jacobian adjustment: log |d sigma / d log_sigma| = log_sigma
}

But if you just want to put a prior on log_sigma, such as a normal (which implies sigma is lognormal), that you can do. But you still can’t put the resulting sigma into a normal distribution if it underflows to zero.
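
To make the prior-on-log_sigma option concrete (a minimal sketch, continuing the log_sigma / sigma = exp(log_sigma) setup above):

model {
  log_sigma ~ normal(0, 1);  // prior directly on the sampled parameter, so no Jacobian needed;
                             // implies sigma = exp(log_sigma) ~ lognormal(0, 1)
}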

Interesting. So in theory Stan could be made more efficient by keeping log(sigma) around and using it when necessary, but I’m sure this would come at much too great a cost in language complexity compared to whatever tiny amount of computation time it would save.

For now, I can just accept the warning, and in our documentation we can make clear that Stan sometimes gives warnings in very early iterations that do not represent real problems.