General question about debugging non-finite gradients during optimization


I’m trying to figure out why my model somewhat frequently produces non-finite gradients during optimization, that is, pystan outputs “Error evaluating model log probability: Non-finite gradient.” For most inputs the optimization eventually converges to a good solution, but sometimes it fails completely (line search fails to achieve sufficient decrease). I’ve also run into non-finite log probabilities, and those were quite easy to debug using print statements in the model code to check parameter and target() values. I’m trying to make sense of the non-finite gradients using the same approach.

So I have a bunch of print statements in my transformed parameters and model block and get output like this:

some_param [-0.787415,-0.0176929,-0.138075,-0.00557761,1.57621,0.132503,-0.0128881,-1.42206,-0.989252,3.6584,-0.202712,-1.27918,-3.10609,0.377228,0.182052,0.488984,-1.09212,-0.00378332,-0.783191,-1.08679,-0.377807,-0.193036,-0.00835768]
other_param [1.71227,0.143906,1.56351,0.754314,0.527952,0.256066,1.03616,0.744728,0.176095,0.952388,0.767451,2.4377,0.730618,0.195325,0.354271,0.398495,4.74154,1.85329,1.3436,0.311192,0.361299,0.255751,0.759178,0.256067,0.330403,0.00490886,0.42123,0.151361,0.932506,3.64964,0.173603,2.23444,0.0252843,2.17365,0.604526]
log density at end of model block 11160.4
Error evaluating model log probability: Non-finite gradient.

If we denote by X the point in parameter space whose values are printed just before the “non-finite gradient” error message, does that error message come from evaluating the gradient at X? This is what I assumed at first, but I don’t see why the gradient would blow up at X, and it occurred to me that the error message might actually come from evaluating the gradient at some other point (like maybe X+d obtained by performing a line search from X in some direction). I’m using the default L-BFGS if that makes a difference.

Are there any other ways to get more information about what’s going on under the hood? So far I’ve only used pystan, but I could try rstan as well if it would help with debugging.

(Unfortunately I can’t really share the model or the data, so I’m just asking about debugging models in general.)

Are any of the parameters near the boundaries of constraints?

There’s a grad_log_prob in both PyStan and RStan for exposing gradients. You could have a look and see if you can get things to explode there.
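For what it's worth, here is a minimal sketch of the kind of check I mean. The helper `nonfinite_entries` and the stand-in `toy_grad` are hypothetical names made up for illustration; with a real PyStan 2 fit object you would pass it the output of `fit.grad_log_prob` instead. My assumption is that `adjust_transform=False` is the closer match to what the optimizer sees, since optimization does not apply the Jacobian adjustment by default.

```python
import math

def nonfinite_entries(grad):
    """Return (index, value) pairs for gradient entries that are NaN or +/-inf."""
    return [(i, g) for i, g in enumerate(grad) if not math.isfinite(g)]

def toy_grad(upars):
    # Hypothetical stand-in for a model gradient: d/du log(u) = 1/u,
    # reported as inf at u == 0 to mimic a blow-up at a boundary.
    return [1.0 / u if u != 0.0 else math.inf for u in upars]

# With a PyStan 2 fit object this would (assumption) look like:
#   upars = fit.unconstrain_pars({"some_param": [...], "other_param": [...]})
#   bad = nonfinite_entries(fit.grad_log_prob(upars, adjust_transform=False))
bad = nonfinite_entries(toy_grad([0.5, 0.0, 2.0]))
print(bad)
```

The indices refer to positions in the unconstrained parameter vector, so you still have to map them back to named parameters yourself.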

Hi, thanks for responding!

Not so near that it would be obvious to me that it’s causing trouble. For instance, I would think (I might be wrong) that 0.01 is ok for a parameter with a gamma(1,0.3) prior. I do have one parameter with a gamma(1.5,2) prior whose value (on the last iteration before optimization fails) is 0.003. (This is all assuming that the “non-finite gradient” warning really comes from evaluating the gradient at the parameters that get printed by the print statements in the model code, which seems like a reasonable assumption, but I’m not sure if it’s actually true.) Hopefully grad_log_prob will give some insight into this.
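To sanity-check that intuition, here is a rough sketch of the gradient of the prior term alone at those two values. It assumes Stan's shape/rate parameterization of gamma(alpha, beta), and it assumes the parameters are declared with a lower bound of 0 so that Stan works on u = log(x); the function names are just for illustration. It says nothing about the likelihood, which could still be the part that blows up.

```python
import math

def gamma_logpdf_grad(x, alpha, beta):
    """d/dx of the gamma(alpha, beta) log density (shape/rate form)."""
    return (alpha - 1.0) / x - beta

def gamma_logpdf_grad_unconstrained(x, alpha, beta, jacobian=False):
    """Gradient with respect to u = log(x), assuming a lower=0 constraint."""
    g = x * gamma_logpdf_grad(x, alpha, beta)  # chain rule: dx/du = x
    # The log-Jacobian of x = exp(u) is u, whose derivative is 1.
    return g + 1.0 if jacobian else g

for x, a, b in [(0.01, 1.0, 0.3), (0.003, 1.5, 2.0)]:
    gx = gamma_logpdf_grad(x, a, b)
    gu = gamma_logpdf_grad_unconstrained(x, a, b)
    print(x, a, b, gx, gu, math.isfinite(gx) and math.isfinite(gu))
```

Both prior gradients come out finite at these values, so under these assumptions the priors by themselves would not explain the error.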

It would be useful if I could somehow get stan to return the parameter values at the point where the optimization fails, but I guess I can just try to construct it from the output of the print statements.
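Something like the following is what I have in mind for reconstructing the point: a small sketch that keeps the most recent `name [v1,v2,...]` line for each printed quantity. It assumes the print output has already been captured into a list of strings (how to capture it is a separate problem), and `parse_print_lines` is just a name I made up. The example vector is shortened from the output above.

```python
import re

def parse_print_lines(lines):
    """Collect the most recent 'name [v1,v2,...]' line for each name."""
    pattern = re.compile(r"^\s*(\w+)\s*\[(.*)\]\s*$")
    latest = {}
    for line in lines:
        m = pattern.match(line)
        if m:
            name, body = m.groups()
            # Later lines overwrite earlier ones, so the last printed point wins.
            latest[name] = [float(v) for v in body.split(",") if v.strip()]
    return latest

log = [
    "some_param [-0.787415,-0.0176929,1.57621]",  # shortened for illustration
    "log density at end of model block 11160.4",
    "Error evaluating model log probability: Non-finite gradient.",
]
point = parse_print_lines(log)
print(point)
```

The resulting dict holds constrained values, so it would still need to go through something like unconstrain_pars before being handed to grad_log_prob.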

Ah, I didn’t know about that, thanks! If I understand the docs correctly, it’s only available on the fit object that stan_model.sampling() returns, but I suppose I can just draw one sample to get my hands on the gradient. :)

That’s probably the best you can do. I think there’s a good chance the values you see printed are the ones where the gradient fails, but I’m not sure; that might just be wishful thinking.