Hi,
I’m trying to figure out why my model fairly often produces non-finite gradients during optimization, i.e. pystan outputs “Error evaluating model log probability: Non-finite gradient.” For most inputs the optimization eventually converges to a good solution, but sometimes it fails completely (the line search fails to achieve a sufficient decrease). I’ve also run into non-finite log probabilities, and those were quite easy to debug with print statements in the model code that show the parameter and target() values. I’m now trying to make sense of the non-finite gradients using the same approach.
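For concreteness, the kind of print instrumentation I mean looks roughly like this (a minimal sketch with a toy model; the parameter names, priors and data are placeholders, not my actual model):

```python
import pystan

# Toy model (placeholder), instrumented the same way as my real one:
# print() in the transformed parameters and model blocks so the parameter
# values and target() show up in the console output at each evaluation.
toy_model_code = """
data {
  int<lower=1> N;
  vector[N] y;
}
parameters {
  real mu;
  real<lower=0> sigma;
}
transformed parameters {
  real log_sigma = log(sigma);
  print("log_sigma ", log_sigma);
}
model {
  mu ~ normal(0, 5);
  sigma ~ lognormal(0, 1);
  y ~ normal(mu, sigma);
  print("log density at end of model block ", target());
}
"""

sm = pystan.StanModel(model_code=toy_model_code)
opt = sm.optimizing(data={"N": 3, "y": [0.1, -0.4, 0.8]})
```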
So I have a bunch of print statements in my transformed parameters and model blocks, and I get output like this:
some_param [-0.787415,-0.0176929,-0.138075,-0.00557761,1.57621,0.132503,-0.0128881,-1.42206,-0.989252,3.6584,-0.202712,-1.27918,-3.10609,0.377228,0.182052,0.488984,-1.09212,-0.00378332,-0.783191,-1.08679,-0.377807,-0.193036,-0.00835768]
other_param [1.71227,0.143906,1.56351,0.754314,0.527952,0.256066,1.03616,0.744728,0.176095,0.952388,0.767451,2.4377,0.730618,0.195325,0.354271,0.398495,4.74154,1.85329,1.3436,0.311192,0.361299,0.255751,0.759178,0.256067,0.330403,0.00490886,0.42123,0.151361,0.932506,3.64964,0.173603,2.23444,0.0252843,2.17365,0.604526]
log density at end of model block 11160.4
Error evaluating model log probability: Non-finite gradient.
If we denote by X the point in parameter space whose values are printed just before the “non-finite gradient” error message, does that error message come from evaluating the gradient at X? That is what I assumed at first, but I don’t see why the gradient would blow up at X, and it occurred to me that the message might actually come from evaluating the gradient at some other point, e.g. a trial point X + αd that the line search probes along some search direction d. I’m using the default L-BFGS, if that makes a difference.
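In the meantime, what I was thinking of doing is evaluating the gradient at X myself and checking whether it is finite there. A sketch of how I imagine that would look, assuming I can reconstruct the constrained values of X from the printed output (toy model, parameter names and numbers are placeholders):

```python
import numpy as np
import pystan

# Placeholder model; in reality this would be my actual model code.
toy_model_code = """
data { int<lower=1> N; vector[N] y; }
parameters { real mu; real<lower=0> sigma; }
model { y ~ normal(mu, sigma); }
"""

sm = pystan.StanModel(model_code=toy_model_code)
# grad_log_prob / unconstrain_pars live on a fit object, so grab a throwaway fit.
fit = sm.sampling(data={"N": 3, "y": [0.1, -0.4, 0.8]}, iter=10, chains=1)

# X: constrained values for everything in the parameters block, taken from the
# output printed just before the error (placeholder numbers here).
X = {"mu": 0.3, "sigma": 1.2}

upars = fit.unconstrain_pars(X)   # map X to the unconstrained space the optimizer works in
lp = fit.log_prob(upars)          # log density at X
grad = fit.grad_log_prob(upars)   # gradient at X (w.r.t. the unconstrained parameters)

print("log prob finite:", np.isfinite(lp))
print("gradient finite:", bool(np.isfinite(grad).all()))
print(grad)
```

If the gradient comes back finite at X, that would at least suggest the failure happens at some other point the optimizer probes.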
Are there any other ways to get more information about what’s going on under the hood? So far I’ve only used pystan, but I could try rstan as well if it would help with debugging.
(Unfortunately I can’t really share the model or the data, so I’m just asking about debugging models in general.)