How to debug "Gradient evaluated at the initial value is not finite" error

Hi, I’m working with an ODE model in RStan 2.26.1. The value of log_prob appears to be correct as far as I can tell – it matches what I calculate by hand from the mean-square error of the difference between the ODE solution and the given data. However, grad_log_prob says the gradient is (NaN, NaN) (for the two free parameters) for any parameter values I have tried. In each case grad_log_prob also reports log_prob, and that log_prob is correct.
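For concreteness, this is the shape of the check I’m doing (the parameter names and values are placeholders for my two free parameters, not my actual model):

```r
library(rstan)
# `fit` is the stanfit object for my model
upars <- unconstrain_pars(fit, list(theta1 = 0.5, theta2 = 1.2))  # placeholder names/values
log_prob(fit, upars)            # finite, and matches my hand calculation
g <- grad_log_prob(fit, upars)  # comes back as c(NaN, NaN)
attr(g, "log_prob")             # grad_log_prob also reports the (correct) log_prob here
```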

How can I dig into what’s going on here to see where the gradient is getting messed up? For example, can I find the code that was generated to compute the gradient? Can I get Stan to print out any intermediate quantities in the gradient calculation? I know I can get Stan to print the gradient to a diagnostic output file; however, what I think I need to look at are the values of things during the calculation, not just the final output (which is NaN in every case I tried).

Incidentally, I have plotted contours of the RMSE (which should be a simple function of the log likelihood, and therefore of the log probability as well, given that the priors are uniform over some range), and as far as I can tell the likelihood surface is well behaved.
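Spelling that out: assuming iid Gaussian noise with fixed $\sigma$ and flat priors (which is my setup), the log posterior is a monotone function of the RMSE,

$$\log p(\theta \mid y) = \text{const} - \frac{N \, \mathrm{RMSE}(\theta)^2}{2\sigma^2},$$

so well-behaved RMSE contours should mean a well-behaved log probability surface too.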

An earlier thread on a similar topic (Gradient evaluated at the initial value is not finite although the lp is finite) didn’t have any specific advice about how to debug this problem. I found some other threads mentioning similar problems but, again, nothing applicable to the problem I am encountering.

Thank you for any insights which you may offer,

Robert Dodier

I think sometimes the maximum number of steps is enough for the regular ODE solve, but not for the one augmented with the sensitivities.

You can try

  • increasing max_num_steps,
  • loosening (i.e., increasing) the tolerances, or
  • incrementally building up the ODE from a simpler one until you encounter the problem and then troubleshoot what the problem is.

The first option might work, although I don’t actually know whether an insufficient max_num_steps would yield a correct log_prob but NaN gradients. The second option is really just a variant of the first, since looser tolerances effectively reduce the number of steps the solver uses, keeping it under the cap.
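For reference, here is a minimal sketch of where these controls live in the Stan program. The toy oscillator right-hand side and all names are placeholders, not the model from this thread; the same three trailing arguments also exist for integrate_ode_rk45, and Stan >= 2.24 additionally has the ode_bdf_tol / ode_rk45_tol variants.

```stan
functions {
  // Placeholder RHS; stands in for the actual ODE system
  real[] rhs(real t, real[] y, real[] theta, real[] x_r, int[] x_i) {
    real dydt[2];
    dydt[1] = y[2];
    dydt[2] = -theta[1] * y[1] - theta[2] * y[2];
    return dydt;
  }
}
data {
  int<lower=1> T;
  real y0[2];
  real t0;
  real ts[T];
  real y_obs[T];
}
transformed data {
  real x_r[0];
  int x_i[0];
}
parameters {
  real<lower=0> theta[2];
}
transformed parameters {
  // Trailing arguments: rel_tol, abs_tol, max_num_steps
  real y_hat[T, 2]
    = integrate_ode_bdf(rhs, y0, t0, ts, theta, x_r, x_i,
                        1e-6,      // rel_tol: loosen if the solver stalls
                        1e-6,      // abs_tol
                        1000000);  // max_num_steps: raise if the cap is hit
}
model {
  for (t in 1:T)
    y_obs[t] ~ normal(y_hat[t, 1], 0.1);  // placeholder noise scale
}
```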

Really, you probably need to take the third option. I don’t think you can make Stan print out intermediate gradients/adjoints. And there isn’t really any extra generated code to look at; I believe this is all handled via templated functions and types.

Hmm, that’s kind of a bummer. I guess I will get started on that.

By the way, I don’t suppose there is any way to convince Stan to do Metropolis-Hastings instead of HMC, is there? If not, I was thinking I could bolt on a hand-written implementation in an effort to make some headway. Still trying to sort out where to go from here.
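In case I do go that route, here is the untested sketch I have in mind, driving a random-walk Metropolis sampler off rstan’s log_prob() (mh_sample is just a name I made up; `fit` is the stanfit object for the model):

```r
## Untested sketch: random-walk Metropolis on the unconstrained scale,
## reusing rstan's log_prob() as the target density
mh_sample <- function(fit, init_upars, n_iter = 5000, step_sd = 0.1) {
  d <- length(init_upars)
  draws <- matrix(NA_real_, n_iter, d)
  cur <- init_upars
  cur_lp <- log_prob(fit, cur)
  for (i in seq_len(n_iter)) {
    prop <- cur + rnorm(d, sd = step_sd)  # symmetric proposal
    prop_lp <- log_prob(fit, prop)
    # accept with probability min(1, exp(prop_lp - cur_lp))
    if (is.finite(prop_lp) && log(runif(1)) < prop_lp - cur_lp) {
      cur <- prop
      cur_lp <- prop_lp
    }
    draws[i, ] <- cur
  }
  draws  # map each row back with constrain_pars(fit, draws[i, ])
}
```

Since log_prob() defaults to adjust_transform = TRUE, the Jacobian for the constraining transforms should be included, so sampling on the unconstrained scale and mapping back with constrain_pars() should target the right posterior, if I understand the rstan docs correctly.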