A quite common problem users have is that some computation results in NaN. Would there be a way to increase the analysis / verbosity of what might be causing the NaN?
E.g. a common case is that some function input contains NaN (e.g. a slice of a larger array), but to see this the user needs to do extensive printing. In this case we could print out the input parameters (+ the input slice indexes).
Another case is that the lpdf/AD (not sure which is the correct term) evaluates to NaN due to numerical issues. Is there a way to identify this?
Would this kind of debugging be even possible with our code?
We should start adding vectorized versions of the is_* functions. That way the user can instrument their own code. A few of these functions are already exposed, but using them often still requires a loop, since they are scalar checks.
I’ve been doing that to check computation in programs I write.
I’m sure there is a good explanation for this but have to ask out of curiosity: what is the reason why an error isn’t thrown immediately when NaN is created anywhere during computation?
Stan’s numeric computation happens in C++. It (usually) follows the IEEE floating point rules, which define a NaN value (see the Wikipedia article on NaN).
The practical reason is that it takes work to determine whether a value is NaN, both in terms of programming work and computational work.
In the underlying C++ functions in the Stan Math library, we often check the inputs for NaN, but we don’t check the returns for NaN. Changing this behavior would be inconsistent with the way C++ implements a lot of its math functions (and it would take a lot of effort).
And then there’s the issue of checking whether the gradients are NaN. We wouldn’t be able to know that until after the reverse-mode sweep has run for all gradients, so that check would have to be delayed regardless. Fortunately, when functions are well behaved, their gradients usually are too. (This is not always true, and it causes trouble whenever we discover models that rely on it.)