Following nan values in the error message

ahartikainen · September 27, 2020, 7:18am

A quite common problem users have is that some computation results in NaN. Would there be a way to increase the analysis or verbosity around what might be causing the NaN?

E.g. a common case is that some function input contains NaN (e.g. a slice of a larger array), but to see this the user needs to do extensive printing. So in this case we could print out the input parameters (plus the input slice indexes).

Another case is that the lpdf/AD (not sure what the correct term is) evaluates to NaN due to numerical issues. Is there a way to identify this?

Would this kind of debugging even be possible with our code?

syclik · October 3, 2020, 5:13am

We should start adding vectorized versions of the is_* functions. This way the user can instrument their code. A few of these functions are already exposed, but using them often still requires a loop since they are scalar checks.

I’ve been doing that to check computation in programs I write.
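As a rough illustration of what a vectorized check could look like, here is a minimal C++ sketch. The names `nan_indices` and `any_nan` are hypothetical and are not part of Stan or the Stan Math library; the sketch only shows the idea of reporting which positions in a container are NaN, so the user does not have to write the loop by hand.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a Stan or Stan Math function): returns the
// zero-based positions of every NaN entry in `values`.
std::vector<std::size_t> nan_indices(const std::vector<double>& values) {
  std::vector<std::size_t> hits;
  for (std::size_t i = 0; i < values.size(); ++i) {
    if (std::isnan(values[i])) {
      hits.push_back(i);
    }
  }
  return hits;
}

// Hypothetical container-level predicate: true if any entry is NaN.
bool any_nan(const std::vector<double>& values) {
  return !nan_indices(values).empty();
}
```

Returning the indices rather than a single boolean is a design choice aimed at the case raised earlier in the thread, where the user wants to know which elements of an input slice are NaN.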

jtimonen · October 4, 2020, 3:08pm

I’m sure there is a good explanation for this, but I have to ask out of curiosity: why isn’t an error thrown immediately when a NaN is created anywhere during computation?

syclik · October 4, 2020, 8:09pm

@jtimonen, that’s a great question.

Stan’s numeric computation happens in C++. It (usually) follows the IEEE floating-point rules, which define a NaN value (see Wikipedia on NaN).

The practical reason is that it takes work to determine whether a value is NaN. This is both in terms of programming work and also computational work.
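To make the point concrete, here is a small C++ sketch of how NaN behaves under IEEE rules, assuming default compiler settings (options such as `-ffast-math` may change this). The function names are made up for illustration. A NaN is created silently, it flows through later arithmetic, and nothing is reported unless some code explicitly tests for it.

```cpp
#include <cmath>
#include <limits>

// Illustrative only: 0 * infinity has no defined value, so the result is
// NaN. No C++ exception is thrown when this happens.
double make_nan() {
  const double inf = std::numeric_limits<double>::infinity();
  return 0.0 * inf;
}

// Illustrative only: ordinary arithmetic on a NaN input yields NaN again,
// so one bad intermediate value contaminates everything downstream.
double propagate(double x) {
  return 2.0 * x + 1.0;
}

// Detecting NaN needs an explicit test; comparing with == does not work
// because NaN compares unequal to everything, including itself.
bool is_bad(double x) {
  return std::isnan(x);
}
```

Catching the NaN at the point where it is created would mean running a test like `is_bad` after every operation, which is the extra computational work mentioned above.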

In the underlying C++ functions in the Stan Math library, we often check the inputs for NaN, but we don’t check the return values for NaN. Changing this behavior would be inconsistent with the way C++ implements a lot of math functions (and it would take a lot of effort).
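The pattern can be sketched in plain C++ as follows. Both `require_not_nan` and `log_diff` are hypothetical names written for this example; they are not the Stan Math implementation. The sketch assumes only that `std::log` follows the usual C++ behavior of returning NaN for a negative argument instead of throwing.

```cpp
#include <cmath>
#include <stdexcept>
#include <string>

// Hypothetical input guard: throws if `value` is NaN.
void require_not_nan(const char* function, const char* name, double value) {
  if (std::isnan(value)) {
    throw std::domain_error(std::string(function) + ": " + name + " is nan");
  }
}

// Hypothetical function that validates its inputs but not its result.
double log_diff(double a, double b) {
  require_not_nan("log_diff", "a", a);
  require_not_nan("log_diff", "b", b);
  // Both inputs are valid numbers here, yet the result is NaN whenever
  // a < b, and it is returned to the caller without any error.
  return std::log(a - b);
}
```

Here `log_diff(1.0, 2.0)` passes the input checks and still returns NaN, which is the kind of case that currently goes unreported.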

And then there’s the issue of checking whether gradients are NaN. We wouldn’t be able to know that until after the reverse mode sweep has happened for all gradients. So that’ll have to be delayed regardless. Fortunately, when functions are well behaved, their gradients usually are too. (This is not always true and causes trouble whenever we discover models that rely on it.)
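A small hand-written example of a value that is fine while its gradient is not: f(x) = sqrt(x * x) evaluated at x = 0. The C++ sketch below applies the chain rule by hand instead of using Stan's autodiff, and the struct and function names are invented for the example. It assumes IEEE arithmetic, where dividing a positive number by zero gives infinity.

```cpp
#include <cmath>

// Illustrative container for a function value and its derivative.
struct ValueAndGrad {
  double value;
  double grad;
};

// f(x) = sqrt(x * x), differentiated with the chain rule written out.
ValueAndGrad sqrt_of_square(double x) {
  const double u = x * x;                    // inner function u(x) = x^2
  const double du_dx = 2.0 * x;              // du/dx
  const double value = std::sqrt(u);         // f = sqrt(u)
  const double df_du = 0.5 / std::sqrt(u);   // df/du, +infinity when u == 0
  // At x == 0 the product is infinity * 0, which is NaN, even though
  // the value itself is a perfectly ordinary 0.
  return {value, df_du * du_dx};
}
```

At x = 0 the returned value is 0 and the gradient is NaN. The NaN only shows up once the partial derivatives are multiplied together, which is why a check on function values alone would miss it.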

Edit: added link to NaN on Wikipedia.
