Numerical error tanks stepsize, other possibilities?

sakrejda · March 11, 2017, 12:15am

I have a model (in fact the one I presented in NYC) that relies on the incomplete and complete gamma functions and for some data sets (that look mostly like my simulated data but clearly aren’t) the acceptance rate never quite gets up to 0.8 (or whatever) so the stepsize tanks even though there are no divergences. If I decrease the target acceptance rate to 0.6 I do get divergences but maybe there’s some region in there where things would be ok. Probably not.

My understanding of this is that the culprit here is probably the Gamma function and gradient approximations are not good enough so error accumulates and the accept rate stays too low so the stepsize keeps decreasing. Any other way (besides accumulation of numerical error) that this scenario can happen with our NUTS implementation?

wds15 · March 11, 2017, 6:49am

Have you tried to limit the maximum treedepth? Assuming that NUTS currently reaches some decent depth, the error gets amplified; limiting the treedepth hence will make it less efficient but more accurate. However, you may not be able anymore to get those super-high acceptance rates, I guess.

Almost glad to see that not only ODE stuff is hampered by precision limitations as some more experience would be great in this area when we have limited gradient precision.

Sebastian

bgoodri · March 11, 2017, 4:42pm

For the complete Gamma function, I was thinking the other day that Stan could do better. It is known

https://www.vttoth.com/CMS/projects/41-the-lanczos-approximation

how to get arbitrary precision for the log-Gamma function, which Boost basically already implements (with some additional edge cases). But in the Stan Math Library, we naively call the digamma function (in doubles) to compute the derivative, which can be less accurate. So, we could try differentiating the Lanczos / Godfrey approximation directly, which is not difficult, definitely faster, and possibly more accurate.

sakrejda · March 11, 2017, 5:47pm

Thanks for the suggestions, I’m at a conference till Wednesday but I need
this stuff to work so I’ll finish the hypergeometric function pull request
and try out your suggestion (Ben). I think limiting the tree depth will
save me for the simple models but to get the hierarcal stuff working this
needs to be fixed right. Good to hear some confirmation I’m on the right
track.

sakrejda · March 11, 2017, 5:52pm

That’s a great suggestion since I’m currently just doing some test
applications without the hierarcal piece making it less efficient shouldn’t
be a big deal. All the gamma and hypergeometric functions have a tolerance
argument so maybe I’ll try raising that too. It’s currently set at
something like 1e-6 which doesn’t seem like enough. Thanks!

wds15 · March 11, 2017, 6:27pm

1e-6 can be enough, but as you encounter it, it may not be sufficient. The thing is, that one usually just wants enough precision, but not more. At least for ODE models, more accuracy than really needed is a huge waste of computational resources.

I am writing this to caution on a too conservative default setting. It would be best to let users choose these tolerances if possible.

sakrejda · March 11, 2017, 6:46pm

This should really be autotuned in addition to stepsize although I can see
the difficulty unless we have a way of pinpointing which function
contributes most to error. This might be moot because without the in
process bugfix, at least one of the functions can silently return before
reaching the specified tolerance.

Bob_Carpenter · March 12, 2017, 6:24am

I created an issue linked back to this thread: https://github.com/stan-dev/math/issues/505

betanalpha · March 12, 2017, 3:14pm

It will be a long time before Boost’s implementations start becoming the primary issue for the math library. The digamma implementation is not only extremely fast, it is really accurate and in fact tested against a differentiated Lanczos, http://www.boost.org/doc/libs/1_63_0/libs/math/doc/html/math_toolkit/sf_gamma/digamma.html.

On the other hand, the incomplete Gamma was written by me.

Additionally, having user-specified precisions is a bad idea all the way around. As I have noted many, many times, trying to validate sufficient precision is extraordinary difficult because we don’t have accurate feedback to target. For large errors we do get the behavior of the target accept not increasing past a certain threshold, but for smaller errors that need not be the case and one can still get subtle biases that do not obvious manifest in sampler behavior. Even for “experts” all we can do is play with heuristics and shrug our shoulders that maybe it is good enough.

sakrejda · March 12, 2017, 3:31pm

betanalpha http://discourse.mc-stan.org/users/betanalpha Developer
March 12

It will be a long time before Boost’s implementations start becoming the
primary issue for the math library. The digamma implementation is not only
extremely fast, it is really accurate and in fact tested against a
differentiated Lanczos,
http://www.boost.org/doc/libs/1_63_0/libs/math/doc/html/math_toolkit/sf_gamma/digamma.html.

On the other hand, the incomplete Gamma was written by me.

Is that an argument we should trust it more or less than Boost?

Additionally, having user-specified precisions is a bad idea all the way
around. As I have noted many, many times, trying to validate sufficient
precision is extraordinary difficult because we don’t have accurate
feedback to target. For large errors we do get the behavior of the target
accept not increasing past a certain threshold, but for smaller errors that
need not be the case and one can still get subtle biases that do not
obvious manifest in sampler behavior. Even for “experts” all we can do is
play with heuristics and shrug our shoulders that maybe it is good enough.

+1

betanalpha · March 12, 2017, 3:48pm

Less. Much less.

sakrejda · March 12, 2017, 3:57pm

:)

On the up side I think this model is exercising most of the functions and
gradients we need for survival models to work smoothly and unless I’m
mistaken this has been affecting a lot of other models too. I saw some
weird stuff with the Weibull-uniform mixture and generalized gamma models I
used for this data previously and even though I got it working at scale it
was fragile. At the time I didn’t have the experience to pursue it into
the weeds like this.

The fact that stuff like this is fixable with a week or two of work is
probably my #1 reason to fanboy over Stan all the time. It’s also why I
really believe in making the math library well documented down to
algorithms and sources s.t. the barrier to entry in fixing this stuff is
smaller. … although maybe I’m wrong that anybody else who’s a
non-specialist would be willing to dig in like this?

charlesm93 · March 13, 2017, 1:38pm

It’s also why I
really believe in making the math library well documented down to
algorithms and sources s.t. the barrier to entry in fixing this stuff is
smaller

+1

Bob_Carpenter · March 14, 2017, 4:13am

It’d be great if people want to start contributing C++ doc. Doxygen makes it very easy. And we take pull requests.

We’ve been trying to insist on more C++ doc with new pull requests, but it’s hard to catch up with all the older stuff—there’s so much of it.

Or did you mean at a higher level, like the Boost doc? That’s much much harder to maintain and keep in synch with the actual code.

sakrejda · March 14, 2017, 11:11am

Please not higher level stuff! We can barely keep models in the manual
consistent! :)

I was thinking of my experience with the hypergeometric functions so far
where the biggest time suck for fixing the bugs was figuring out which
hypergeometric function was meant. I had to work from the approximation in
the code back to some definitions I looked up. That was only possible
because our implementation is a naive one that matches the definition. It
was little hard because the math people write “The hypergeometric function"
and of course they really mean the specific one from the family they’re
working on. Once I figured one out the rest was easy since they’re all
related but ffs Daniel’s last message on those was along the lines of
”[either there’s a bug or I don’t understand which function this is]".
Turns out there was a bug. We just shouldn’t do that to ourselves.
Especially since we can’t afford to wait for a specialist to come along and
fix this stuff.

I’m not sure we’ll get “people” to do it anytime soon so we’re probably
stuck with updating as we do bugfixes and refactoring. I just saw some
noises about how internal namespaces would make it easier to avoid
documenting stuff and I completely disagree with it. Sure there are some
functions too obvious to document so it’s a judgment call but a blanket
"it’s internal so we don’t doc" is shooting ourselves in the foot. The c++
language is huge and you only need to know a few facets to contribute to
any given piece of Stan but if we’re not rigorous about doc all of a sudden
you need to know much more and you need to be able to figure out which of
the million of c++ techniques for accomplishing a given task is implemented
in the function you’re looking at. So all that is to say please, except
for the really simple functions, let’s keep writing the doc. Something that
differentiates us from JAGS/WinBUGS is that we ARE getting outside
contributors despite our more complex codebase. The payoff for that is
going to outlast even the use of HMC in Stan.

betanalpha · March 14, 2017, 2:10pm

Having doc is great, but let’s not pretend that we can write a massive mathematics treatise. Both hypergeometric functions are completely identified by their names, in particular their indices. The variable naming also followed standard notation (including NIST and Wikipedia). The proposed doc starts pushing hard against our style guidelines of “don’t doc obvious stuff” and I would argue doesn’t even resolve any of the complaints. The new doc just says “power series, there are convergence conditions” but doesn’t say which power series or link to references for those convergence conditions. How does that make it any easier for someone to come in and make changes later?

Again, I’m not against doc. But it should be more useful, not just more verbose, like

The generalized hypergeometric function, $\[_{3}F_{2}\]$.  

This implementation uses a power series expansion around z = 0 with radius of convergence z = 1.  

There are poles for a or b equalling an integer.  

For more details see http://dlmf.nist.gov/15.2#i.

(Note that this doc isn’t accurate, I just threw some stuff together as a hypothetical example).

sakrejda · March 14, 2017, 2:50pm

We don’t need to write a mathematical treatise. I think the middle road is
assuming a naive reader who has the background to solve the problems in a
given function but is not a specialist and is not familiar with the
terminology. The hypergeometric functions are a good example because they
have weird notation and articles full of irrelevant results but all you
need to understand our calculation is 1) notation; 2) infinite series; 3)
convergence conditions; 4) bare minimum of c++. A note on the terminology
and a good reference would be plenty.

Yet we had (have) a bug in the loop conditions in one of them that will
return inaccurate values silently under some conditions (many iterations to
convergence) and at least one that hangs.

That said, I don’t think we can solve this with generalities. I’ll give it
a shot on the branch for hypergeometric functions and write some comments
separately on where I think the line is… as soon I can get some reliable
[redacted] internet. Honestly the internet around the DC Hilton is worse
than an Amtrak train going through a tunnel.

charlesm93 · March 14, 2017, 3:02pm

+1 for Doxygen docs in C++
I like the doc we have, between the wiki and the autodiff paper, but it’s not easy to navigate. I plan to update the “Contributing New Stan Functions” wiki page, and reference the AD paper (and maybe a C++ tutorial too) more systematically. That way, we have a place with an overview and pointers to more detailed docs.

Bob_Carpenter · March 15, 2017, 6:07am

The problem’s always where to draw the line with doc. I think what Michael’s laying out here is the right level. We’ve been very bad previously about not including any pointers.

When I say I don’t want to doc internal functions, I’m talking about simple ones we don’t expect users of the math libs to use directly. If there’s some hairy math algorithm, then by all means a pointer to a paper where the algorithm’s defined would be great. And a precise definition of the function being computed, if possible, is also good.

Bob_Carpenter · March 15, 2017, 6:11am

I’m afraid we are the specialists at this point! By “people”, I mean contributors, developers, etc. Nobody’s going to object to a pull request that adds doc, so if you think it needs it, feel free to add it where you think it’s necessary.

I don’t want to start documenting C++ techniques inside of our doc. I think the function doc at the doxygen level should say what the function does, ideally with a mathematically precise definition of the intent. The arguments and constraints on them should be documented, as should exceptions being thrown. None of that says anything about whether we’re using ADL, SFINAE, RAII, CRTP, or any of the other gazillion C++ patterns with cryptic acronyms. We just have to assume the readers of the code know C++.

Topic		Replies	Views
Acceptance ratio in NUTS Developers	4	911	June 28, 2021
NUTS misses U-turns, runs in circles until max_treedepth Algorithms	66	5495	August 31, 2019
Blog: Taming Divergences in Stan Models Publicity	8	1198	February 19, 2018
Incomplete gamma function with negative argument Algorithms	10	3059	February 10, 2019
D_gamma_p_da gradient error Developers	2	562	July 28, 2017

Numerical error tanks stepsize, other possibilities?

Related topics