I have a model (in fact the one I presented in NYC) that relies on the incomplete and complete gamma functions, and for some data sets (which look mostly like my simulated data but clearly aren’t) the acceptance rate never quite gets up to 0.8 (or whatever the target is), so the stepsize tanks even though there are no divergences. If I decrease the target acceptance rate to 0.6 I do get divergences, but maybe there’s some region in between where things would be OK. Probably not.

My understanding is that the culprit here is probably that the Gamma function and gradient approximations are not accurate enough, so error accumulates, the acceptance rate stays too low, and the stepsize keeps decreasing. Is there any other way (besides accumulation of numerical error) that this scenario can happen with our NUTS implementation?

Have you tried limiting the maximum treedepth? Assuming that NUTS currently reaches some decent depth, the error gets amplified; limiting the treedepth will hence make sampling less efficient but more accurate. However, you may no longer be able to get those super-high acceptance rates, I guess.

I’m almost glad to see that it’s not only ODE stuff that is hampered by precision limitations; some more experience in this area of limited gradient precision would be great.

On getting arbitrary precision for the log-Gamma function: Boost basically already implements this (with some additional edge cases). But in the Stan Math Library, we naively call the digamma function (in doubles) to compute the derivative, which can be less accurate. So we could try differentiating the Lanczos / Godfrey approximation directly, which is not difficult, definitely faster, and possibly more accurate.
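
To make the idea concrete, here is a sketch of differentiating a Lanczos approximation directly, so the gradient is term-by-term consistent with the value. It uses the standard g = 7, n = 9 Godfrey coefficients and is purely illustrative; it is not the Stan Math or Boost implementation, and it omits the reflection formula needed for x <= 0.5.

```cpp
#include <cassert>
#include <cmath>

// Standard Lanczos (g = 7, n = 9) coefficients (Godfrey's values).
static const double lanczos_g = 7.0;
static const double lanczos_c[9] = {
    0.99999999999980993,     676.5203681218851,      -1259.1392167224028,
    771.32342877765313,     -176.61502916214059,      12.507343278686905,
   -0.13857109526572012,       9.9843695780195716e-6,  1.5056327351493116e-7};

// log Gamma(x) for x > 0.5 (no reflection branch, for brevity).
double lanczos_lgamma(double x) {
  const double w = x - 1.0;
  double a = lanczos_c[0];
  for (int i = 1; i < 9; ++i) a += lanczos_c[i] / (w + i);
  const double t = w + lanczos_g + 0.5;
  return 0.5 * std::log(2.0 * 3.14159265358979323846)
         + (w + 0.5) * std::log(t) - t + std::log(a);
}

// d/dx of lanczos_lgamma: every term above is differentiated in closed
// form, so the value and the gradient come from the same approximation.
double lanczos_digamma(double x) {
  const double w = x - 1.0;
  double a = lanczos_c[0], da = 0.0;
  for (int i = 1; i < 9; ++i) {
    a += lanczos_c[i] / (w + i);
    da -= lanczos_c[i] / ((w + i) * (w + i));
  }
  const double t = w + lanczos_g + 0.5;
  return std::log(t) + (w + 0.5) / t - 1.0 + da / a;
}
```

Because the derivative shares the value’s coefficients, value/gradient inconsistency is eliminated by construction, up to the accuracy of the Lanczos series itself.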

Thanks for the suggestions. I’m at a conference till Wednesday, but I need
this stuff to work, so I’ll finish the hypergeometric function pull request
and try out your suggestion (Ben). I think limiting the tree depth will
save me for the simple models, but to get the hierarchical stuff working this
needs to be fixed properly. Good to hear some confirmation that I’m on the
right track.

That’s a great suggestion. Since I’m currently just doing some test
applications without the hierarchical piece, making the sampler less efficient
shouldn’t be a big deal. All the gamma and hypergeometric functions have a
tolerance argument, so maybe I’ll try tightening that too. It’s currently set
at something like 1e-6, which doesn’t seem like enough. Thanks!

1e-6 can be enough, but as you are finding, it may not be sufficient. The thing is that one usually wants just enough precision and no more: at least for ODE models, more accuracy than is actually needed is a huge waste of computational resources.

I am writing this to caution against a too-conservative default setting. It would be best to let users choose these tolerances if possible.

This should really be autotuned in addition to the stepsize, although I can
see the difficulty unless we have a way of pinpointing which function
contributes most to the error. It might be moot anyway, because without the
in-progress bugfix, at least one of the functions can silently return before
reaching the specified tolerance.
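
To illustrate the failure mode being described, here is a hypothetical naive power series for the lower incomplete gamma function with an explicit tolerance argument and an iteration cap that throws instead of silently returning an unconverged value. The name, signature, and defaults are made up for illustration; this is not the Stan Math implementation.

```cpp
#include <cassert>
#include <cmath>
#include <stdexcept>

// Hypothetical sketch: the standard power series for the lower incomplete
// gamma function,
//   gamma(a, x) = x^a e^{-x} * sum_{n >= 0} x^n / (a (a+1) ... (a+n)),
// with a relative convergence test and a hard iteration cap that throws
// rather than silently returning an unconverged value.
double lower_inc_gamma(double a, double x,
                       double tol = 1e-12, int max_steps = 10000) {
  if (a <= 0 || x < 0)
    throw std::domain_error("lower_inc_gamma: require a > 0, x >= 0");
  double term = 1.0 / a;  // n = 0 term of the series
  double sum = term;
  for (int n = 1; n < max_steps; ++n) {
    term *= x / (a + n);
    sum += term;
    if (term < tol * sum)  // relative convergence test
      return sum * std::exp(a * std::log(x) - x);
  }
  // Never return a value that failed to reach the requested tolerance.
  throw std::domain_error("lower_inc_gamma: series did not converge");
}
```

The point of the sketch is the last line: an unconverged series should be an error the sampler can see, not a silently degraded value.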

On the other hand, the incomplete Gamma was written by me.

Additionally, having user-specified precisions is a bad idea all the way around. As I have noted many, many times, trying to validate sufficient precision is extraordinarily difficult because we don’t have accurate feedback to target. For large errors we do get the behavior of the target accept rate not increasing past a certain threshold, but for smaller errors that need not be the case, and one can still get subtle biases that do not obviously manifest in the sampler behavior. Even for “experts,” all we can do is play with heuristics and shrug our shoulders that maybe it is good enough.

On the other hand, the incomplete Gamma was written by me.

Is that an argument we should trust it more or less than Boost?

On the upside, I think this model is exercising most of the functions and
gradients we need for survival models to work smoothly, and unless I’m
mistaken this has been affecting a lot of other models too. I saw some
weird stuff with the Weibull-uniform mixture and generalized gamma models I
used for this data previously, and even though I got it working at scale,
it was fragile. At the time I didn’t have the experience to pursue it into
the weeds like this.

The fact that stuff like this is fixable with a week or two of work is
probably my #1 reason to fanboy over Stan all the time. It’s also why I
really believe in making the math library well documented, down to
algorithms and sources, so that the barrier to entry for fixing this stuff
is smaller. … although maybe I’m wrong that anybody else who’s a
non-specialist would be willing to dig in like this?

It’s also why I
really believe in making the math library well documented, down to
algorithms and sources, so that the barrier to entry for fixing this stuff
is smaller

Please, not higher-level stuff! We can barely keep the models in the manual
consistent! :)

I was thinking of my experience with the hypergeometric functions so far,
where the biggest time suck in fixing the bugs was figuring out which
hypergeometric function was meant. I had to work from the approximation in
the code back to some definitions I looked up. That was only possible
because our implementation is a naive one that matches the definition. It
was a little hard because the math people write “the hypergeometric
function” when of course they really mean the specific one from the family
they’re working on. Once I figured one out, the rest was easy since they’re
all related, but ffs, Daniel’s last message on those was along the lines of
“[either there’s a bug or I don’t understand which function this is]”.
Turns out there was a bug. We just shouldn’t do that to ourselves,
especially since we can’t afford to wait for a specialist to come along and
fix this stuff.

I’m not sure we’ll get “people” to do it anytime soon, so we’re probably
stuck with updating the doc as we do bugfixes and refactoring. I just saw
some noises about how internal namespaces would make it easier to avoid
documenting stuff, and I completely disagree with that. Sure, there are some
functions too obvious to document, so it’s a judgment call, but a blanket
“it’s internal so we don’t doc it” is shooting ourselves in the foot. The
C++ language is huge, and you only need to know a few facets of it to
contribute to any given piece of Stan, but if we’re not rigorous about doc,
all of a sudden you need to know much more, and you need to be able to
figure out which of the millions of C++ techniques for accomplishing a given
task is implemented in the function you’re looking at. So all that is to
say: please, except for the really simple functions, let’s keep writing the
doc. Something that differentiates us from JAGS/WinBUGS is that we ARE
getting outside contributors despite our more complex codebase. The payoff
for that is going to outlast even the use of HMC in Stan.

Having doc is great, but let’s not pretend that we can write a massive mathematics treatise. Both hypergeometric functions are completely identified by their names, in particular by their indices. The variable naming also followed standard notation (including NIST’s and Wikipedia’s). The proposed doc starts pushing hard against our style guideline of “don’t doc obvious stuff,” and I would argue it doesn’t even resolve any of the complaints: the new doc just says “power series, there are convergence conditions” but doesn’t say which power series or link to references for those convergence conditions. How does that make it any easier for someone to come in and make changes later?

Again, I’m not against doc. But it should be more useful, not just more verbose, like

The generalized hypergeometric function, ${}_{3}F_{2}$.
This implementation uses a power series expansion around z = 0 with radius of convergence z = 1.
There are poles for a or b equalling an integer.
For more details see http://dlmf.nist.gov/15.2#i.

(Note that this doc isn’t accurate, I just threw some stuff together as a hypothetical example).

We don’t need to write a mathematical treatise. I think the middle road is
to assume a naive reader who has the background to solve the problems in a
given function but is not a specialist and is not familiar with the
terminology. The hypergeometric functions are a good example because they
have weird notation and articles full of irrelevant results, but all you
need to understand our calculation is 1) the notation; 2) infinite series;
3) convergence conditions; 4) a bare minimum of C++. A note on the
terminology and a good reference would be plenty.
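
As a sketch of how little machinery that actually amounts to, here is a hypothetical naive 3F2 implemented directly from its power series definition; the name, signature, and defaults are illustrative, not Stan Math’s.

```cpp
#include <cassert>
#include <cmath>
#include <stdexcept>

// Hypothetical naive generalized hypergeometric function 3F2(a1, a2, a3;
// b1, b2; z), computed from its defining power series around z = 0
// (radius of convergence |z| < 1). Term k+1 is built from term k via the
// ratio of Pochhammer symbols, so no factorials are formed explicitly.
double hypergeometric_3F2(double a1, double a2, double a3,
                          double b1, double b2, double z,
                          double tol = 1e-12, int max_steps = 100000) {
  double term = 1.0;  // k = 0 term
  double sum = 1.0;
  for (int k = 0; k < max_steps; ++k) {
    term *= (a1 + k) * (a2 + k) * (a3 + k)
            / ((b1 + k) * (b2 + k)) * z / (k + 1);
    sum += term;
    if (std::fabs(term) <= tol * std::fabs(sum))
      return sum;
  }
  throw std::domain_error("3F2: series did not converge to tolerance");
}
```

When the upper and lower parameters pair off, the series collapses; for instance 3F2(a, 1, 1; 1, 1; z) reduces to (1 - z)^(-a), which makes a handy sanity check.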

Yet we had (have) a bug in the loop conditions in one of them that silently
returns inaccurate values under some conditions (many iterations to
convergence), and at least one that hangs.

That said, I don’t think we can solve this with generalities. I’ll give it
a shot on the branch for the hypergeometric functions and write some
comments separately on where I think the line is… as soon as I can get some
reliable [redacted] internet. Honestly, the internet around the DC Hilton is
worse than an Amtrak train going through a tunnel.

+1 for Doxygen docs in C++
I like the doc we have, between the wiki and the autodiff paper, but it’s not easy to navigate. I plan to update the “Contributing New Stan Functions” wiki page, and reference the AD paper (and maybe a C++ tutorial too) more systematically. That way, we have a place with an overview and pointers to more detailed docs.

The problem’s always where to draw the line with doc. I think what Michael’s laying out here is the right level. We’ve previously been very bad about not including any pointers.

When I say I don’t want to doc internal functions, I’m talking about simple ones we don’t expect users of the math libs to use directly. If there’s some hairy math algorithm, then by all means a pointer to a paper where the algorithm’s defined would be great. And a precise definition of the function being computed, if possible, is also good.

I’m afraid we are the specialists at this point! By “people”, I mean contributors, developers, etc. Nobody’s going to object to a pull request that adds doc, so if you think it needs it, feel free to add it where you think it’s necessary.

I don’t want to start documenting C++ techniques inside of our doc. I think the function doc at the doxygen level should say what the function does, ideally with a mathematically precise definition of the intent. The arguments and constraints on them should be documented, as should exceptions being thrown. None of that says anything about whether we’re using ADL, SFINAE, RAII, CRTP, or any of the other gazillion C++ patterns with cryptic acronyms. We just have to assume the readers of the code know C++.
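
For instance, a doxygen block at roughly that level, on a hypothetical helper (the function and signature are made up for illustration, not an actual Stan Math declaration):

```cpp
#include <cassert>
#include <cmath>
#include <stdexcept>

/**
 * Return the natural log of the rising factorial (Pochhammer symbol),
 * log((a)_n) = lgamma(a + n) - lgamma(a).
 *
 * See http://dlmf.nist.gov/5.2#iii for the definition.
 *
 * @param a base of the rising factorial; must be positive
 * @param n number of factors; must be non-negative
 * @return log of (a)_n
 * @throws std::domain_error if a <= 0 or n < 0
 */
double log_rising_factorial(double a, int n) {
  if (a <= 0 || n < 0)
    throw std::domain_error("log_rising_factorial: require a > 0, n >= 0");
  return std::lgamma(a + n) - std::lgamma(a);
}
```

Note that the doc states the mathematical intent, the argument constraints, and the exception, and says nothing about C++ technique.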