What is the supported external API of the Math Library?

I like your direction here @sakrejda. That is - the Stan project exists to solve Bayesian problems - easy ones, hard ones, any Bayesian problem. Anything it takes to get there is what we care about. We obviously want Stan to be a stable thing. We want Stan to only get faster over time and to always give us the correct posterior (so the set of problems it gets right must grow over time).

Now, efforts have been made in the past to tailor Stan-math as a library for autodiff - which is why there is an arXiv publication. However, this is a side product (at least to me).

Having a stable API obviously helps us get to our goal of an awesome Stan. However, where I do object somewhat is the idea that Stan-math clients are our main objective here. We should be kind to them, yes, of course. We created the side product ourselves, but if we now make things more formal we must align it with our over-arching project goal - and that is “Stan” as a tool to crunch Bayesian problems.

If having the side-product Stan-math autodiff library leads to a lot of contributions from people who use it only as an autodiff library… then that's value for the Stan project, but I haven't seen this happening.

Maybe this helps a bit… don’t know.

I have been seeing the opposite advice given recently from more than one core Stan dev - to just change everything to include the relevant top-level include rather than Include What You Use. We should figure out what the procedure is. @syclik is this on a wiki somewhere?

+1 would love further test automation and think that’s a good direction.

This is already in place, and some of the automation around it is being improved now. The focal point right now is the CmdStan Performance Tests, where we will soon have benchmark and numerical gold tests run against CmdStan develop every time it changes (which means every time Stan and Math change as well; right now it's just running daily). We'll also have benchmark results and numerical gold tests running against develop on PRs soon.

I don't see why we have to explicitly decide that right now - it would help clarify things a lot, and Bob was moving in that direction when he was technical director, but I actually think the Math library can still be a useful thing even if we simplify it. Would you mind writing up what we lose if we switch to supporting only rev/mat.hpp and mix/mat.hpp? Here's my quick list:

  • The ability to use the Math library without using Eigen (probably doesn’t work; already decided we don’t care about it w/ the flattening)
  • We introduce some chance of breakage for people who are not using autodiff at all (which I assume we don’t care about as we’re primarily an autodiff library)
  • Same thing for those who are using JUST forward mode and not mix - but we have always told people that forward mode isn’t tested and to use it at one’s own risk. So this is just a formalization of this pre-existing policy.

There is probably more I’m forgetting, but let’s talk about the actual decision here w.r.t. moving testing & support to the two headers and try to analyze the pros and cons by group of users.

I didn't say decide; I want you to understand the perspective. The main user of the math library who needs to think of it in terms other than autodiff is the Math dev looking to improve it or to fix a bug. We need testing for prim independent of all the ‘rev’ machinery because some of the most complex gradients are implemented as separate functions in prim, and nobody wants to troubleshoot that mess on top of autodiff. There are few of us who can do it in the first place, and the time:reward ratio for implementing fixes to the numerical code is pretty dismal already.

Aside: the tests themselves already pull in scal/arr/mat; I was thinking of the main code. The fwd lib is tested, but we've been pretty clear about where it stands.

I 100% agree that we need to focus primarily on maintainability - that’s actually what I’m trying to address with this proposal about consolidated testing! So it sounds like there are a few functions in prim that should really be tested on their own without invoking autodiff in any way, is that right? If that’s the crux of the disagreement here I think we’re in luck: it should be pretty easy to determine if switching to a rev/mat and mix/mat test split makes that more difficult. I would guess it would actually be fine - just invoke those troublesome prim functions only on doubles, and I would suspect that the autodiff mechanics never get involved, right?
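
Something like this is all I have in mind - a rough sketch, not an actual test from the repo, using log_sum_exp purely as a stand-in for one of those prim functions:

```cpp
// Sketch: call a prim function on plain doubles while including only the
// supported rev header; no vars are constructed anywhere in the test.
#include <stan/math/rev/mat.hpp>
#include <gtest/gtest.h>
#include <cmath>

TEST(MathPrimViaRev, logSumExpDoubles) {
  double result = stan::math::log_sum_exp(1.0, 2.0);
  EXPECT_FLOAT_EQ(std::log(std::exp(1.0) + std::exp(2.0)), result);
}
```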

That’s a bit sketchy! What’s a “math-related function”, or I guess more importantly, what isn’t? Is that just saying everything in stan::math is part of our API?

But there’s no doc for how a lot of these functions are supposed to work. Is the idea that however they work now is the definition? So no bug fixes without changing function names and supporting old functions that were never documented? This has specifically caused problems with things like exception and other boundary condition behavior.

Am I right in understanding that the location of headers is not supported? So we can do the flattening without breaking the API?

Am I also correct in understanding that only reverse-mode is part of the API (only the jacobian and gradient functions were called out, and presumably only the versions implemented with reverse mode—there are also implementations of both with forward mode, I think).

Is that true? I thought we were encouraging people to use our top-level includes, because that’s the only API-level things we support. This is related to @seantalts’s comment:

Moving on,

I think that’s just reverse-mode functions documented in the Stan functions guide. There may be a few odds and ends like assignments, but those should arguably be moved to stan::model.

+1 to that. I think we should keep our goal of supporting Stan in mind.

I don’t think we’d lose anything in terms of real use. We need rev for first-order, and mix for higher-order.

That’s how all the auto-testing is set up—assume the prim version works with the right throw behavior.

Yes, most of them in fact. But that doesn’t have to determine where they go in terms of the file system. I think it may be easier in the end to also have a prim version in addition to rev and mix.


This bit is key. From the perspective of improving Math library code, it would actually be helpful to pull more of the numerical code out of rev and test it as doubles-only. The concern is not about a few functions or a side product. We need to improve our ability to test numerical code, and lumping it in with the bookkeeping in rev is going in the wrong direction.

That was my thinking.

Maybe we should refine it to “all functions in stan::math will be supported to the API their documentation describes.” (or something worded less awkwardly; slightly hungover)

Yep, so we’re saying forward mode is not yet supported and its API isn’t fixed.

Exactly my thoughts as well. @sakrejda I’m still not really understanding the point you’re making here - would you mind operationalizing it a little bit for me? I deal much better with concrete examples. Would you mind giving an example of a function that we would like to test with doubles but that could not use the rev/mat.hpp include to do those tests, or highlight what we’d be losing if we moved to that? I think a simple example would help me understand the general issue here.

We have explicit specializations in rev, so you're potentially not even calling the prim code in the test. We can't just replace the prim code with the specialized code because it carries out extra calculations that you don't need, and it's more complex and so more prone to errors.
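
To spell out the shape of the problem, here's some toy code (foo and its use of precomputed_gradients are purely illustrative, not anything from the library). Once rev provides its own overload, a test that only ever passes vars never executes the prim template body:

```cpp
#include <stan/math/rev/mat.hpp>
#include <vector>

namespace toy {

// "prim"-style generic template: the numerical code we want to unit test
template <typename T>
T foo(const T& x) {
  return x * x;
}

// "rev"-style overload: overload resolution picks this for var arguments,
// so foo(var) never runs the template body above
inline stan::math::var foo(const stan::math::var& x) {
  double val = x.val();
  return stan::math::precomputed_gradients(
      val * val, std::vector<stan::math::var>{x}, std::vector<double>{2 * val});
}

}  // namespace toy
```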

For an example try this and this

I was thinking in the other direction: what we want to expose are the high-level functionals (which deal with their own nesting automatically) and the functions we expose in the Stan language.

Everything else feels like support to me. The issue for backward compatibility is when users do things like write a custom derivative vari using our built-in low-level “internal” tools. What I'm not sure about is how much is supported. Is the exact constructor used in log1p_vari important, for example? It's only there so we can define log1p(const var&).
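
For concreteness, the pattern in question looks roughly like this (a from-memory sketch, not the literal code in rev):

```cpp
#include <stan/math/rev/core.hpp>
#include <cmath>

namespace sketch {

// a vari subclass that exists only to back the var overload of log1p
class log1p_vari : public stan::math::op_v_vari {
 public:
  explicit log1p_vari(stan::math::vari* avi)
      : stan::math::op_v_vari(std::log1p(avi->val_), avi) {}
  void chain() {
    // d/dx log1p(x) = 1 / (1 + x)
    avi_->adj_ += adj_ / (1.0 + avi_->val_);
  }
};

inline stan::math::var log1p(const stan::math::var& a) {
  return stan::math::var(new log1p_vari(a.vi_));
}

}  // namespace sketch
```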

This does bring up the issue I’ve run into in the past about how to make sure the right version of a function’s getting called. This is a problem for our functionals, too, as most of them have reverse-mode implementations which are not themselves differentiable and forward-mode implementations which are fully differentiable, but much less efficient.

The main reason we want to make sure primitive versions get called is for efficiency—we don’t want calls with doubles to affect the autodiff stack. There are hooks to test that we could include.

We also need to check this the other way around: that rev mode is called when there are var arguments. We don't want to just instantiate the prim-level template functions. I'm less sure how we could test that automatically.

If we mix prim and rev together, we can start using enable-if and disable-if to sort this out programmatically so that the wrong instantiations won't even compile in most cases.
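
For instance, something along these lines (a sketch of the idea only; bar is a made-up example, not library code):

```cpp
#include <stan/math/rev/mat.hpp>
#include <type_traits>

namespace sketch {

// generic "prim" version, restricted to arithmetic types so that a var
// argument can never silently instantiate it
template <typename T,
          typename std::enable_if<std::is_arithmetic<T>::value, int>::type = 0>
T bar(const T& x) {
  return x * x;
}

// var arguments must take this overload or fail to compile
inline stan::math::var bar(const stan::math::var& x) {
  return x * x;  // rev's operator* records the gradient
}

}  // namespace sketch
```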

We've been very inconsistent and have favored overloads over template function specializations.
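
Just to make the two idioms explicit (generic toy code, not from the library):

```cpp
#include <stan/math/rev/core.hpp>

// the generic template
template <typename T>
T baz(const T& x) {
  return x + x;
}

// idiom 1: a plain overload for var (what we have mostly favored)
inline stan::math::var baz(const stan::math::var& x) { return x + x; }

// idiom 2: an explicit specialization of the template (the rarer alternative;
// note that if both exist, a call like baz(some_var) picks the overload)
template <>
inline stan::math::var baz<stan::math::var>(const stan::math::var& x) {
  return x + x;
}
```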

For an example try this and this

I see the templated prim implementation and specialized rev implementation, but if those were in the same file, I don’t see why it’d make testing any harder.

Sean has been talking about having a single header used to pull in different files. If they were in a single file they'd always be pulled in together, so at least you couldn't claim ignorance, but I don't know how that lets you guarantee that you test the prim version regardless of the input argument types.

That’s def. worth addressing!

This is independent of the testing. We want to make sure the right thing gets instantiated in Stan programs, and Stan’s going to be bringing in the whole library.

It's easy to make sure you test the prim version, because implicit promotion to var will affect the autodiff stack in a measurable way. The other way around is harder, because instantiating the templated prim version with var arguments puts things on the autodiff stack just as a specialized var version does.

Easiest way to make sure you test the prim version is to not include the rev version.

That will indeed test the prim version. But it won’t test whether the prim version gets called when someone includes our library header, which is what we really need to test. It doesn’t matter if the prim version works if it doesn’t get called when it should.


@seantalts did the examples I shared help you understand the point that I am bringing up?

It is different from the point @Bob_Carpenter brought up, and as somebody who has in the past successfully tracked down nasty long-standing bugs in the math and rstan libraries, I would rather not have my point get lost under Bob's completely orthogonal issue.

I think I see the issue now - we don’t have double specializations, so you worry about making sure the default specializations get called for doubles when we also pull in rev, which can provide its own specializations. Is that right? I agree that it feels more hygienic to only include the prim files in order to be doubly sure you’re testing those default specializations. Just to give handy names, I think this issue is essentially about code coverage - making sure that all the code that exists has tests written that execute it.

If that’s the correct issue, let’s think about how we could mitigate it and still improve maintenance/cognitive dev burdens (& test times). My perspective on the Math library is that, at the end of the day, you want to make sure LDLT_factor works with double and var matrices under the supported headers. I think this is what you’re saying is Bob’s orthogonal issue, I’m not sure what to call it, maybe functional or API correctness. To me, code coverage is trumped by testing functionality, and the way to look at code coverage is to install a code coverage tool - this handles lots of edge cases way better than our current system. Would installing such a plugin satisfy your code coverage concern here? Are there other cheaper, easier ways to assuage that concern?

To be frank, I don't know what the point of testing functions in prim is if you can't invoke them from rev, given that we don't support the prim/mat.hpp header. Keeping tests in prim does nothing to illuminate when that might be the case, either, and it seems like even if you did end up in that pathological state you would prefer to know that double invocations are correct from rev rather than that dead code in prim is correct.

Missed Bob’s answer to this one:

It should be relatively easy to define a test fixture we can use to test all prim functions from rev that checks that nothing was put on the autodiff stack.
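
Concretely, that fixture might look something like this (a sketch; autodiff_stack_size() is a hypothetical helper standing in for however the Math version at hand exposes the size of the reverse-mode stack, since the ChainableStack internals have moved around between releases):

```cpp
#include <stan/math/rev/mat.hpp>
#include <gtest/gtest.h>
#include <cstddef>

// Hypothetical accessor: adjust the body to whatever the current Math
// version exposes for the reverse-mode stack.
inline std::size_t autodiff_stack_size() {
  return stan::math::ChainableStack::var_stack_.size();
}

// Fixture asserting that the code under test left nothing on the stack.
class PrimStaysOffStackTest : public ::testing::Test {
 protected:
  void SetUp() override { before_ = autodiff_stack_size(); }
  void TearDown() override {
    // a doubles-only prim call should not have grown the autodiff stack
    EXPECT_EQ(before_, autodiff_stack_size());
    stan::math::recover_memory();
  }
  std::size_t before_;
};

TEST_F(PrimStaysOffStackTest, logSumExpDoubles) {
  stan::math::log_sum_exp(0.5, 1.5);  // plain double call
}
```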

Whoa… this got complicated real quick! That’s great! I’m glad we’re having this discussion.

@seantalts, when I said the decision to limit the Math API hasn’t been made, I really don’t think it has. Ideally, we’d start a different thread that’s titled something appropriate like “Limiting the Math API to reverse mode” or something like that so it’s clear. This thread is now 38 responses deep and counting. It’s not clear what the decision is and who has made it. So… for now, the Math library’s API is what we’ve supported so far. We can change that! Just not in a way that obfuscates the decision.

There are a lot of good reasons for splitting the Math library into prim, rev, fwd, and mix. At this point, I don’t think there are good reasons for splitting each of these into scal, arr, and mat. (I remember we split into the scal, arr, mat distinction based on user requests… back when it was hard to compile Eigen and Boost together, but we’re way past that with what’s easy to install with compilers. We also didn’t understand the maintenance burden of that decision at the time.)

@sakrejda and @Bob_Carpenter are talking about problems that we’ve actually faced in the past. Before we had these 12 headers, we actually did have a lot of trouble compiling. We would have template metaprogramming issues that would lead to the wrong specialization being called (unintuitively) or things not compiling due to having a trait instantiated in the wrong order. This was really hard on new devs understanding how to develop for the Math library. Since splitting into the 12 headers, we haven’t had many of these problems.

@seantalts, thanks for coming at this from a different angle. One of the things you’re suggesting is something that I hadn’t considered for a long time. In the last couple posts, you’ve suggested only running with rev and mix. (If that’s what you meant from the very start, that wasn’t clear at all; it really sounded like all you wanted was reverse mode.) That’s an interesting proposal that we’ll need to evaluate, but it can be evaluated.

Before we split this into 12 headers, we actually did have 4 headers. We split it this way because the different toolchains we were supporting could not handle the compilation. If I recall correctly, we couldn't even include the mix header and only instantiate reverse mode. That might have been a mistake on our part or it might have been compiler limitations… We could be well past that with the upgraded C++ compilers everywhere. If we are, then I think we could consider either what you're suggesting, or I'd even take it a step further and say we can unify it all. It'd be much simpler to have the generic template definition along with all the specializations in the same file. In theory, we should be able to include everything as long as we don't instantiate more than what we need.

My guess, though, is that the sweet spot is splitting this into prim, rev, fwd, and mix. But once again, we could evaluate this. (I don’t know if we can actually get everything included in the right order and not instantiated until we want to; maybe it’s possible?)

You're right, we accidentally hijacked this thread, and its length and continued activity are confusing. I'm marking the solution and closing the thread - we can continue the testing discussion on another thread if you're interested.