What is the supported external API of the Math Library?


#1

Hey all,

We got into a conversation about what our wiki on versioning and compatibility means w.r.t. the Stan Math library. We brought it up at the meeting yesterday, but @syclik wasn’t around and so I wanted to summarize the consensus from that meeting online and perhaps engage Daniel and the others from the meeting here if there’s more to be worked out.

We all agree that we’re currently promising backwards compatibility (via semantic versioning) to users of the Stan language (see the Stan wiki, above). I think we are, to some extent, trying to provide something similar with the Math library, but it’s much less clear what the user API is in this case because it’s just a bunch of source code and makefiles. I will try to write out what I think we mostly agree we should support, though we didn’t talk about all of this in the meeting as we were focused on the specific issue at hand.

  • Reverse mode automatic differentiation only (fwd and mix still need work and tests)
  • Single-threaded only, except through map_rect[0]
  • Use only through one of the top-level headers (e.g. stan/math/rev/mat.hpp or stan/math/prim/mat.hpp, and not any of the lower ones like arr.hpp or scal.hpp)
  • Nothing in any of the meta directories, which have historically changed constantly
  • Using only our Makefiles
  • Almost entirely to make the Stan language awesome. If we can easily provide for non-Stan users of the Math library, then we shall, but it should always be suborned to the Stan language needs.

As a practical test case for this policy, please consider this PR. Once that is merged, anyone starting a thread on which they want to perform AD operations will need to call an init() function. This will not affect single-threaded users using our Makefiles, nor will affect those using map_rect. @syclik reads the versioning wiki I think pretty reasonably to say that we’d need to bump the major version number for this, because “A change in a library breaks backward compatibility if a program that worked in the previous version no longer works the same way in the current version. For backward-compatibility breaking changes, the major version number is incremented.” Because a program could have been written in the past 6 months based on 2.18.0 that used the Math library with threads, and now that program would need to call the init() function on its threads.

However, I would propose we change the wording in that wiki to reflect our actual policy. We have made numerous backwards-incompatible changes by that incredibly high standard, and we have issues out for bugfixes now that will break previously working programs that relied on bugs, for example. I propose the following wording:

A change in a library breaks backward compatibility if a program using only supported APIs that worked in the previous version no longer works the same way in the current version. For backward-compatibility breaking changes, the major version number is incremented.

I don’t think we should be tied to supporting all source code we have ever released.

I’m curious how people feel about this, so here’s a poll.

  • Keep the existing wording

  • Change the wording

0 voters

[0] For the two 2.18 releases, you could have, for the first time, used the Math library with C++ threads (with or without interacting with map_rect). We did not advertise that you could do this without map_rect and it seems likely that no one has been using this since it is about 6 months old and we don’t have many math library users anyway.

cc @wds15 @Bob_Carpenter @mitzimorris @breckbaldwin @bgoodri @Matthijs


#2

Just to highlight the proposed change, the additional text being proposed for the wiki is in {}.

A change in a library breaks backward compatibility if a program using {only supported APIs} that worked in the previous version no longer works the same way in the current version. For backward-compatibility breaking changes, the major version number is incremented.

Breck


#3

Thanks for putting up the post! Btw, I just responded to the poll.

At the heart of it, it’s the source. The makefiles are just a convenience to get it built; we don’t actually need that as part of it.

Re: bulleted points. I agree with most of what you’ve said. I’ll address the things that I think we’re not exactly eye-to-eye on (they’re not major).

Before getting to specifics, here’s what I think is the API:

  1. The Math Library headers (akin to C++ Standard Library headers):
    • stan/math.hpp [1]
  2. All objects, functions, etc.
    What I mean by that is stan::math::var shouldn’t be renamed until a major change. That means that functions that we introduce have to stay because they’re part of the API. To remove, we have to deprecate and wait for the major release. For a good example, see all the *_log() functions that just call the *_lpdf() functions.
  3. How we call (reverse-mode) automatic differentiation.
    To summarize, at the start of a program, the autodiff stack is ready to go and a client can just start creating stan::math::var. After the client is done building up an expression, the client can either call .grad(...) on a var object, can call the grad(...) function (it’s in the autodiff paper), or use the gradient(...) functional (also in the autodiff paper). And then to reset the autodiff stack, the user must call stan::math::recover_memory().

One minor point: I only consider tagged versions. We might put something in and then rip it out but if it happens between any tagged releases, no harm no foul.

[1] Yes, that’s right… just that one header. In the generated C++ code in Stan, it includes stan/model/model_header.hpp from Stan which only include stan/math.hpp and even our arXiv paper only includes that header. We split the headers as a way for us to partially include things.)

Now… there was is wiki page called Quick Start in the Math wiki that outlines all the top-level includes we have. It’s a judgement call, but I’d be ok if we really said our only actual header that we guaranteed was stan/math.hpp and everything else was out of convenience.

For history, we started splitting things out after the autodiff paper because we included forward mode. That was actually all in the stan/math.hpp header, but then we got to a point where we couldn’t compile on Windows and was forced to split things apart.

Ok… back to commenting on your points (which, once again, I agree with most of them).

In spirit, I agree with you here, but stan/math/rev/mat.hpp and stan/math/rev/arr.hpp are at the same level.

At this point, I think it makes sense to move towards stan/math/rev.hpp, fwd.hpp, etc. and drop the arr, mat, scal distinction.

AFAIK, the functions haven’t changed. Nothing’s disappeared. Files might have moved as they’ve generalized, but I don’t agree that things have changed and have changed constantly.

I think in the meta directories, what we have are actually at least two classes of functions. Some are client-facing like size(). But most are actually not, like operands_and_partials. We really don’t expect that to be used by any client directly. This is making me think we should probably make a distinction between what functions are part of the API that a client should depend on and which are not.

In my opinion, I think we’re still at a design / prototype phase and we should treat it as such. We need to do some more thinking about this before we decide to end up with something that locks us into something that becomes part of the API.

The problem with the proposed design is that it leaves initializing the autodiff stack different for threading as opposed to no-threading, MPI, and GPU. That doesn’t sit right to me. Either we should move towards having an initialization phase for the autodiff stack (for everything) or we shouldn’t. We shouldn’t expect the client to have to remember that if it’s threaded, then do it this way, if it’s not, do it that way. That also sets a bad precendence to have things inconsistent across new major features, which is really a burden to maintainers and clients both.

I spoke to @wds15 on Monday about this and I told him exactly that. If there’s no other way to maintain speed, then I’m actually ok with having things be inconsistent, but I’d like to know for certain that we can’t find some other way to automatically initialize. Our conversation around this point was that neither of us are experts so there indeed might be a way to do what we’re looking for and we just haven’t found it yet.

So… as the PR stands, if merged, I’d say that’s a major version bump.

Any examples? I tried going back through a bunch of the recent PRs and couldn’t find one. There was a recent one that had locscale_transform renamed, but I think that all happened post v2.18.0. (I’m just talking Math, not any of the other repos.)

+100.

But… we should be kind to clients.

Yes, agreed!

This is where we run into issues, though. I’m not exactly sure what’s enforced by RStan as a requirement on Math in practice. This init() for threading example… how will expose_stan_functions() work? Do certain functions get exposed and are included directly? Does this happen from within PyStan also?

Let’s say it does get through… CmdStan’s pretty easy to keep things in synch. Since we’re using submodules, each CmdStan hash is tied to a particular hash of Stan and Math. As long as everything is consistent, we’re good. But RStan ends up not being in synch. Maybe it’s a non-issue, maybe it’s a bigger issue. (It’d help to discuss this within the TWG to get a clearer understanding of what’s an implied API.)

Anyway, yes. Agreed.


#4

That all sounds good pretty much, just a few minor things, plus I think we have more to talk about on the PR but I think that conversation should continue on the PR thread. I definitely like saying that we should keep just stan/math.hpp as the supported entrypoint and not rename var, any math functions, or change how you get gradients until a version bump.

I think meta should only contain non-client facing stuff, so maybe we should move size out? I think meta used to be short for “C++ Template Metaprograms,” but now it has such a wide array of stuff, not all of which necessarily uses metaprogramming. I don’t think our users need us to provide a generic interface to get a collection’s size; applications generally stick to one collection type per use-case. Our value-add is in the math functions and their gradients, not in whatever tools we created to make programming those up easier. Maybe there are some other meta programs I’m not thinking of that users would want to use, though?

And we have changed meta functions in the past (maybe not “constantly,” poor word choice on my part); there was a huge refactor of operands_and_partials that no one thought twice about doing it (and no one reported being affected). This is also an example of a change that technically broke our all-possible-programs-will-be-backwards-compatible guarantee.

Re PyStan and RStan keeping up - recently we’ve been doing a lot of worrying about breaking those interfaces, but the feedback from @bgoodri (and I think @ariddell) has generally been that StanC++ should do what is best for StanC++, which will generally then be what is best for RStan and PyStan in the long run, even though there may be some hassles involved in adapting. An example is the new OCaml Stan compiler. This breaks the existing build system for RStan, but @bgoodri has said that he will be able to figure out something when the time comes and there’s something to ship (there are a bunch of plans and contingency plans, I’m just not going into it here for this example).

I think this is right. And yeah, we shouldn’t do egregious stuff, but generally speaking we need to reduce coupling by reducing the supported API so we can operate independently. This will eventually mean RStan no longer including specific header files from the Stan compiler, and that’s a good thing.

I think the poll has been up long enough so I’m going to change the wording now and we can work out what the API page should say.


#5

Awesome. We’re agreed on everything. I’ll continue the discussion on the PR thread.

I think that makes sense.

I think I remember people on the old forums or this one talking about certain things they do with the C++ library that involved the meta programs. (Or it might have been at a StanCon.) Anyway, I wouldn’t be surprised if others actually use some of it. That said, I’m ok if we separated what we considered internal from external and moved on. (We might classify everything as internal and I’d be ok with that.)

Wait… I remember one use case is @Andria_Dawson’s code to compute gradients for a super-large model that used GPUs. I’m pretty sure her code dove right into the guts of the meta programs.

That’s good to hear!

+1.


#6

I edited the wiki page with the extra 4 words and added a link to this new page, where I tried to summarize our discussion.


#7

I was just reading this SemVer article on HN and the first thing it mentions is from the introduction to the 2007 version of Semantic Versioning:

I propose a simple set of rules and requirements that dictate how version numbers are assigned and incremented. For this system to work, you first need to declare a public API. This may consist of documentation or be enforced by the code itself. Regardless, it is important that this API be clear and precise.

Well, we’re getting there with this thread! 😅 I think the Stan language is a little bit more clearly defined, though it could probably use more precision as well.


#8

should the wiki page link to the online doxygen docs, i.e. the github pages:
http://mc-stan.org/math/
cheers,
Mitzi


#9

We don’t have the code (and thus the doxygen) organized in such a way that distinguishes what we’re supporting and what we aren’t (other than the fairly new internal namespace), so it might be more confusing than helpful… And it might suggest we reorg the code, but I actually think they’re just innately two separate documents as we’ll always be developing things that we expect to one day release (and thus will live in a public namespace) but that won’t be supported until they’re complete.