What is the supported external API of the Math Library?

Hey all,

We got into a conversation about what our wiki on versioning and compatibility means w.r.t. the Stan Math library. We brought it up at the meeting yesterday, but @syclik wasn’t around and so I wanted to summarize the consensus from that meeting online and perhaps engage Daniel and the others from the meeting here if there’s more to be worked out.

We all agree that we’re currently promising backwards compatibility (via semantic versioning) to users of the Stan language (see the Stan wiki, above). I think we are, to some extent, trying to provide something similar with the Math library, but it’s much less clear what the user API is in this case because it’s just a bunch of source code and makefiles. I will try to write out what I think we mostly agree we should support, though we didn’t talk about all of this in the meeting as we were focused on the specific issue at hand.

  • Reverse mode automatic differentiation only (fwd and mix still need work and tests)
  • Single-threaded only, except through map_rect[0]
  • Use only through one of the top-level headers (e.g. stan/math/rev/mat.hpp or stan/math/prim/mat.hpp, and not any of the lower ones like arr.hpp or scal.hpp)
  • Nothing in any of the meta directories, which have historically changed constantly
  • Using only our Makefiles
  • Almost entirely to make the Stan language awesome. If we can easily provide for non-Stan users of the Math library, then we shall, but that should always be subordinate to the Stan language's needs.

As a practical test case for this policy, please consider this PR. Once it is merged, anyone starting a thread on which they want to perform AD operations will need to call an init() function. This will not affect single-threaded users using our Makefiles, nor will it affect those using map_rect. @syclik reads the versioning wiki, I think pretty reasonably, to say that we'd need to bump the major version number for this, because "A change in a library breaks backward compatibility if a program that worked in the previous version no longer works the same way in the current version. For backward-compatibility breaking changes, the major version number is incremented." After all, a program could have been written in the past six months against 2.18.0 that used the Math library with threads, and that program would now need to call the init() function on its threads.
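To make the design question concrete, here is a toy, self-contained sketch of the alternative (this is not Stan Math code, and the names are illustrative): with a thread_local tape, each new thread's tape is empty and ready to use automatically, which is one way an explicit per-thread init() call, as the PR proposes, could potentially be avoided.

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Toy stand-in for an autodiff stack (NOT Stan Math's actual implementation).
// thread_local means every thread gets its own zero-initialized tape, so a
// worker thread is "ready to go" without an explicit init() call.
thread_local std::vector<double> tape;

void record(double v) { tape.push_back(v); }

void demo() {
  record(1.0);  // main thread's tape now holds one entry
  std::thread worker([] {
    // The worker sees its own fresh, empty tape -- no init() needed.
    assert(tape.empty());
    record(2.0);
    assert(tape.size() == 1);
  });
  worker.join();
  assert(tape.size() == 1);  // main thread's tape is unaffected by the worker
}
```

Whether this approach can match the speed of an explicit init() is exactly the open question discussed below.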

However, I would propose we change the wording in that wiki to reflect our actual policy. We have made numerous backwards-incompatible changes by that incredibly high standard, and we have issues open now for bugfixes that will break previously working programs that relied on those bugs. I propose the following wording:

A change in a library breaks backward compatibility if a program using only supported APIs that worked in the previous version no longer works the same way in the current version. For backward-compatibility breaking changes, the major version number is incremented.

I don’t think we should be tied to supporting all source code we have ever released.

I’m curious how people feel about this, so here’s a poll.

  • Keep the existing wording

  • Change the wording


[0] For the two 2.18 releases, you could, for the first time, have used the Math library with C++ threads (with or without interacting with map_rect). We did not advertise that you could do this without map_rect, and it seems likely that no one has been doing so, since the feature is only about six months old and we don't have many Math library users anyway.

cc @wds15 @Bob_Carpenter @mitzimorris @breckbaldwin @bgoodri @Matthijs

Just to highlight the proposed change, the additional text being proposed for the wiki is in {}.

A change in a library breaks backward compatibility if a program using {only supported APIs} that worked in the previous version no longer works the same way in the current version. For backward-compatibility breaking changes, the major version number is incremented.

Breck

Thanks for putting up the post! Btw, I just responded to the poll.

At the heart of it, it's the source. The makefiles are just a convenience for getting it built; we don't actually need them as part of the API.

Re: the bulleted points. I agree with most of what you've said. I'll address the things where I think we're not exactly eye-to-eye (they're not major).

Before getting to specifics, here’s what I think is the API:

  1. The Math Library headers (akin to C++ Standard Library headers):
    • stan/math.hpp [1]
  2. All objects, functions, etc.
    What I mean by that is that stan::math::var shouldn't be renamed until a major release. Functions we introduce have to stay because they're part of the API; to remove one, we have to deprecate it and wait for the major release. For a good example, see all the *_log() functions that just call the *_lpdf() functions.
  3. How we call (reverse-mode) automatic differentiation.
    To summarize: at the start of a program, the autodiff stack is ready to go and a client can just start creating stan::math::var instances. After building up an expression, the client can call .grad(...) on a var object, call the grad(...) function (it's in the autodiff paper), or use the gradient(...) functional (also in the autodiff paper). Then, to reset the autodiff stack, the client must call stan::math::recover_memory().
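The calling convention in point 3 can be sketched with a tiny self-contained toy (this is NOT Stan Math, just an illustration of the pattern): vars are recorded on a global stack as an expression is built, grad() sweeps the tape backward to fill in adjoints, and recover_memory() resets the stack.

```cpp
#include <cassert>
#include <vector>

// Toy reverse-mode tape: each node stores a value, an adjoint, the indices
// of its two operands (-1 if none), and the partials w.r.t. those operands.
struct Node { double val, adj; int lhs, rhs; double dlhs, drhs; };
std::vector<Node> stack;  // the global autodiff "tape", ready at startup

struct var {
  int idx;
  explicit var(double v) : idx(static_cast<int>(stack.size())) {
    stack.push_back({v, 0.0, -1, -1, 0.0, 0.0});
  }
  double val() const { return stack[idx].val; }
  double adj() const { return stack[idx].adj; }
};

var operator*(var a, var b) {
  var c(a.val() * b.val());
  stack[c.idx] = {a.val() * b.val(), 0.0, a.idx, b.idx, b.val(), a.val()};
  return c;
}

var operator+(var a, var b) {
  var c(a.val() + b.val());
  stack[c.idx] = {a.val() + b.val(), 0.0, a.idx, b.idx, 1.0, 1.0};
  return c;
}

// Analogue of y.grad(): seed the result's adjoint and sweep the tape backward.
void grad(var y) {
  stack[y.idx].adj = 1.0;
  for (int i = y.idx; i >= 0; --i) {
    const Node& n = stack[i];
    if (n.lhs >= 0) stack[n.lhs].adj += n.adj * n.dlhs;
    if (n.rhs >= 0) stack[n.rhs].adj += n.adj * n.drhs;
  }
}

// Analogue of stan::math::recover_memory(): reset the stack for reuse.
void recover_memory() { stack.clear(); }
```

Usage: for f(x) = x*x + x at x = 3, building the expression and calling grad() leaves x.adj() == 7 (that is, 2x + 1); calling recover_memory() before reuse mirrors the requirement in point 3.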

One minor point: I only consider tagged versions. We might put something in and then rip it out but if it happens between any tagged releases, no harm no foul.

[1] Yes, that's right… just that one header. The C++ code generated by Stan includes stan/model/model_header.hpp from Stan, which only includes stan/math.hpp, and even our arXiv paper only includes that header. We split the headers as a way for us to partially include things.

Now… there is a wiki page called Quick Start in the Math wiki that outlines all the top-level includes we have. It's a judgement call, but I'd be ok if we said the only header we actually guarantee is stan/math.hpp and everything else is there for convenience.

For history: we started splitting things out after the autodiff paper, when we added forward mode. That was all in the stan/math.hpp header at first, but then we got to a point where we couldn't compile on Windows and were forced to split things apart.

Ok… back to commenting on your points (which, once again, I agree with most of them).

In spirit, I agree with you here, but stan/math/rev/mat.hpp and stan/math/rev/arr.hpp are at the same level.

At this point, I think it makes sense to move towards stan/math/rev.hpp, fwd.hpp, etc. and drop the arr, mat, scal distinction.

AFAIK, the functions haven't changed. Nothing's disappeared. Files might have moved as they've generalized, but I don't agree that things have changed, let alone changed constantly.

I think in the meta directories, what we have are actually at least two classes of functions. Some are client-facing like size(). But most are actually not, like operands_and_partials. We really don’t expect that to be used by any client directly. This is making me think we should probably make a distinction between what functions are part of the API that a client should depend on and which are not.
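The distinction proposed here could be expressed directly in the code; a minimal sketch, with purely illustrative names (this is not Stan Math's actual layout): client-facing helpers live in the public namespace and carry the compatibility promise, while implementation details sit in an `internal` namespace that is free to change between releases.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace toy_math {

// Client-facing, like size(): part of the supported API, so its signature
// should be stable until a major release.
template <typename T>
std::size_t size(const std::vector<T>& x) { return x.size(); }

namespace internal {
// Akin to operands_and_partials: an implementation detail that no client
// should depend on, so it may be refactored at any time.
template <typename T>
T sum_helper(const std::vector<T>& x) {
  T s = 0;
  for (const T& v : x) s += v;
  return s;
}
}  // namespace internal

}  // namespace toy_math
```

A client calling `toy_math::size(v)` is on supported ground; one reaching into `toy_math::internal::` is explicitly on their own.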

In my opinion, we're still at a design/prototype phase and we should treat it as such. We need to do some more thinking before we lock ourselves into something that becomes part of the API.

The problem with the proposed design is that it makes initializing the autodiff stack different for threading as opposed to no-threading, MPI, and GPU. That doesn't sit right with me. Either we should move towards having an initialization phase for the autodiff stack (for everything) or we shouldn't. We shouldn't expect the client to have to remember that if it's threaded, do it this way, and if it's not, do it that way. That also sets a bad precedent of having things inconsistent across new major features, which is a real burden to maintainers and clients both.

I spoke to @wds15 on Monday about this and I told him exactly that. If there’s no other way to maintain speed, then I’m actually ok with having things be inconsistent, but I’d like to know for certain that we can’t find some other way to automatically initialize. Our conversation around this point was that neither of us are experts so there indeed might be a way to do what we’re looking for and we just haven’t found it yet.

So… as the PR stands, if merged, I’d say that’s a major version bump.

Any examples? I tried going back through a bunch of the recent PRs and couldn’t find one. There was a recent one that had locscale_transform renamed, but I think that all happened post v2.18.0. (I’m just talking Math, not any of the other repos.)

+100.

But… we should be kind to clients.

Yes, agreed!

This is where we run into issues, though. I'm not exactly sure what RStan enforces as a requirement on Math in practice. Take this init()-for-threading example: how will expose_stan_functions() work? Do certain functions get exposed and included directly? Does this happen from within PyStan too?

Let's say it does get through… CmdStan's pretty easy to keep in sync. Since we're using submodules, each CmdStan hash is tied to a particular hash of Stan and Math. As long as everything is consistent, we're good. But RStan ends up not being in sync. Maybe it's a non-issue, maybe it's a bigger issue. (It'd help to discuss this within the TWG to get a clearer understanding of what's an implied API.)

Anyway, yes. Agreed.

That all sounds pretty good; just a few minor things. I think we have more to talk about on the PR, but that conversation should continue on the PR thread. I definitely like saying that we keep just stan/math.hpp as the supported entry point and don't rename var, any math functions, or change how you get gradients until a major version bump.

I think meta should only contain non-client-facing stuff, so maybe we should move size out? I think meta used to be short for "C++ Template Metaprograms," but now it holds such a wide array of stuff, not all of which necessarily uses metaprogramming. I don't think our users need us to provide a generic interface for getting a collection's size; applications generally stick to one collection type per use case. Our value-add is in the math functions and their gradients, not in whatever tools we created to make programming them easier. Maybe there are some other metaprograms I'm not thinking of that users would want to use, though?

And we have changed meta functions in the past (maybe not "constantly"; poor word choice on my part); there was a huge refactor of operands_and_partials that no one thought twice about doing (and no one reported being affected). This is also an example of a change that technically broke our all-possible-programs-will-be-backwards-compatible guarantee.

Re PyStan and RStan keeping up - recently we’ve been doing a lot of worrying about breaking those interfaces, but the feedback from @bgoodri (and I think @ariddell) has generally been that StanC++ should do what is best for StanC++, which will generally then be what is best for RStan and PyStan in the long run, even though there may be some hassles involved in adapting. An example is the new OCaml Stan compiler. This breaks the existing build system for RStan, but @bgoodri has said that he will be able to figure out something when the time comes and there’s something to ship (there are a bunch of plans and contingency plans, I’m just not going into it here for this example).

I think this is right. And yeah, we shouldn’t do egregious stuff, but generally speaking we need to reduce coupling by reducing the supported API so we can operate independently. This will eventually mean RStan no longer including specific header files from the Stan compiler, and that’s a good thing.

I think the poll has been up long enough so I’m going to change the wording now and we can work out what the API page should say.


Awesome. We’re agreed on everything. I’ll continue the discussion on the PR thread.

I think that makes sense.

I think I remember people on the old forums or this one talking about certain things they do with the C++ library that involved the meta programs. (Or it might have been at a StanCon.) Anyway, I wouldn’t be surprised if others actually use some of it. That said, I’m ok if we separated what we considered internal from external and moved on. (We might classify everything as internal and I’d be ok with that.)

Wait… I remember one use case is @Andria_Dawson’s code to compute gradients for a super-large model that used GPUs. I’m pretty sure her code dove right into the guts of the meta programs.

That’s good to hear!

+1.

I edited the wiki page with the extra 4 words and added a link to this new page, where I tried to summarize our discussion.


I was just reading this SemVer article on HN and the first thing it mentions is from the introduction to the 2007 version of Semantic Versioning:

I propose a simple set of rules and requirements that dictate how version numbers are assigned and incremented. For this system to work, you first need to declare a public API. This may consist of documentation or be enforced by the code itself. Regardless, it is important that this API be clear and precise.

Well, we’re getting there with this thread! 😅 I think the Stan language is a little bit more clearly defined, though it could probably use more precision as well.

should the wiki page link to the online doxygen docs, i.e. the github pages:
http://mc-stan.org/math/
cheers,
Mitzi

We don't have the code (and thus the doxygen) organized in a way that distinguishes what we're supporting from what we aren't (other than the fairly new internal namespace), so linking it might be more confusing than helpful. It might suggest we reorg the code, but I actually think they're just innately two separate documents: we'll always be developing things that we expect to one day release (and thus will live in a public namespace) but that won't be supported until they're complete.

@syclik and I got into another discussion of this and some related issues in https://github.com/stan-dev/math/pull/1186. I’ll try to summarize and move the discussion here:

I want to distinguish between three general Math library issues that came up:

  1. What modes / headers / directories require tests for a PR to be accepted?
  2. Where should those tests optimally live?
  3. Did we decide on an external Math API?

For #1, I don't have strong opinions, and this may already live on a wiki somewhere; apologies if so. For #2, hypothetically assuming we want mixed mode to be officially supported one day, I think we can still ask for tests for forward and/or mixed mode even if it's not in the API, of course. My advice would be to ask for tests at one or two API levels:

  1. rev/mat.hpp, and optionally

  2. mix/mat.hpp

…and no others. That's because we expect our use cases to be at the level of one of those two headers. Stan is the obvious example for header 1, and RHMC Stan is the obvious example for header 2. Supporting JUST mixed mode should still allow folks to use just forward mode if they want; it will just have longer compile times, or they can try using just the forward mode headers knowing they're not officially tested. That would be my $0.02.

This has only limited relevance to #1 and #2, but for #3 I think we DID decide on the externally supported API above in this Discourse thread. This is a case where I want to do my #1 obligation as TD and step in to end a potentially endless debate by just picking something we can work with, with the knowledge that it may be suboptimal and is up for re-examination later. So if that was unclear, please let me know and I'll add some more verbiage to the thread. Again, I don't think that decision really impacts #1 and #2 much; I just want it to be clear that it was made and that we are semantically versioning against that API. This external API just impacts how the projects work together.


The link looks broken from here, which discussion?

Here: https://github.com/stan-dev/math/pull/1186


Idk about a wiki, but after the early days of the math library the rule has been that if it's worth writing, it's worth testing.

The math library is different from the interface code or the language parser/generator because, for most complex functions, the only evidence we have that they work is that they produce accurate results compared to reference values for a wide range of inputs (in the parser you can at least look at the generated code for hints). We've had poorly tested functions that created biased results, poorly tested functions that broke models in subtle ways, poorly tested functions that broke releases, and at least one distribution that caused ongoing maintenance headaches because the person who wrote it moved on and nobody really understood the design choices. The point of listing these categories is that in the math library it is easy to write something that looks right but returns garbage for some range of values. The same is true of the algorithms, but other parts of the code base, even if they're complex, do not share this feature and should not be used to decide how testing is handled for math.

We can reduce the testing burden without sacrificing correctness through automated instantiation, separating reference values from test code, and automating tests of invariant relationships among functions.
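One of the "invariant relationship" tests mentioned above can be sketched in a few lines (illustrative only; this is not Stan Math's actual test harness): instead of hand-collecting reference values, check an analytic derivative against a central finite difference of the function itself.

```cpp
#include <cassert>
#include <cmath>

// A function and its hand-coded analytic derivative (names illustrative).
double f(double x)  { return std::exp(std::sin(x)); }
double df(double x) { return std::cos(x) * std::exp(std::sin(x)); }

// Central finite difference: an automated, reference-value-free check.
double central_diff(double (*g)(double), double x, double h = 1e-6) {
  return (g(x + h) - g(x - h)) / (2.0 * h);
}

// The invariant: the analytic derivative must agree with the finite
// difference across a range of inputs, up to discretization error.
bool gradient_consistent(double x, double tol = 1e-6) {
  return std::fabs(df(x) - central_diff(f, x)) < tol;
}
```

Separating the invariant (gradient consistency) from any particular function is what lets this kind of test be automated across the whole library.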

For the API question I don't have strong opinions, but if we loosen the testing requirements we're just going to get more subtle bugs that nobody knows how to track down. Just to give you a sense of the scale here: some math library bugs took multiple people months to track down (part-time months, but still). The Wiener distribution is another good example: it got implemented, partially tested, looked ok, and then sucked down weeks of troubleshooting time.


“rule” in the sense that it’s always been the answer I’ve run into when contributing.

Hmm, I agree with everything you're saying, especially "if it's worth writing, it's worth testing." I wasn't thinking of my proposal as a reduction in the quantity or quality of tests. I was proposing we keep the same tests but reorganize them, moving each test into a file at one of two API levels. So it would be a reduction in the number of test files but not of tests. Currently the tests are spread out across all the various directories (prim/rev/fwd/mix × scal/arr/mat); under the proposal we'd take all those tests and collate them into a bigger file that tests at either the rev/mat.hpp or mix/mat.hpp level. The only reduction in testing strength is that we wouldn't be testing that a math library user who only includes prim/scal.hpp gets the correct behavior, and to me that's fine, because that's not the supported external API of the math library (rev/mat.hpp is), nor is it intended to be supported one day (the way mix/mat.hpp is).

I think changing the testing philosophy this way would actually improve coverage of what we care about (more on that below), make it much easier for new developers to grok the code, and reduce testing resource consumption and time dramatically.

I can think of one case where there would be a reduction in the quantity of tests under the proposal, which is if you had a test at the prim/scal level such as:

TEST(MetaTraits, is_fvar) {
  using stan::is_fvar;
  using stan::math::fvar;
  EXPECT_TRUE(is_fvar<fvar<int>>::value);
  EXPECT_TRUE(is_fvar<fvar<double>>::value);
  EXPECT_TRUE(is_fvar<fvar<fvar<double>>>::value);
  EXPECT_TRUE(is_fvar<fvar<fvar<fvar<double>>>>::value);
}

and then you had the same test repeated in rev/mat (alongside the rev-specific tests):

TEST(MetaTraits, is_fvar) {
  using stan::is_fvar;
  using stan::math::fvar;
  EXPECT_TRUE(is_fvar<fvar<int>>::value);
  EXPECT_TRUE(is_fvar<fvar<double>>::value);
  EXPECT_TRUE(is_fvar<fvar<fvar<double>>>::value);
  EXPECT_TRUE(is_fvar<fvar<fvar<fvar<double>>>>::value);
}

TEST(MetaTraits, is_fvar_var) {
  using stan::is_fvar;
  EXPECT_FALSE(is_fvar<stan::math::var>::value);
}

In this case, we'd delete the original file with no replacement, and the number of tests would go down. But as far as I can tell we haven't been writing code this way; instead we've been writing JUST reverse-mode-oriented tests in rev, such that we typically aren't testing that the primitive operations still work under the rev headers. So I think the proposal is an upgrade on our current stance, because we move the tests to the API header we want to support (and that is used in Stan).

I reread the thread; no decision has been made. (If we're making a decision, it should be clear what we're deciding… no one should have to read a long thread to figure it out.) We are definitely discussing it.

I knew the wiki that you threw up bothered me, and I think I can finally articulate why. What you're describing is the API that Stan (the language) uses. It doesn't describe the API that the Math library provides.

There are users of the Math library other than Stan. We have, since v2.0, provided a stable API for mathematical functions (without derivatives), reverse mode autodiff, forward mode (partially tested), and mixed mode (also partially tested). There are users that rely on this functionality, even if not all the developers care about the additional functionality. Additionally, we now have developers trying to actively work on higher order autodiff.

So… what do we need to make a decision? And how do we make that decision?

I think we need to understand the impact of narrowing the scope of the math library:

  • how does it affect users? For Stan, it doesn’t affect them now. If we’re able to implement higher order autodiff, we’ll be able to run RHMC without rewriting models. What about other users that use the math library without Stan? We should hear from them and their use cases before we decide. (We’ll need to do some user interviews)
  • how does this affect developers? There are pros and cons to limiting the scope of the math library. Pro: reduces cognitive burden and testing time. Con: some devs are interested in extending the math library and it is potentially something that could be very impactful.

I think we should discuss and get feedback before declaring that the API is much narrower in scope than what has been around for years. (and especially before we alter the codebase so it doesn’t work for people.)

How do we go about deciding? I think we should gather input, discuss amongst the active Math devs, and get their feedback. I'd suggest the Math lead (me, with that hat on) takes that input into consideration and helps make a decision that's agreeable to most of the devs. This should then be taken up to the TWG to ratify, record, and back if necessary. This way, the decision is clear, who made it is clear, who contributed to it is clear, and how we got there is clear.

What do you think?

What do you consider the current API?


I wrote this, but I’ll get into a little more detail:

Different Math devs have written different versions over the years (I’m listing what I can find quickly down below), but what I consider to be the current API is written here: Quick Start · stan-dev/math Wiki · GitHub.

To break it down a bit:

  • 13 header includes. C++ clients can expect these to exist.
    • stan/math.hpp
    • these 12 header files: stan/math/{prim, rev, fwd, mix}/{scal, arr, mat}.hpp (but we’re trying to knock that down to 4 with the “flattening” refactor)
  • Math functions (almost everything in the stan::math namespace):
    • every function in err, fun, functor, prob, and vectorize folders, e.g. stan/math/prim/scal/fun/*.hpp.
    • a lot of the meta folder, but not all; some of this is considered “internal,” but clients should have access to templated functions like value_of(...) or is_var

That covers almost everything in the Math library. I can't think of a single function signature I'd be willing to change at will, though I'm sure there are a handful of functions we could change. In practice, we've acted as though this is our API: we deprecated functions like normal_log() when we could easily have removed them and had Stan's generator handle the name change.
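The deprecation pattern being described (normal_log() kept as a forwarder to normal_lpdf() until a major release) looks roughly like this. This is a simplified, double-only sketch; the real Stan Math versions are heavily templated, so take the signatures here as illustrative only.

```cpp
#include <cmath>

// New name: log density of the standard-parameterized normal distribution.
double normal_lpdf(double y, double mu, double sigma) {
  const double pi = std::acos(-1.0);
  const double z = (y - mu) / sigma;
  return -0.5 * z * z - std::log(sigma) - 0.5 * std::log(2.0 * pi);
}

// Old name: kept as a thin forwarder, flagged as deprecated, and only
// removed at the next major release (the pattern described above).
[[deprecated("use normal_lpdf instead")]]
double normal_log(double y, double mu, double sigma) {
  return normal_lpdf(y, mu, sigma);
}
```

Existing client code calling normal_log() keeps compiling (with a warning) for the whole major-version cycle, which is exactly the compatibility promise under discussion.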

So, that's what I consider to be the C++ API. I know there are users (who have posted on Discourse) who have said they use the Math library for higher-order autodiff without the Stan language. I don't know whether the number of clients using it for that purpose is in the single digits or much higher.


Different places where Math devs and users have discussed it:


We had a discussion in this thread where we formalized the API and I summarized our discussion & the new API on a wiki. So as of Feb 1 we have had an official external Math API. Of course users are free to use whatever parts they feel like, it’s just a question of what we test automatically and what we require tested in PRs.

I think it’s a good thing we have this as the API because right now there’s a ton of work being put into the flattening project by @anon79882417 and @roualdes that would remove a bunch of what you are saying you previously considered the API to be, which would require bumping the Math library to 3.0 if that had been the official API we were semantically versioning against.

This is a bit of a red herring. Function signatures exist at the rev/mat level as well, and are part of the official external API of the Math library. No one is proposing we change normal_log or any other function signatures.

What I’m trying to get at is why the math lib should not be tested from the point of view of the Stan language.

The math lib tests are (supposed to be) written using minimal headers to avoid taking on library-wide problems when adding a single function. The automated testing work is a better route to improving test coverage without bogging down every single feature addition in whatever technical debt there might be; that comes out in integration tests. I can see the appeal of more integrated testing at the math lib level, but not at the expense of modular testing. I can't recall any prim functions working under prim but breaking under rev… @syclik, has that been an issue?

Realistically, the burden of adding interesting functionality to the math lib is not in the test harness but in finding reference values manually and in the number of combinations that have to be tested. That needs automation. @Bob_Carpenter made really good progress on automating this for a subset of gradient calculations, and it unearthed a lot of inconsistency that we have addressed over time. We can automate a lot more of that, and then testing under different headers becomes a minor issue.

Rather than trying to quickly limit the math lib API, it might be more helpful to specify which math lib API Stan relies on (it certainly isn't the internal stuff) and make sure there are tests that cover it. It would be revolutionary in terms of stability if we had a set of small known-good Stan models that were verified prior to each release and whose math lib internals were thoroughly tested.