TBB and 2.21

Nope… the open PR only deals with makefiles. We leave that as is I would say. Thus, if you want to rip out the TBB, the user can define TBB_LIBRARIES to be empty and then the makefiles do not build the TBB anymore (I think).

I am working on another PR which introduces the TBB ready for use with the reverse mode. I will include if-defs there as discussed. I should have time today for some work on this… let’s see.

Sebastian

1 Like

I don’t see the value of a final pre-GPLv2 release. Is there anyone that’s going to benefit from exactly one more release? Or is there some kind of critical bug fix and some users that need that licensing?

I agree with Bob. Even if it’s easy to make it optional, since there’s no huge reason to do this, just get it in.

This was the common agreement during the last stan-math meeting. Please refer to the meeting notes from that meeting. I see your point, but given we agreed on that we go along this path at the moment…unless @syclik is fine with going all in now as you suggest.

However, I am really far… and would appreciate if now some eyes could look at what I am proposing and maybe someone can help me typing out even more tests (@rok_cesnovar is reviewing I suppose, so let him be the reviewer and not a writer of this) .

So the branch feature/intel-tbb-init branches off the intel-tbb-lib branch and it includes the following extras to check:

  • stan/math/prim/scal/fun/init_threadpool_tbb.hpp tested with test/unit/math/prim/scal/fun/init_threadpool_tbb_test.cpp … the function is doced and mostly tested. The idea is that this function must be called by the interfaces before firing off a chain. This will initialize the TBB threadpool. In case this function is not called, then the TBB initializes itself with its defaults. Everything will run, but the STAN_NUM_THREADS would be ignored.

  • stan/math/rev/core/init_chainablestack_tbb.hpp adds a so-called observer to the TBB system. The observer is notified whenever a thread enters the threadpool. Upon notification the AD tape for the given thread is initialized such that we do not need to deal with AD tape initialization ourselves. As long as a TBB thread is used everything is fine. I left for debugging couts in there. This is tested in test/unit/math/rev/mat/functor/gradient_test.cpp. The gradient test would fail if the AD tape would not have been setup.

So please have a look, @rok_cesnovar, at this. If you approve this design I will finish it up and I take any hand who wants to help me on that (there are some todos in the code).

Once this is done, then refactoring the map_rect will be very easy:

  • we will just switch to use the tbb::parallel_for… and that’s it for stan-math
  • as a second thing we need to make within cmdstan a call to the stan::math::init_threadpool_tbb() function which will initialize the TBB threadpool with the right number of threads => all interfaces have to do this in the future to get STAN_NUM_THREADS as maximal concurrency; otherwise the TBB default will be active (all cores available).

It’s too late for me to do an issue + the PR for tonight… but all in all we are on track for 2.21 unless the design is not good to your eyes, so please let me know what you think ASAP. Thanks.

Sebastian

I’ll take a look at this tmrw. May be good to open up a draft or WIP PR for discussion

I will take a look tomorrow before noon. Too little focus to do it thoroughly right now.

Judging just from your post it seems its what we agreed at the Math meeting (discussion around a helper function).

EDIT:

Quick comment not on the general design: I would prefer not using getenv. In this case I would prefer ifdefing and just assigning a default value when the define is not present. That way you also dont need to do the lexical parse as this becomes a compile time error.

I am not sure if I understand your comment. How do you suggest should STAN_NUM_THREADS be honored if you do not want to use getenv? Should the interfaces do that?

Yeah, never mind, its a environment variable not a define like STAN_THREADS. My brain just jumps to the conclusion that its a compile time define whenever I see STAN_*. I will call it a day. Start with fresh eyes tomorrow.

Ah…ok…will open a or tomorrow…maybe some fresh us eyes look while we sleep…

WIP PR is open, see below.

@rok_cesnovar I did some simplifications compared to yesterday.

Things to consider design wise from my view are listed in the Todo section of the PR. More discussion there, I suppose.

Thanks for letting me know—I hadn’t seen that. I wasn’t trying to open an already closed decision.

1 Like

Here are the notes from the 9/19/2019 meeting.

There are a few good fixes in this release. But no, there wasn’t a real big value in a final pre-GPLv2 release, which was why we were ok either going forward with this or not. @wds15, I don’t think the decision was properly reflected: we are ok including TBB with the release if we can make use of it by the code freeze date. If we can not, then we’ll have it with part of the next release. (We don’t need a final pre-GPLv2 release.)

Ok. I think we are basically there. I will then take away those additional if-def guards which right now make the TBB only mandatory with STAN_THREADS. There are a few if-defs which can go away then.

It looks like we have it close to ready - with 9 more days we are good by the time relatively sure.

Hi @syclik!

Just to make sure this doesn’t pass by without notice: As the TBB is now fully integrated into Stan-math, this turns Stan-math into a non header-only library! The TBB must be linked against in any circumstances (with/without threading). Doing so allows for the cleanest design.

The doc for this updated in the places where I came across.

It may appear unnecessary at the moment, but we can only by always linking in the TBB we can get around pouring quite some ifdefs into the code base which we decided against. Since we will almost for sure end-up with threaded warmup in the very near future, I would anyway expect that we will be in a threading-is-the-default world very soon. Once we are there, the TBB facilities are in heavy demand for any Stan program.

Sebastian

Sundials beat the TBB to the punch in making Stan math require linking.

And like Sundials, it looks like I didn’t need to do anything on my end on OS X—just pulling the latest seems to work.

Well…the difference to sundials is that if you do not use sundials functionality in your Stan program, then things still work even without linking to sundials. This is why pystan can get by without linking the sundials libraries. With the tbb this is different. For any Stan program you must link the tbb or it won’t work. That’s new.

And I spend a lot of time on making everything smooth.

Why would it require to run a job that doesn’t utilize threads? Is that necessary?

Yes…you need the tbb for all programs. It makes life for us developers a lot easier.

Edit: and as we will likely have threaded warmup in the very near future, I very much hope so, we will anyway live in a thread only world once we are there…

1 Like

@wds15, thanks for tagging me. I think this means we bump the major version. I think @seantalts is supposed to make the call, but after we make a suggestion.

What do you think? This seems like it changes what we used to be as the Math library. We’re no longer a header-only with optional link library for certain things. And… we have a new library dependency. If you’re using Math and are GPLv2 and distribute the code, then you will be in violation after this new release.

This is the API we’re using for semantic versioning: https://github.com/stan-dev/math/wiki/Supported-external-API

It doesn’t seem to say much about header only vs. requiring libraries, but if you want to bump Math to 3.0 or rename it to something else, that’s fine with me.