Why is our math library header-only?

Was just brainstorming today with Bob about ways to make development on Math easier, and it occurred to me that even within the realm of C++ development, things are a little difficult. For example, when you’re adding one function across 12 files and trying to compile all of those tests, each test often pulls in its respective “entire world” header (like scal.hpp, mat.hpp, etc.). Precompiled headers help a bit, but a more conventional separation of headers and implementations would probably help a lot more, in addition to helping with the compile times of models (right?).

Posting this here because I’m sure I’m missing something.

Our math library is (mostly) header-only because we can’t pre-instantiate enough of what the Stan language needs for it to be worth it. I tried to instantiate all of the normal distribution (with double and var, plus the vectorization) and it ended up creating a .o file on the order of gigabytes. And that didn’t include any of the other distributions.

The last time I thought about it, I thought it would be possible to instantiate the traits, but I never tried it.

I’m not quite sure what you’re asking. Perhaps you’re thinking along the lines of what Bob did for stanc?

Ah right, too many template possibilities. Blah.

like this http://www.math.uaa.alaska.edu/~afkjm/csce211/handouts/SeparateCompilation.pdf

What did Bob do for stanc?

yeah… that won’t work for how the Math library is used from within the language. I mean, we could split the declaration and the definition. I think the only reason we didn’t do that is to make it easier on developers.

If you look inside Stan’s src/stan/lang/grammars, you’ll see sets of three files that go together: each grammar is split into a declaration, a definition, and a template instantiation. We could do the first two for the Math library, but the third is where we ran into trouble before. Maybe there’s a better way to do it.

Perhaps we can instantiate all the traits. That might cut out some time, but I’m not sure.

Well, for now I’m trying to wire precompiled headers through, and the unit tests go from 28 minutes on the Mac Pro to 16 minutes with them! :P

What are the traits you’re talking about instantiating?

Awesome! That’s a big win and something we couldn’t do until we did the big includes in each of the tests. We used to include just what we used and that didn’t allow us to take advantage of the precompiled headers (because every test included something different).

Everything in the meta directory (e.g. stan/math/prim/scal/meta/*). I don’t think we have so many template metaprograms that we couldn’t instantiate the full set of the ones we’d need for reverse-mode autodiff with Eigen. (Maybe we do.)

Now we have a shape division

scal, arr, mat

and autodiff division

prim, rev, fwd, mix

I’m OK collapsing scal / arr / mat, which cuts down the number of directories from 12 to 4.

The autodiff division could probably be reduced to rev and mix, as those are the two configs we’ll be using in practice (other than for standalone user-defined functions in RStan).

We define code in fwd rather than mix since we just plug in rev as the template parameter in fvar to get the mixed mode. So mix is more about instantiation and testing. I’d suggest maybe renaming them if we break into just two subdirectories.


Now we’re talking! I like this shape collapse a lot as a first step. I volunteer to take a crack at it in the coming weeks if no one else wants to.

+1!

But before getting too deep into this, mind if we reach out to users first to see if anyone still needs the prim vs. rev distinction (it’s much cheaper to compile against prim), and whether we have issues compiling the matrix code on Windows / PyStan?

Once we merge everything, it’ll be hard to get back to separate headers, so I’d prefer to make sure we know what effect this has on our users before we do something we can’t easily recover from.

Sure, go ahead - I wasn’t talking about collapsing prim and rev yet - just the shape ones above.

Cool. I’ll start by asking that on the users side.

Are we not going to do this if a user responds they’re using the primitive functionality? It’s not like that’d go away, it’d just be rolled into a larger header file.


We could do this for everything in the math library other than the probability functions. In particular, I think instantiating all the matrix code would be feasible.

We’re talking a pretty substantial effort here. I had to do it for the parser because Windows toolchain couldn’t handle the header-only form of Spirit Qi for grammars our size.

If the difference in time to compile is minimal, we can just do it. If there’s a significant difference and we have a lot of users that rely on it, then I’d just like to have a conversation before we make a decision either way. I’ll time it tonight.

We had originally split apart the dependencies this way by request. I’m not sure if the original requestors still have this need or if there’s a different set of users that are using it. Just wanted to get a feel for impact before making changes that are hard to back out.

The Google style guide is very much a dynamic document; my asking about what we will do when it changes wasn’t just to prolong the discussion about “style” but to lay down policy for the inevitable so the discussion doesn’t recur.

Here’s an amusing rant with some examples of when things changed (though not the specifics of the changes):

https://www.linkedin.com/pulse/20140503193653-3046051-why-google-style-guide-for-c-is-a-deal-breaker

It also sums up why I think the Google style guide is broken on the semantics front (assuming they still ban exceptions [code like it’s C99] and references [second verse, same as the first]).

We can’t comply with much of the semantic guidance in their style guide because of the nature of reverse-mode autodiff (at least until someone figures out how to do it without static globals or implicit constructors).

Hah. You were right: they are changing it all the time and not even versioning it anymore. For the record, I don’t care at all what style guide we use as long as we can automate a bunch of it and it’s some kind of standard. I did like the Google one better than llvm and mozilla.

Back to the includes thing: I don’t think we really owe our end users compile-time features (or whatever you call them) that basically take time away from Stan developers to serve some specialized use cases. I think that view is short-sighted in that, ultimately, they’d rather have more people working on Stan even if their compiles took a little longer.

=). I wish I had your conviction. Would have saved us a lot of trouble.

Sure, we don’t owe users anything. We’ve just been on the conservative side of changing behavior. Perhaps too conservative. Perhaps not.

I just timed building a matrix test (inv_wishart_log_test) within prim with the prim header and the rev header, running each 5 times: prim took 8.9s (sd 0.23s) vs. rev at 10.6s (sd 0.46s). There’s clearly some difference, but I’m not sure it matters enough. So I’m happy if we collapse that into rev.

I agree, but it’s painful when it goes up.

For me, this is more of an issue of which headers to include than a deeper backward compatibility issue. We haven’t been very good about backward compatibility for the math lib, to say the least. We’ve been much better about the Stan language itself.

I think those timings are going to vary significantly based on disk speed since you’re reading in much more code. It’s all just reading and dismissing what you don’t use, but our headers include a gazillion other headers, and that’s all file handles opening and closing and reads happening.