Separate compilation of model and services code complete

Bob_Carpenter · July 6, 2019, 8:38pm

For instructions on how to run it and verify for yourselves, see the comment on this issue:

https://github.com/stan-dev/cmdstan/issues/712

I can’t measure a loss in sampling speed on a model that takes 1s to run 2000 iterations.

It’s 100% backward compatible with existing interfaces, so it won’t break anyone’s code in any repos. RStan and PyStan will need to be updated the same way as CmdStan.

The underlying pattern is a bit complicated, as it involves a layering of dynamic and static inheritance adapters to maintain backward compatibility and minimize code. But I didn’t have to touch any of the services or algorithms code and it’s a minimal one-line change to the CmdStan C++ code. (Figuring out how to do it, on the other hand, was much harder.)

The C++ magic’s all documented method-by-method in the stan-dev/stan branch. Here’s the synopsis:

Translation Unit 1

stan::model::model_base (abstract base class)
- untemplated virtual functions for log_prob and write_array (required to call from reference to model_base)
- templated log_prob() and write_array() signatures matching our originally generated functions (required to allow old code to simply call model_base)

stan::model::model_crtp (abstract base class extending model_base)
- untemplated virtual functions implemented in terms of templated functions using crtp (this means no extra work for code gen and no runtime branching on jacobian/propto)

stan::model::my_model: user-generated model extends model_crtp<my_model>
- this is generated model code (same templated log_prob and write_array as before)
- different superclass
- added function to construct and return model_base (this gets used by CmdStan rather than the typedef)

Translation Unit 2

cmdstan/command.hpp
- remove template from command function
- add signature declaration for function to allocate and construct model_base (this gets defined in model translation unit; see above)
- construct model with this function instead of template type

The extra good news is that adding higher-order log_prob increases compilation time only about 5%.

I still need to add more doc and tests before creating PRs. Any suggestions for testing would be appreciated.

Mitzi’s going to tweak the makefiles.

I’ve completely documented the Stan bits of this and updated the C++ code generator to generate the right code.

jpritikin · July 6, 2019, 8:59pm

Wow, this is epic! How small are models compiled with -g0 (no debug info)?

mitzimorris · July 8, 2019, 4:43am

tweaked the CmdStan makefiles - specifically cmdstan/makefile, cmdstan/make/program

ahartikainen · July 8, 2019, 11:08am

@ariddell do you think this will work out-of-the-box for PyStan?

ariddell · July 8, 2019, 2:04pm

Not out of the box. It’ll take some work. PyStan uses Stan C++ headers only now as this simplifies just about everything.

Unfortunately Python’s distutils / setuptools doesn’t support building and installing C libraries. There’s numpy’s install_clib but it’s poorly documented. I believe we used that at one point for something.

In short, it’s possible. Will take some distutils/setuptools magic though.

ahartikainen · July 8, 2019, 3:19pm

I think this is the same problem as we have with cvodes.

Bob_Carpenter · July 8, 2019, 6:35pm

Yes, same issue—building and linking multiple translation units.

The parser/code generator does this, but it’s all encapsulated in a single executable that doesn’t need to change on the fly.

betanalpha · July 8, 2019, 6:37pm

I benchmarked a more expensive model with the sampling iterations bumped up to get a more stable estimate of the overall performance.

On develop,

> time make build

real	3m33.639s
user	3m17.187s
sys	0m13.486s


> time make CC=clang++ -j4 O=3 linear_regression_finnish_horseshoe

real	0m26.621s
user	0m26.468s
sys	0m0.824s

> time ./linear_regression_finnish_horseshoe sample num_samples=25000 data file=linear_regression.data.R random seed=48383884

real	15m32.367s
user	15m23.334s
sys	0m7.174s

on the feature/0712-model-base-class branch,

> time make build

real	3m49.410s
user	3m34.290s
sys	0m12.688s

> time make CC=clang++ -j4 O=3 linear_regression_finnish_horseshoe

real	0m15.038s
user	0m14.992s
sys	0m0.770s

> time ./linear_regression_finnish_horseshoe sample num_samples=25000 data file=linear_regression.data.R random seed=48383884

real	15m11.413s
user	15m3.867s
sys	0m6.328s

Make build is maybe marginally slower but the model execution time exhibits no significant differences. That said, the speed up in compilation is less than 50%.

Bob_Carpenter · July 8, 2019, 6:40pm

The compiled models aren’t huge. bernoulli.o, the very simplest model, is under 500K compiled at O=3. But that isn’t with -g0. Size will vary by model based on how much of the math library needs to be linked in.

@mitzimorris will know more as she’s dealing with the makefiles for cmdstan.

betanalpha · July 8, 2019, 8:16pm

I get 1.7 MB at -O3 under both develop and the feature branch.

mitzimorris · July 8, 2019, 11:46pm

~~I hope the problem is with my incomplete understanding of the makefile rules. because I’m only seeing a 50% speedup in compile time as well.~~

good news - it was my incomplete understanding of our makefiles. for clang compilers, using the precompiled model headers (stan/model/model_headers.hpp.gch) brings the compile time for the bernoulli model down from 23 seconds to 7 seconds.
slightly larger model (bym2.stan) goes from 28 seconds to 9 seconds.

betanalpha · July 9, 2019, 12:39am

With the change, and after a git pull -ff on both cmdstan and stan (it would be great to have the submodules synced), the compile time goes down a little bit to

> time make CC=clang++ -j4 O=3 linear_regression_finnish_horseshoe

real	0m11.237s
user	0m10.531s
sys	0m0.502s

about 42% of the earlier compile time, relative to

> time make CC=clang++ -j4 O=3 linear_regression_finnish_horseshoe

real	0m26.621s
user	0m26.468s
sys	0m0.824s

on develop.

I didn’t test run time.

Bob_Carpenter · July 9, 2019, 9:36pm

On my machine, the current timing is:

1a) 7s: build model using makefile starting from .stan file assuming everything else is already built
1b) 6s: same as (1a) but calling commands directly rather than involving make dependencies—will need more help on make to remove this 10% overhead if possible.

2a) 26s: current system build using makefile
2b) 24s: same as (2a) but calling commands directly rather than invoking makefile

I don’t know where I got the 37s number from—maybe I wasn’t invoking the precompiled header properly.

Anyway, this one’s ready to review now.

bgoodri · July 10, 2019, 4:59am

I think I’ll be able to figure this out. We already have to link against the StanHeaders shared object to use SUNDIALS and we already have a base class in RStan, so it is probably just a matter of redoing some of the C++ patterns.

Bob_Carpenter · July 10, 2019, 6:16pm

That’s exactly what it looked like in CmdStan. @mitzimorris is the one to ask for make help.

bgoodri · July 10, 2019, 6:17pm

OK, we’ll talk about it tomorrow.
