Compilation time evolution in cmdstan

rok_cesnovar · April 25, 2020, 8:47pm

My workstation was not busy today so I made it perform some performance tests for Cmdstan releases from 2.18 on.

The first results I have are for compilation times and we seem to be regressing a bit since CRTP was added in 2.20 (see chart below). Is this just added complexity or something worth exploring?

Here are mean times for the 5 models (sorry for the bad chart - I am bad at plotting).

model\version	2.18.1	2.19.1	2.20.0	2.22.1	2.23.0
arK	25.33497	27.97416	11.0281	15.34745	14.7619
sir	27.0837	29.63843	14.06118	16.58849	17.73287
gp_pois_regr	27.47137	30.45327	15.13727	17.63468	18.74881
irt_2pl	25.71007	28.88344	11.66927	14.67787	16.07967
arma	25.51895	28.08129	11.08476	14.0597	15.342
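To put a number on the change between releases, the table can be turned into per-model percentage changes. This is only a rough sketch: the values are copied from the table above, and the helper name `pct_change` is made up for illustration.

```python
# Mean compile times copied from the table above (units as reported in the post).
versions = ["2.18.1", "2.19.1", "2.20.0", "2.22.1", "2.23.0"]
raw = {
    "arK":          [25.33497, 27.97416, 11.0281, 15.34745, 14.7619],
    "sir":          [27.0837, 29.63843, 14.06118, 16.58849, 17.73287],
    "gp_pois_regr": [27.47137, 30.45327, 15.13727, 17.63468, 18.74881],
    "irt_2pl":      [25.71007, 28.88344, 11.66927, 14.67787, 16.07967],
    "arma":         [25.51895, 28.08129, 11.08476, 14.0597, 15.342],
}
# Re-key each row by version string for easier lookup.
times = {model: dict(zip(versions, row)) for model, row in raw.items()}


def pct_change(model, old, new):
    """Percentage change in mean compile time from version `old` to version `new`."""
    before, after = times[model][old], times[model][new]
    return 100.0 * (after - before) / before


for model in times:
    change = pct_change(model, "2.20.0", "2.23.0")
    print(f"{model:>13}: {change:+.1f}% from 2.20.0 to 2.23.0")
```

A positive value means compilation got slower relative to the older release.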

bbbales2 · April 26, 2020, 1:41am

Well if we’d gone in the other direction we would have counted it as a big breakthrough.

So I think this is worth investigating.

rok_cesnovar · April 28, 2020, 2:53pm

I got to the bottom of this (or at least a local minimum). If anyone is interested: https://github.com/stan-dev/cmdstan/issues/863

Bottom line: 2.24 should have 20-25% faster compile times for clang and 10-15% faster for g++.

rok_cesnovar · May 11, 2020, 3:13pm

One more potential improvement (a 50% reduction in compile time) for g++: https://github.com/stan-dev/cmdstan/issues/872

mtwest · July 2, 2020, 11:52am

Would there be any interest in running on the order of 1e2 or 1e3 compile tests to get some statistics for each release?

rok_cesnovar · July 2, 2020, 12:05pm

Hi @mtwest,

Yes, they should be run after the feature freeze and compared to the previous release to identify any issues, and if compile times have improved, that should be mentioned in the release notes. We should already do this for this release as there should be a noticeable speedup (20% shorter compile times for clang and 50% shorter compile times for g++ the last time I checked).

Ideally, I envisioned running a larger suite of performance tests (compiling and running posteriordb models, for example) once a week (probably over the weekend when the resources are mostly free) and posting visual reports to a predefined issue.

There are some more issues/PRs in the pipeline before the above will be realized, but we are getting there.

mtwest · July 2, 2020, 12:18pm

I ask because I was getting a feel for how cmdstanpy works and was curious how results for the eight-schools example varied when one varies sampling parameters. It’s not hard to store the compile time for each attempt.

I’ve got ~50 pooled workstations at my disposal, so…
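A minimal sketch of what storing the compile time for each attempt might look like. The helpers `time_calls`, `summarize`, and `compile_stan_program` are hypothetical names, and the file name `eight_schools.stan` is an assumption. The cmdstanpy calls used here (`CmdStanModel(..., compile=False)` followed by `compile(force=True)`) exist in the versions I am aware of, but the interface has changed between releases, so check the version you have installed.

```python
import statistics
import time


def time_calls(fn, repeats=5):
    """Call fn() `repeats` times and return each call's wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Return (mean, sample standard deviation); the deviation is 0.0 for a single sample."""
    mean = statistics.mean(durations)
    sd = statistics.stdev(durations) if len(durations) > 1 else 0.0
    return mean, sd


def compile_stan_program(stan_file="eight_schools.stan"):
    """Force a fresh compile of a Stan program (requires cmdstanpy and a CmdStan install)."""
    # Imported lazily so the timing helpers above work without cmdstanpy.
    from cmdstanpy import CmdStanModel

    model = CmdStanModel(stan_file=stan_file, compile=False)
    model.compile(force=True)  # force=True recompiles even if an executable already exists


# Example use, assuming eight_schools.stan is in the working directory:
#   durations = time_calls(compile_stan_program, repeats=10)
#   print(summarize(durations))
```

Forcing the recompile matters here: without it, every attempt after the first would reuse the existing executable and report a time near zero.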

rok_cesnovar · July 2, 2020, 12:23pm

Using cmdstanpy or cmdstanr to test this will definitely be the easiest. Especially compared to plain bash.

Good to know :) Will definitely make a note to tag you. Thanks!

mtwest · July 8, 2020, 12:27pm

For each compiled model, one can record the following information:

Total compile time
CPU boost/turbo frequencies
CPU chip-set name
Memory usage
OS version
C++ compiler version

This presumes one is using just a single core for compilation. Presumably the compile time and memory usage will have very narrow variance, but it would be good to check, particularly on different platforms.
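Some of the items in that list can be collected with the Python standard library alone. The sketch below is an assumption about how one might do it, with the hypothetical helpers `compiler_version` and `environment_record`. It leaves out CPU boost frequencies and memory usage, which the standard library does not expose portably.

```python
import platform
import shutil
import subprocess


def compiler_version(compiler="g++"):
    """Return the first line of `<compiler> --version`, or None if it is not on PATH."""
    path = shutil.which(compiler)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


def environment_record(compiler="g++"):
    """Collect the platform details that are cheap to record next to each timing."""
    return {
        "os": platform.platform(),          # OS name and version
        "machine": platform.machine(),      # architecture, e.g. x86_64
        "processor": platform.processor(),  # may be an empty string on some systems
        "compiler": compiler,
        "compiler_version": compiler_version(compiler),
    }


print(environment_record())
```

Storing this record alongside every compile time makes it possible to group results by platform later.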

Bob_Carpenter · July 8, 2020, 7:54pm

There are two steps to compilation. First, the transpilation from Stan to C++, which involves a Stan version, and then the actual compilation of the C++.

Nope. Windows C++ compilers consume way more memory than Linux or Mac OS. We’ve had to take extreme measures on the compilation to deal with that in the past.

Actual memory usage is notoriously difficult to measure in an application, but you’ll know when you run out.

The C++ compiler flags and library versions are just as important as the compiler version.

Timing’s usually broken into system vs. wall time.
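As a rough illustration of that split, the sketch below runs a command as a child process and reports wall-clock time next to the user and system CPU time of the child. The helper `time_command` is a made-up name. The child CPU figures come from `os.times()`, which reports zeros for children on Windows, so treat this as a Unix-oriented sketch.

```python
import os
import subprocess
import sys
import time


def time_command(cmd, cwd=None):
    """Run cmd and return wall, user-CPU, and system-CPU seconds for the child process."""
    before = os.times()
    wall_start = time.perf_counter()
    subprocess.run(cmd, cwd=cwd, check=True)  # returns only after the child has exited
    wall = time.perf_counter() - wall_start
    after = os.times()
    return {
        "wall": wall,
        "user": after.children_user - before.children_user,
        "system": after.children_system - before.children_system,
    }


# Demonstration with a trivial child process; for a real measurement, cmd would be
# the build command for a Stan program, run from the CmdStan directory via cwd.
print(time_command([sys.executable, "-c", "sum(range(10**6))"]))
```

If wall time is much larger than user plus system time, the process spent most of its time waiting (for example on disk) rather than computing.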

mtwest · July 9, 2020, 2:32am

Apologies for not being clear. My thought was that there would be small variance for compile time on the same system (OS, cpu, etc), not that it wouldn’t differ across platforms.

And thank you for reminding me of the complexities of testing memory and timing.

stevebronder · July 9, 2020, 6:02pm

For the same OS, CPU, compiler, and compiler flags I would expect them to be around the same. But swapping compiler flags in and out can also change compilation times and runtime performance by quite a lot.

mtwest · July 10, 2020, 6:48am

Would it be of value to explore that parameter space? How many dimensions would it be?

Bob_Carpenter · July 22, 2020, 8:55pm

Usually the dimensions are a constant and don’t affect compilation time. So whether you have 1 or 1000 dimensions, the variable’s likely to be declared as vector[N] alpha and translate to a simple Eigen declaration.

mtwest · July 22, 2020, 11:48pm

Exploring the parameter space of which compiler & compiler flags affect the compilation and performance for a given model.
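To get a feel for how large that space is, one could enumerate the combinations up front. The sketch below is purely illustrative: the compilers, optimization levels, and extra flag are assumptions chosen as examples, not a recommended set, and the `CXX` and `CXXFLAGS` keys simply follow the usual make variable names.

```python
from itertools import product

# Illustrative choices only; a real study would pick these deliberately.
compilers = ["g++", "clang++"]
opt_levels = ["-O0", "-O1", "-O2", "-O3"]
extra_flags = ["", "-march=native"]


def config_grid(compilers, opt_levels, extra_flags):
    """Return one dict per (compiler, optimization level, extra flag) combination."""
    return [
        {"CXX": cxx, "CXXFLAGS": " ".join(f for f in (opt, extra) if f)}
        for cxx, opt, extra in product(compilers, opt_levels, extra_flags)
    ]


grid = config_grid(compilers, opt_levels, extra_flags)
print(len(grid), "configurations")
for config in grid[:3]:
    print(config)
```

Every extra axis multiplies the number of configurations, and each configuration then has to be timed for both compilation and sampling on every model of interest, so the grid grows quickly.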

Bob_Carpenter · July 30, 2020, 8:52pm

There’s no universal answer here. It’s going to depend on how the Stan program’s coded, the kind of operations it has, and the kind of optimizations the compiler can do. In particular, it depends on how much template metaprogramming is involved in the translation of the program to C++, the amount of code that needs to be optimized, and whether that can exploit low-level CPU features through underlying implementations. And it’s going to change with MPI (different compilers) and GPU, both of which have different performance profiles.

P.S. We like “Stan program” to separate the model as a mathematical object from a particular implementation in Stan.

mtwest · July 31, 2020, 9:38am

Thanks Bob for the reply. I am thinking about these things as I try to sketch out a proposal for what configurations should go into a Stan Docker container. I understand there is no optimal solution given the breadth of research the Stan community does, but there ought to be at least some recommended (robust?) settings that most users would be happy with. While individual developers have their preferences, I guess my naive thought was that either through crowd sourcing or experimentation, we could find a broadly acceptable config set.

Naturally for those folks where the standard is #unacceptable, they would be free to make changes.

Bob_Carpenter · August 5, 2020, 2:08am

I’d just go with the default config of the C++ compiler used by the makefiles in CmdStan.

bearcub · July 20, 2021, 10:43pm

Hi @rok_cesnovar ,

Where could I go to find said suite of performance tests? I’ve poked around a bit in the release notes, but nothing is jumping out.
