My workstation wasn't busy today, so I ran some performance tests on CmdStan releases from 2.18 onward.
The first results I have are for compilation times, and we seem to have been regressing a bit since CRTP was added in 2.20 (see chart below). Is this just added complexity, or something worth exploring?
Yes, they should be run after the feature freeze and compared against the previous release to identify any issues; if compile times have improved, that should be mentioned in the release notes. We should do this already for this release, as there should be a noticeable speedup (20% shorter compile times for clang and 50% shorter for g++ the last time I checked).
Ideally, I envisioned running a larger suite of performance tests (compiling and running posteriordb models, for example) once a week (probably over the weekend, when the resources are mostly free) and posting visual reports to a predefined issue.
There are some more issues/PRs in the pipeline before the above will be realized, but we are getting there.
I ask because I was getting a feel for how cmdstanpy works and was curious how results for the eight-schools example varied as one varies the sampling parameters. It's not hard to store the compile time for each attempt.
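For concreteness, the kind of loop I had in mind looks roughly like this; the `eight_schools.stan` and `eight_schools.data.json` paths are just stand-ins for wherever the example lives on disk:

```python
# Vary one sampler setting (adapt_delta here) and refit the same model.
from cmdstanpy import CmdStanModel

model = CmdStanModel(stan_file="eight_schools.stan")  # compiled once, reused below
for adapt_delta in (0.8, 0.9, 0.99):
    fit = model.sample(data="eight_schools.data.json",
                       chains=4, adapt_delta=adapt_delta, seed=1234)
    print(adapt_delta, fit.summary().loc["mu", "Mean"])  # posterior mean of mu
```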
I’ve got ~50 pooled workstations at my disposal, so…
For each compiled model, the information one can record (presuming just a single core is used for compilation):

- Total compile time
- CPU boost/turbo frequencies
- CPU chip-set name
- Memory usage
- OS version
- C++ compiler version

Presumably the compile time and memory usage will have very narrow variance, but it would be good to check, particularly on different platforms; a rough recording sketch follows.
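A minimal sketch of the recording, assuming cmdstanpy and a working CmdStan install (the model path is a stand-in, and the compiler version and boost frequencies would need extra probes beyond what's shown):

```python
# Record compile time plus basic machine metadata for one model.
import json
import platform
import time

from cmdstanpy import CmdStanModel

record = {
    "os": platform.platform(),      # OS name and version
    "cpu": platform.processor(),    # chip name (best effort; may be empty on Linux)
    "machine": platform.machine(),  # e.g. x86_64
}

t0 = time.perf_counter()
model = CmdStanModel(stan_file="eight_schools.stan")  # compiles on construction
record["compile_time_s"] = round(time.perf_counter() - t0, 2)

print(json.dumps(record, indent=2))
```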
There are two steps to compilation: first the transpilation from Stan to C++, which is tied to the Stan version, and then the actual compilation of the C++.
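If one wants to time the two steps separately, something like this should work (the CmdStan location and the stanc `--o` flag spelling are assumptions and may differ across versions):

```python
# Time stanc (Stan -> C++) and make (C++ -> executable) separately.
import pathlib
import subprocess
import time

cmdstan = pathlib.Path.home() / "cmdstan"  # hypothetical CmdStan checkout
stan_file = pathlib.Path("eight_schools.stan").resolve()

t0 = time.perf_counter()
subprocess.run([str(cmdstan / "bin" / "stanc"),
                f"--o={stan_file.with_suffix('.hpp')}", str(stan_file)], check=True)
t1 = time.perf_counter()

# CmdStan's makefile target is the model path without the .stan extension;
# it reuses the .hpp generated above if it's up to date.
subprocess.run(["make", str(stan_file.with_suffix(""))], cwd=cmdstan, check=True)
t2 = time.perf_counter()

print(f"stanc: {t1 - t0:.1f}s  C++ compile: {t2 - t1:.1f}s")
```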
Nope. Windows C++ compilers consume way more memory than Linux or Mac OS. We’ve had to take extreme measures on the compilation to deal with that in the past.
Actual memory usage is notoriously difficult to measure in an application, but you’ll know when you run out.
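For what it's worth, one coarse number that is obtainable on Unix is the peak resident set size of child processes via getrusage; note even the units differ by platform (kilobytes on Linux, bytes on macOS):

```python
# Peak RSS across waited-for child processes, e.g. after running a compile.
import resource
import subprocess

subprocess.run(["g++", "--version"], check=True)  # stand-in for a real compile
print("peak child RSS:", resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss)
```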
The C++ compiler flags and library versions are just as important as the compiler version.
Timing's usually broken down into CPU (user + system) time vs. wall time.
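In Python terms, the distinction looks like this: process_time() accumulates user + system CPU time for the current process, while perf_counter() tracks the wall clock.

```python
import time

wall0, cpu0 = time.perf_counter(), time.process_time()
sum(i * i for i in range(10_000_000))  # some CPU-bound busywork
print(f"wall: {time.perf_counter() - wall0:.2f}s  "
      f"cpu: {time.process_time() - cpu0:.2f}s")
```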
Apologies for not being clear. My thought was that there would be small variance in compile time on the same system (OS, CPU, etc.), not that it wouldn't differ across platforms.
And thank you for reminding me of the complexities of testing memory and timing.
For the same OS, CPU, compiler, and compiler flags, I would expect them to be around the same. But swapping compiler flags in and out can also change compilation time / performance speedups by quite a lot.
Usually the dimensions are constants and don't affect compilation time. So whether you have 1 or 1000 dimensions, the variable's likely to be declared as vector[N] alpha and translate to a simple Eigen declaration.
There's no universal answer here. It's going to depend on how the Stan program's coded, the kinds of operations it has, and the kinds of optimizations the compiler can do; in particular, on how much template metaprogramming is involved in the translation of the program to C++, how much code needs to be optimized, and whether it can exploit low-level CPU features through the underlying implementations. And it's going to change with MPI (different compilers) and GPU, both of which have different performance profiles.
P.S. We like “Stan program” to separate the model as a mathematical object from a particular implementation in Stan.
Thanks, Bob, for the reply. I'm thinking about these things as I try to sketch out a proposal for which configurations should go into a Stan Docker container. I understand there's no optimal solution given the breadth of research the Stan community does, but there ought to be at least some recommended (robust?) settings that most users would be happy with. While individual developers have their preferences, my naive thought was that, either through crowdsourcing or experimentation, we could find a broadly acceptable config set.
Naturally, those folks for whom the standard is #unacceptable would be free to make changes.