Separating model and services translation units for faster compilation

Bob_Carpenter · June 21, 2019, 9:16pm

The goal here is to be able to compile a Stan program in its own translation unit. My preliminary experiments show that takes about 7s on my machine, whereas CmdStan compiling the same model takes about 35s (this is after the precompiled headers are made). In order to achieve this separation, we’re going to need some gymnastics on the inheritance side. There’s a pure virtual base class with a virtual method, then an extension using the curiously recurring template pattern (CRTP) that provides static inheritance from the generated code. The transpiler then generates a model class that derives from the CRTP helper. CmdStan gets compiled in its own translation unit, knowing only that there’s a factory somewhere to give it a reference to a base model; the implementation gets linked after CmdStan is compiled.

Linking is fast, so the overall compile time for simple models is going to be something like 7s.

Running example

Rather than showing how to do this with the actual code, I’m going to provide a complete runnable example in three files that’ll show what will go where. The method speak() is going to stand in as a proxy for the more complex log_prob() method in the actual model class. The includes would be part of Stan, the first translation unit would be code generated by the transpiler, and the second translation unit would be part of CmdStan.

The whole thing reads like a C++ type puzzle. For background reading, I’d recommend:

Wikipedia article on the CRTP.
Daniel Angel Muñoz Trejo on compilation units

`tu_include.hpp`

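#include <string>

// Abstract interface: the only type the services side needs to know about.
struct model_base {
  virtual ~model_base() { }
  virtual std::string speak() const = 0;
};

// CRTP layer: implements the virtual method by forwarding statically
// to the templated method on T (T is const-qualified, see tu1.cpp).
template <class T>
struct model_crtp_base : public model_base {
  std::string speak() const {
    return static_cast<T*>(this)->template say_something<false>();
  }
};

// Factory declared here, defined in the model's translation unit.
const model_base& new_model(int n, double y);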

`tu1.cpp`

#include "tu_include.hpp"

struct foo : public model_crtp_base<const foo> {
  int n_;
  double y_;
  foo(int n, double y) : n_(n), y_(y) { }

  template <bool B>
  std::string say_something() const {
    return B ? "hello" : "goodbye";
  }
};

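// Factory definition; the other translation unit sees only the declaration.
// The object is heap-allocated and never deleted, which is fine for a demo.
const model_base& new_model(int n, double y) {
  const foo* f = new foo(n, y);
  return *f;
}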

`tu2.cpp`

#include "tu_include.hpp"
#include <iostream>

int main() {
  int n = 3;
  double y = 7;
  const model_base& model = new_model(n, y);
  std::cout << "model says " << model.speak() << std::endl;
}

Et voilà

~/temp2$ clang++ -c tu1.cpp
~/temp2$ clang++ -c tu2.cpp
~/temp2$ clang++ -o speaker tu1.o tu2.o
~/temp2$ ./speaker
model says goodbye

seantalts · June 21, 2019, 9:29pm

Awesome! Just to be paranoid, can you add to the example some private fields on the generated model class?

Bob_Carpenter · June 21, 2019, 9:33pm

Sure. I should’ve done that to begin with, as it’s what killed things last time. I’ll just update the example above to one that works with member fields.

betanalpha · June 22, 2019, 2:33pm

Are there experiments demonstrating what, if any, consequences there are to run time? Does the CRTP avoid any virtual function overhead or have compilers gotten to the point where the overhead is negligible?

seantalts · June 22, 2019, 5:16pm

I think Bob intends to end up measuring it, but my suspicion is that for any kind of non-trivial model it will be totally negligible: the overhead is incurred on the order of once per leapfrog step, and anything complicated has a log_prob that executes pretty slowly in comparison. And it would have to add a ton of time to small models to make up for the roughly 28 seconds of compilation time (35s versus 7s) this will save someone.

betanalpha · June 22, 2019, 7:39pm

Before Stan I had my own HMC C++ code, and moving from virtual functions to a template solution sped things up by over an order of magnitude, even for expensive gradients (and they were all hand written, too, which would avoid any autodiff complications), so I’m always wary of how sensitive things can be to virtuals deep in the code. I’d love to see experiments as early in the process as possible.

Bob_Carpenter · June 24, 2019, 11:58pm

The CRTP resolves inheritance without virtual calls. But the pattern I’m suggesting still has a virtual layer below that, so there’ll still be a virtual function call for the log density.

Of course. We can’t introduce a noticeable slowdown with this.
