For instructions on how to run it and verify for yourselves, see the comment on this issue:
https://github.com/stan-dev/cmdstan/issues/712
I can’t measure a loss in sampling speed on a model that takes 1s to run 2000 iterations.
It’s 100% backward compatible with existing interfaces, so it won’t break anyone’s code in any repos. RStan and PyStan will need to be updated the same way as CmdStan.
The underlying pattern is a bit complicated, as it involves a layering of dynamic and static inheritance adapters to maintain backward compatibility and minimize code. But I didn’t have to touch any of the services or algorithms code, and it’s a minimal one-line change to the CmdStan C++ code. (Figuring out how to do it, on the other hand, was much harder.)
The C++ magic’s all documented method-by-method in the stan-dev/stan branch. Here’s the synopsis:
Translation Unit 1

stan::model::model_base (abstract base class)
- untemplated virtual functions for log_prob and write_array (required to call from a reference to model_base)
- templated log_prob() and write_array() signatures matching our originally generated functions (required to allow old code to simply call model_base)

stan::model::model_crtp (abstract base class extending model_base)
- untemplated virtual functions implemented in terms of templated functions using CRTP (this means no extra work for code gen and no runtime branching on jacobian/propto)

stan::model::my_model: user-generated model extends model_crtp<my_model>
- this is generated model code (same templated log_prob and write_array as before), but with a different superclass
- added function to construct and return model_base (this gets used by CmdStan rather than the typedef)
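As a rough illustration of how the three layers of translation unit 1 fit together, here is a minimal, self-contained sketch. It is not the actual Stan code: parameters are a plain std::vector<double> instead of Eigen vectors, there is no autodiff, write_array is left out (it would follow the same pattern), and the virtual names log_prob_propto, log_prob_jacobian and log_prob_propto_jacobian, the toy density, and the factory new_model are assumptions made for illustration. In this toy, propto simply means "drop constant terms".

```cpp
#include <cassert>
#include <cmath>
#include <memory>
#include <vector>

// Hypothetical, simplified sketch of the layering; not Stan's real signatures.
namespace sketch {

constexpr double kPi = 3.14159265358979323846;

// Dynamic layer: abstract base that can be used through a reference.
class model_base {
 public:
  virtual ~model_base() = default;

  // One untemplated virtual per (propto, jacobian) combination (assumed names).
  virtual double log_prob(const std::vector<double>& theta) const = 0;
  virtual double log_prob_jacobian(const std::vector<double>& theta) const = 0;
  virtual double log_prob_propto(const std::vector<double>& theta) const = 0;
  virtual double log_prob_propto_jacobian(
      const std::vector<double>& theta) const = 0;

  // Templated signature matching the old generated code, so existing callers
  // can keep writing model.template log_prob<propto, jacobian>(theta).
  // The flags are compile-time constants, so this selection folds away.
  template <bool propto, bool jacobian>
  double log_prob(const std::vector<double>& theta) const {
    if (propto && jacobian) return log_prob_propto_jacobian(theta);
    if (propto) return log_prob_propto(theta);
    if (jacobian) return log_prob_jacobian(theta);
    return log_prob(theta);  // resolves to the untemplated virtual
  }
};

// Static layer: CRTP adapter that implements each virtual by forwarding to
// the derived class's templated log_prob, so generated code is unchanged.
template <class M>
class model_crtp : public model_base {
 public:
  double log_prob(const std::vector<double>& theta) const override {
    return self().template log_prob<false, false>(theta);
  }
  double log_prob_jacobian(const std::vector<double>& theta) const override {
    return self().template log_prob<false, true>(theta);
  }
  double log_prob_propto(const std::vector<double>& theta) const override {
    return self().template log_prob<true, false>(theta);
  }
  double log_prob_propto_jacobian(
      const std::vector<double>& theta) const override {
    return self().template log_prob<true, true>(theta);
  }

 private:
  const M& self() const { return static_cast<const M&>(*this); }
};

// Stand-in for generated code: same templated log_prob, new superclass.
// Toy density (hypothetical): theta[0] = y ~ normal(0, 1);
// theta[1] = log(sigma) with sigma ~ exponential(1).
class my_model : public model_crtp<my_model> {
 public:
  template <bool propto, bool jacobian>
  double log_prob(const std::vector<double>& theta) const {
    const double y = theta[0];
    const double sigma = std::exp(theta[1]);
    double lp = -0.5 * y * y - sigma;
    if (!propto) lp += -0.5 * std::log(2.0 * kPi);  // normal(0, 1) constant
    if (jacobian) lp += theta[1];  // log |d sigma / d theta[1]|
    return lp;
  }
};

// Hypothetical factory: constructs the model and hands it back as a base.
std::unique_ptr<model_base> new_model() {
  return std::unique_ptr<model_base>(new my_model());
}

}  // namespace sketch
```

Because each flag combination has its own virtual, a call through a model_base reference costs one virtual dispatch and no run-time test of propto or jacobian, while code that calls the templated log_prob<propto, jacobian> directly keeps compiling.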
Translation Unit 2

cmdstan/command.hpp
- remove template from command function
- add signature declaration for function to allocate and construct model_base (this gets defined in the model translation unit; see above)
- construct model with this function instead of template type
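The CmdStan side of translation unit 2 can be sketched the same way, again under simplifying assumptions: command takes an output stream rather than command-line arguments, and model_base, new_model and my_model here are minimal hypothetical stand-ins, not CmdStan's actual API. The point is that command no longer needs a template parameter; it only needs the factory's declaration, and the definition arrives from the model's translation unit at link time. Both halves sit in one file here only so the example compiles on its own.

```cpp
#include <cassert>
#include <iostream>
#include <memory>
#include <vector>

// Hypothetical sketch of the command side; not CmdStan's real signatures.
namespace sketch {

// Minimal stand-in for stan::model::model_base.
class model_base {
 public:
  virtual ~model_base() = default;
  virtual double log_prob(const std::vector<double>& theta) const = 0;
};

// Declaration only: command code never sees the concrete model type.
std::unique_ptr<model_base> new_model();

// Untemplated: the model is built by the factory and used through the base.
int command(std::ostream& out) {
  std::unique_ptr<model_base> model = new_model();
  const std::vector<double> theta{2.0};
  out << "log_prob(theta) = " << model->log_prob(theta) << '\n';
  return 0;
}

}  // namespace sketch

// ---- Stand-in for the model translation unit (normally compiled
// separately and linked in). ----
namespace sketch {
namespace {
class my_model : public model_base {
 public:
  double log_prob(const std::vector<double>& theta) const override {
    return -0.5 * theta[0] * theta[0];  // toy standard-normal kernel
  }
};
}  // namespace

std::unique_ptr<model_base> new_model() {
  return std::unique_ptr<model_base>(new my_model());
}
}  // namespace sketch
```

Calling sketch::command(std::cout) prints `log_prob(theta) = -2` and returns 0. Swapping in a different model only means linking a different definition of new_model; the command code itself is compiled once.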
The extra good news is that adding higher-order log_prob increases compilation time only about 5%.
I still need to add more doc and tests before creating PRs. Any suggestions for testing would be appreciated.
Mitzi’s going to tweak the makefiles.
I’ve completely documented the Stan bits of this and updated the C++ code generator to generate the right code.