Project Idea: Benchmarking Framework for Individual Functions in stan/math

In MCMC algorithms, many low-level functions are called repeatedly. How something is implemented can have a large effect on runtime, particularly in C++, where type choices matter (creating unnecessary copies, for example; others might have comments here). When a function is called thousands or millions of times, small per-call costs accumulate and the whole MCMC run takes longer. I recently saw a pull request that made a Gaussian process regression model take much longer to run. I don’t think this was just computational noise, since runtime was up 7x, and it was due to a typing difference. This matters because the main bottleneck of GP models is the Cholesky decomposition, and a change in types, although it may reduce technical debt and developer labor, can increase runtime when these functions are called thousands to hundreds of thousands of times in a single model run.
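
To make the "unnecessary copies" point concrete, here is a minimal sketch in plain C++ (no Stan or Eigen). The names `Tracked`, `copy_count`, `sum_by_value`, and `sum_by_ref` are made up for illustration and are not taken from stan/math or from the pull request mentioned above. The two functions differ only in their argument type, yet one copies the whole container on every call:

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical global counter: number of copy constructions of Tracked.
std::size_t copy_count = 0;

// Hypothetical wrapper around a vector that records every copy made of it.
struct Tracked {
  std::vector<double> data;
  explicit Tracked(std::size_t n) : data(n, 1.0) {}
  Tracked(const Tracked& other) : data(other.data) { ++copy_count; }
};

// Argument taken by value: passing an existing object copies the vector.
double sum_by_value(Tracked x) {
  return std::accumulate(x.data.begin(), x.data.end(), 0.0);
}

// Argument taken by const reference: no copy is made.
double sum_by_ref(const Tracked& x) {
  return std::accumulate(x.data.begin(), x.data.end(), 0.0);
}
```

Calling `sum_by_value(t)` on an existing `Tracked t` increments `copy_count` once per call, while `sum_by_ref(t)` leaves it unchanged. Both return the same sum, so a unit test on the result alone would not notice the difference.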

I also experienced this myself when implementing a simple parallelization of prim/fun/exp: both implementations passed the unit tests, but one was slower (whoops!).

One possible mitigation would be to add performance testing at the C++ or stan/math level for each change in typing. I think Steve Bronder has a repo, stan-perf, that might be able to do this (honestly, I haven’t set it up), or Steve might have already set this up.
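
As a rough sketch of what a per-function performance check could look like, here is a tiny timing harness using only the standard library. The helper names `mean_ns_per_call` and `slowdown` are hypothetical, and this is not how stan-perf works (I haven’t looked). It assumes the callable takes no arguments and returns something convertible to `double`. A real setup would more likely use a dedicated benchmarking library such as Google Benchmark, with warm-up and repeated measurements:

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical helper: mean wall-clock nanoseconds per call of f over reps calls.
template <typename F>
double mean_ns_per_call(F&& f, std::size_t reps) {
  volatile double sink = 0.0;  // volatile store so the calls are not optimized away
  const auto start = std::chrono::steady_clock::now();
  for (std::size_t i = 0; i < reps; ++i) {
    sink = f();
  }
  const auto stop = std::chrono::steady_clock::now();
  const std::chrono::duration<double, std::nano> elapsed = stop - start;
  return elapsed.count() / static_cast<double>(reps);
}

// Hypothetical helper: a ratio above 1 means the candidate is slower than the baseline.
template <typename F, typename G>
double slowdown(F&& baseline, G&& candidate, std::size_t reps) {
  return mean_ns_per_call(candidate, reps) / mean_ns_per_call(baseline, reps);
}
```

The idea would be to wrap the old and new versions of a function in lambdas, call `slowdown(old_version, new_version, reps)`, and flag the change if the ratio exceeds some agreed threshold. Timings like this are noisy, so the threshold and the number of repetitions would need some care.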

I’m thinking about the combinatorial (wrong word? idk) explosion that comes from functions being used repeatedly throughout the Stan math library, and how much this factors into the runtime of the final product.

Thoughts?

~ Regards