Hm, I think we’re confusing a lot of different kinds of possible tests here. I’ll put an item on the agenda to talk about this at Thursday’s meeting.
The manual chapter breaks things down into two kinds of efficiency:
1. Log density and gradient efficiency: how long it takes to evaluate the log density and its gradients. This really tests a combination of our generated code and the math library.
2. Sampling efficiency: how quickly we converge and mix. Measured naively, this tests a combination of algorithm efficiency and code efficiency.
These concerns can be isolated. We can test the efficiency of our generated code by holding the math lib version fixed, test the efficiency of the math lib by holding the generated code fixed, and test sampling efficiency by holding code efficiency constant.
We can test end-to-end, too, but I don’t see much point in that if we’ve tested all the pieces.
I think that @seantalts is considering more holistic performance issues, for example whether, when everything is put together, the executable is too big for reasonably sized caches and there’s a performance hit, or other memory issues.
While I don’t disagree that such optimizations may be relevant eventually, I would personally be surprised if this were a dominant factor now, or, if it were significant, that we could find optimal tunings across all of our different hardware/toolchain targets.