I think we could break down what you wrote about your goals into two discrete things with different action items:
- Further improve performance of the CPU GLMs
- Ascertain best practices for improving performance more generally (specifically benchmarking)
You mention both, but I think you are actually trying to focus on the second one, right? We can focus on finding microbenchmarks that in some cases reproduce end-to-end performance, but I think at the end of the day, when it comes time to make a decision, we're probably still going to want to see end-to-end numbers just in case.
Here is my end-to-end graph of the `normal_id_glm_lpdf` model on develop vs. the version with `check_finite` cut out as above, n=5000, k=5000 (there's not really a need for a graph here but whatever). Blue is obviously the one with the checks in.
PS This is what Chandler Carruth used for disabling optimizations (instead of just `volatile`); I wonder if it has different semantics.
```cpp
// Forces the compiler to assume *p is read in an opaque way,
// so the value must actually be materialized.
static void escape(void *p) {
  asm volatile("" : : "g"(p) : "memory");
}

// Forces the compiler to assume all memory may have been written,
// so cached values must be reloaded. (Carruth's version takes no
// argument; the missing semicolon is also fixed.)
static void clobber() {
  asm volatile("" : : : "memory");
}
```