Is there any option inside CmdStan to profile the execution of a model? Or any easy way of seeing which parts of AD are the bottlenecks?
If there is not, does anyone have any ideas if this is possible at all and how?
So far the best solution I have is brute force: adding time measurements and prints to all the functions that I presume could be bottlenecks. I also measure the time of one transition (here), and if the execution times of the functions add up to the execution time of the transition, or something close to it, I know I have found all of them.
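For anyone who wants to try the same brute-force approach, here is a minimal sketch of what I mean by timing individual functions. TimingRegistry and ScopedTimer are hypothetical helpers written for this post, not part of Stan or CmdStan, and the labels are whatever you choose to pass in.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <ostream>
#include <string>
#include <utility>

// Hypothetical helper (not a Stan/CmdStan API): accumulates wall-clock time
// and call counts per label.
class TimingRegistry {
 public:
  using clock = std::chrono::steady_clock;

  void add(const std::string& label, clock::duration elapsed) {
    totals_[label] += elapsed;  // a new entry starts at zero duration
    ++calls_[label];
  }

  // Accumulated seconds for one label, 0.0 if the label was never timed.
  double seconds(const std::string& label) const {
    auto it = totals_.find(label);
    if (it == totals_.end()) return 0.0;
    return std::chrono::duration<double>(it->second).count();
  }

  long calls(const std::string& label) const {
    auto it = calls_.find(label);
    return it == calls_.end() ? 0 : it->second;
  }

  // Sum over all labels; compare this against the time of one transition.
  // Nested timers are counted twice, so only time non-overlapping regions.
  double total_seconds() const {
    clock::duration sum{};
    for (const auto& kv : totals_) sum += kv.second;
    return std::chrono::duration<double>(sum).count();
  }

  void print(std::ostream& os) const {
    for (const auto& kv : totals_) {
      os << kv.first << ": " << seconds(kv.first) << " s over "
         << calls(kv.first) << " calls\n";
    }
  }

 private:
  std::map<std::string, clock::duration> totals_;
  std::map<std::string, long> calls_;
};

// RAII timer: records the elapsed time of the enclosing scope on destruction.
class ScopedTimer {
 public:
  ScopedTimer(TimingRegistry& registry, std::string label)
      : registry_(registry),
        label_(std::move(label)),
        start_(TimingRegistry::clock::now()) {}

  ~ScopedTimer() {
    registry_.add(label_, TimingRegistry::clock::now() - start_);
  }

  ScopedTimer(const ScopedTimer&) = delete;
  ScopedTimer& operator=(const ScopedTimer&) = delete;

 private:
  TimingRegistry& registry_;
  std::string label_;
  TimingRegistry::clock::time_point start_;
};
```

To use it, put a `ScopedTimer t(registry, "some_label");` at the top of each suspected function, then call `registry.print(std::cout)` after a transition and compare `registry.total_seconds()` with the measured transition time. The timers themselves add overhead, so treat the numbers as rough.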
I’ve done some stuff with google-perftools before: Adjoint sensitivities
https://github.com/gperftools/gperftools and for Ubuntu https://github.com/ahorn/benchmarks/wiki/Profiling-with-google-perftools
You gotta beware of inlining and optimization making the stack traces look misleading, but I thought it worked pretty well. You can use google-perftools with just about anything, though, so a Stan binary should be fine.
This thread has some generic suggestions, but I know you were seeing all of the virtual chain() calls being identified as the same call, and I’m not sure how that works. With -g (in this post in that thread), I was seeing that it identified at least which class’s chain methods were being called most.
Thanks to both of you. I will report back.