I am trying to profile Stan to identify bottlenecks (last-level cache (LLC) behaviour, autodiff, etc.). However, the datasets I am using from StanCon 2017 and StanCon 2018 were all written for RStan. I found the post "Profiling RStan Code with Apple Instruments or Similar", which mentions that we are better off using CmdStan for this.
Given that I am using tools such as perf and cachegrind, does anyone know how to profile the projects from StanCon 2017 (stancon_talks/2017/Contributed-Talks at commit 9b18c6c435fba54b25e1799f543bdcde92090e94 in stan-dev/stancon_talks on GitHub)?
Bottlenecks are generally going to be pretty specific to the individual model and to the step size forced by the posterior geometry (so both model and data matter). In general, that absolutely swamps any lower-level implementation issues. Lower-level problems do come up (say, in auto-diff, in how the auto-diff templates work out, or in specific approximations), but to find those it’s better to work directly with the layer in question (the math library and its unit tests). Sorry I don’t have an answer to your direct question; I’m just pointing out that you might try a more targeted approach or ask a broader question.
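If you still want to point perf or cachegrind at one of the StanCon models, here is a rough sketch of how the command lines could be put together. It assumes you have already pulled the Stan program out of the RStan notebook into its own .stan file, written the data to JSON, and built an executable with CmdStan. The names `./my_model` and `my_model.data.json` are placeholders, the helper functions are made up for this post, and whether the `LLC-loads` and `LLC-load-misses` events exist depends on your CPU and kernel (check `perf list`).

```python
import shlex


def cmdstan_args(exe, data_file, num_samples=200):
    """Argument list for running a compiled CmdStan model with the sample method."""
    return [exe, "sample", f"num_samples={num_samples}", "data", f"file={data_file}"]


def perf_stat_cmd(model_cmd, events=("LLC-loads", "LLC-load-misses")):
    """Wrap a model command in `perf stat`, counting the given hardware events."""
    return ["perf", "stat", "-e", ",".join(events), "--", *model_cmd]


def cachegrind_cmd(model_cmd, out_file="cachegrind.out.my_model"):
    """Wrap a model command in cachegrind with cache simulation switched on."""
    return [
        "valgrind",
        "--tool=cachegrind",
        "--cache-sim=yes",
        f"--cachegrind-out-file={out_file}",
        *model_cmd,
    ]


if __name__ == "__main__":
    # Placeholder executable and data file; substitute your own.
    model = cmdstan_args("./my_model", "my_model.data.json")
    print(shlex.join(perf_stat_cmd(model)))
    print(shlex.join(cachegrind_cmd(model)))
```

The script only prints the commands so you can inspect them before running anything. Cachegrind slows execution down a lot, so a small `num_samples` is probably a sensible starting point.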