Buying a new computer - best hardware for fast Stan performance

I looked at this a while ago (disclaimer: I have not revisited it since v2.16), and here is what I found using a fairly complicated model on a large data set:

For my model, which has some large matrix operations, the biggest speedup came from moving to a system with lots of L3 cache on the CPU(s). I ran the same model on a bunch of different hardware, and the slowest run times came from the system with the highest clock speed - a consumer-grade workstation (i7 @ 3.8 GHz, 8 MB L3 cache - 4 days). Middling times were on some compute servers (Google Cloud, campus clusters) with decent clock speeds, though I was sharing them with other jobs (~3.2 GHz - 2 days). The fastest times were on a system I cobbled together from used server parts off eBay. That system has the lowest clock speed (and the oldest CPUs) but the most L3 cache (dual Xeon @ 2.6 GHz, 25 MB L3 per chip - 14 hrs). Edit: I just checked, and the run times were lower than I remembered.
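
To put those run times side by side, here is a quick Python sketch that converts them to hours and computes each system's speedup relative to the i7 workstation. The numbers are just the rough figures quoted above (using the corrected 14 hrs for the dual Xeon); the dictionary labels and the `speedups` helper are my own names for illustration, and this is a single model on a handful of machines, not a proper benchmark.

```python
# Approximate run times from the post, converted to hours.
run_times_hours = {
    "i7 workstation (3.8 GHz, 8 MB L3)": 4 * 24,
    "shared compute servers (~3.2 GHz)": 2 * 24,
    "dual Xeon (2.6 GHz, 25 MB L3 per chip)": 14,
}


def speedups(times, baseline):
    """Return each system's speedup factor relative to the baseline system."""
    base = times[baseline]
    return {name: base / hours for name, hours in times.items()}


# Hypothetical helper usage: the i7 workstation is the baseline (1.0x).
relative = speedups(run_times_hours, "i7 workstation (3.8 GHz, 8 MB L3)")
for name, factor in relative.items():
    print(f"{name}: {factor:.1f}x")
```

So the machine with the lowest clock speed finished the same job roughly seven times faster than the one with the highest.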

This is hardly authoritative, but from my experience running other models since then, Stan benefits greatly from a large CPU cache and high memory bandwidth. My low-clock, high-memory-bandwidth “frankenstein” workstation is my go-to for running Stan. I would guess that the newer AMD Threadripper CPUs in a workstation could be a great value, as they have gobs of cache. I will get a chance to play with one in the coming months and will report back.
