Have you been using some of the latest features of Stan?

klattery · November 1, 2021, 6:59pm

We rely heavily on reduce_sum. Before reduce_sum, our standard models took about 4-5 days to fit. That was too long for our modeling production cycle, so we kept using custom Gibbs samplers instead (1-2 days). Now, with reduce_sum running 8 threads per chain, we see about a 5x speedup over the version without reduce_sum. Our sampling is now slightly faster than our Gibbs samplers, and Stan converges much better. We get slightly faster estimation with 12 threads, and with 16-20 threads for larger problems, but rarely much beyond that. Past some point, adding more threads makes sampling slower. We have also tuned the grainsize so that it runs faster than Stan's automatic default. For our standard problems, each thread processes about 4 data chunks per iteration (8 threads = 32 data chunks).
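To make the chunking arithmetic concrete, here is a minimal Python sketch of the map-reduce idea behind reduce_sum. The data are cut into slices of `grainsize` elements, a partial log-likelihood is computed for each slice in a pool of workers, and the partial sums are added together. The helper names (`grainsize_for`, `partial_loglik`, `reduce_sum_sketch`), the normal likelihood, and the "about 4 chunks per thread" rule are illustrative assumptions, not Stan's implementation. Stan treats grainsize as a suggestion, and its scheduler may choose different slices. Because of Python's GIL, this sketch shows the structure of the computation, not the speedup.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def grainsize_for(n_obs: int, threads: int, chunks_per_thread: int = 4) -> int:
    """Hypothetical helper: slice size giving about `chunks_per_thread` slices per thread."""
    n_chunks = threads * chunks_per_thread          # e.g. 8 threads * 4 = 32 chunks
    return max(1, math.ceil(n_obs / n_chunks))      # never smaller than 1 element


def partial_loglik(y_slice, mu: float, sigma: float) -> float:
    """Normal log density summed over one slice of the data (assumed example model)."""
    return sum(
        -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * ((y - mu) / sigma) ** 2
        for y in y_slice
    )


def reduce_sum_sketch(y, mu: float, sigma: float, grainsize: int, threads: int) -> float:
    """Split `y` into contiguous slices, evaluate each independently, then add the parts."""
    slices = [y[i:i + grainsize] for i in range(0, len(y), grainsize)]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        parts = pool.map(lambda s: partial_loglik(s, mu, sigma), slices)
        return sum(parts)
```

For example, with 32,000 observations and 8 threads, `grainsize_for(32_000, 8)` gives 1000, i.e. 32 slices of 1000 observations. The total matches a single serial pass over the data up to floating-point rounding, because the log-likelihood is a plain sum over observations.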

All this is based on desktops with an AMD Threadripper 3970X (running under WSL) and on AWS compute-optimized instances (c5.4xlarge), all running cmdstanr. I can't thank the Stan developers enough for reduce_sum. It took Stan from a nice idea we couldn't use in production to a practical tool.
https://discourse.mc-stan.org/t/cmdstan-2-23-release-candidate-is-available/14301
