Using Stan on a computing cluster. Any advice?

I have no experience with PyStan; on the cluster available to me I use CmdStan.
However, I do not see any particular benefit in terms of speed: I simply hand my chains off to the cluster so that they do not tie up my PC at work. Figuring out how to compile Stan on the cluster did take a bit of work, though…
I know there have been many improvements in the “parallel Stan universe” (map_rect, GPUs), with a lot of work going into the math library, but I still do not understand how these new options work.
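
On the point about handing chains off to the cluster: here is a rough sketch, not a tested recipe, of how one CmdStan command per chain could be generated and then submitted as separate jobs. The executable name `./my_model`, the data file `my_data.json`, and the `out` directory are placeholders I made up; the argument layout follows the multiple-chains example in the CmdStan guide (same seed, different `id` per chain), and how you submit each line depends on your scheduler.

```python
import shlex


def cmdstan_chain_command(exe, data_file, chain_id, seed, out_dir="."):
    """Build the argument list for a single CmdStan chain.

    `exe`, `data_file`, and `out_dir` are hypothetical placeholders;
    replace them with the paths on your own cluster.
    """
    return [
        exe,
        "sample",
        "random", f"seed={seed}",      # same seed for every chain ...
        f"id={chain_id}",              # ... with a distinct chain id
        "data", f"file={data_file}",
        "output", f"file={out_dir}/output_{chain_id}.csv",
    ]


def all_chain_commands(exe, data_file, n_chains=4, seed=12345, out_dir="."):
    """Return one shell-ready command string per chain (ids 1..n_chains)."""
    return [
        shlex.join(cmdstan_chain_command(exe, data_file, i, seed, out_dir))
        for i in range(1, n_chains + 1)
    ]


if __name__ == "__main__":
    # Print the commands; each line could go into its own job script.
    for line in all_chain_commands("./my_model", "my_data.json", out_dir="out"):
        print(line)
```

Each chain writes its own CSV, so the jobs are independent and can land on different nodes. This only frees up the local machine; it does not make any single chain faster.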
