CmdStan and high throughput computing

I recently participated in the Open Science Grid User School. For the final assignment, I got Stan running on the University of Wisconsin’s high throughput computing cluster, which uses HTCondor to manage jobs.

I thought I’d put it out there in the hope that it might help other beginners who want to run Stan on a cluster but, like me, aren’t super savvy at setting everything up.

Any suggestions on improvements are welcome.


Great! What characterizes the “high throughput” in this example?

In this case it boils down to running many models and the chains of those models in parallel. More generally, I’ll quote the HTCondor website:

For many experimental scientists, scientific progress and quality of research are strongly linked to computing throughput. In other words, most scientists are concerned with how many floating point operations per month or per year they can extract from their computing environment rather than the number of such operations the environment can provide them per second or minute. Floating point operations per second (FLOPS) has been the yardstick used by most High Performance Computing (HPC) efforts to evaluate their systems. Little attention has been devoted by the computing community to environments that can deliver large amounts of processing capacity over long periods of time. We refer to such environments as High Throughput Computing (HTC) environments.
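Concretely, fanning chains out as independent HTCondor jobs looks roughly like the submit file below. This is only a sketch: the wrapper script name (`run_chain.sh`), the transferred file names, the resource requests, and the `queue 4` count (one job per chain) are all assumptions, not taken from the original post.

```
# Sketch of an HTCondor submit file: one job per chain.
executable            = run_chain.sh          # wrapper that compiles and samples (assumed name)
arguments             = $(Process)            # chain id: 0, 1, 2, 3
transfer_input_files  = cmdstan.tar.gz, bernoulli.stan, bernoulli.data.json
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
output                = chain_$(Process).out
error                 = chain_$(Process).err
log                   = chains.log
request_cpus          = 1
request_memory        = 1GB
request_disk          = 2GB
queue 4                                       # submit four jobs, i.e. four chains
```

Each matched execute node then runs `run_chain.sh` independently, which is exactly the "many models and chains in parallel" pattern described above.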

Yeah, that’s exactly what I’m wondering. Sorry, I should have been more specific: how long does the example take to run, and how does it scale?

I haven’t done a thorough test but here are some example numbers.

  • sampling the bernoulli example: around 8-9 seconds
  • transferring files and compiling the model: around 3-5 minutes
  • waiting for a match between the job and an execute node: around 2 minutes (can take up to 5 minutes)

The overhead of matching jobs and compiling the model on each execute node is only worthwhile if the models take a long time to sample (tens of minutes to hours) and if there are many models worth fitting.
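The per-job workflow on an execute node can be sketched as a small shell script: compile once, then launch one process per chain and wait for all of them. The real CmdStan invocations are shown in comments; since CmdStan won't be installed everywhere, the sketch below stands in a placeholder command so it runs anywhere (the model and data file names are assumptions).

```shell
#!/bin/sh
# Sketch: compile the model once, then run 4 chains as parallel processes.
# Real compile step (inside an unpacked CmdStan directory) would be:
#   make examples/bernoulli/bernoulli

for chain in 1 2 3 4; do
  # Real sampling call would be:
  #   ./bernoulli sample \
  #       data file=bernoulli.data.json \
  #       output file=output_${chain}.csv &
  # Placeholder so the sketch is runnable without CmdStan:
  echo "draws for chain ${chain}" > output_${chain}.csv &
done
wait  # block until every chain's process has finished

ls output_*.csv
```

The `&`/`wait` pattern is what turns the 8-9 second sampling time into wall-clock parallelism across chains; HTCondor then repeats the same trick across models by running this script on many nodes at once.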

I imagine that the R HPC mailing list (https://stat.ethz.ch/mailman/listinfo/r-sig-hpc) would also find this interesting.

Thanks!