CmdStan and high throughput computing

I recently participated in the Open Science Grid User School. For the final assignment, I got Stan running on the University of Wisconsin’s high throughput computing cluster, which uses HTCondor to manage jobs.

I thought I’d put it out there in the hope that it might help other beginners who want to run Stan on a cluster but, like me, aren’t super savvy at setting everything up.

Any suggestions on improvements are welcome.


Great! What characterizes the “high throughput” in this example?

In this case it boils down to running many models and the chains of those models in parallel. More generally, I’ll quote the HTCondor website:

For many experimental scientists, scientific progress and quality of research are strongly linked to computing throughput. In other words, most scientists are concerned with how many floating point operations per month or per year they can extract from their computing environment rather than the number of such operations the environment can provide them per second or minute. Floating point operations per second (FLOPS) has been the yardstick used by most High Performance Computing (HPC) efforts to evaluate their systems. Little attention has been devoted by the computing community to environments that can deliver large amounts of processing capacity over long periods of time. We refer to such environments as High Throughput Computing (HTC) environments.
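Concretely, fanning chains out as independent HTCondor jobs looks roughly like the submit file below. This is only a sketch: the wrapper script name (`run_chain.sh`), the transferred file names, the resource requests, and the `queue 4` count (one job per chain) are all assumptions, not taken from the original post.

```
# Sketch of an HTCondor submit file: one job per chain.
executable            = run_chain.sh          # wrapper that compiles and samples (assumed name)
arguments             = $(Process)            # chain id: 0, 1, 2, 3
transfer_input_files  = cmdstan.tar.gz, bernoulli.stan, bernoulli.data.json
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
output                = chain_$(Process).out
error                 = chain_$(Process).err
log                   = chains.log
request_cpus          = 1
request_memory        = 1GB
request_disk          = 2GB
queue 4                                       # submit four jobs, i.e. four chains
```

Each matched execute node then runs `run_chain.sh` independently, which is exactly the "many models and chains in parallel" pattern described above.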

Yeah, that’s exactly what I’m wondering. Sorry, I should have been more specific: how long does the example take to run, and how does it scale?

I haven’t done a thorough test but here are some example numbers.

  • sampling the bernoulli example: around 8-9 seconds
  • transferring files and compiling the model: around 3-5 minutes
  • waiting for a match between the job and an execute node: around 2 minutes (can take up to 5 minutes)

The overhead of matching jobs and compiling the model on each execute node is only worthwhile if the models take a long time to sample (tens of minutes to hours) and if there are many models worth fitting.
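The per-job workflow on an execute node can be sketched as a small shell script: compile once, then launch one process per chain and wait for all of them. The real CmdStan invocations are shown in comments; since CmdStan won't be installed everywhere, the sketch below stands in a placeholder command so it runs anywhere (the model and data file names are assumptions).

```shell
#!/bin/sh
# Sketch: compile the model once, then run 4 chains as parallel processes.
# Real compile step (inside an unpacked CmdStan directory) would be:
#   make examples/bernoulli/bernoulli

for chain in 1 2 3 4; do
  # Real sampling call would be:
  #   ./bernoulli sample \
  #       data file=bernoulli.data.json \
  #       output file=output_${chain}.csv &
  # Placeholder so the sketch is runnable without CmdStan:
  echo "draws for chain ${chain}" > output_${chain}.csv &
done
wait  # block until every chain's process has finished

ls output_*.csv
```

The `&`/`wait` pattern is what turns the 8-9 second sampling time into wall-clock parallelism across chains; HTCondor then repeats the same trick across models by running this script on many nodes at once.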

I imagine that the R HPC mailing list (https://stat.ethz.ch/mailman/listinfo/r-sig-hpc) would also find this interesting.

Thanks!