I recently got access to the Pittsburgh supercomputer Bridges (https://www.psc.edu/bridges) for running large-scale Bayesian computation problems.
In my work, I have developed a custom R package that implements various Bayesian inference strategies for fitting network models. Some of my strategies are implemented as pure rstan code, others are Rcpp code relying on automatic differentiation (autodiff) via Stan’s math library, and others are implemented in plain Rcpp.
After getting everything up and running on the Bridges supercomputer, I ran a few experiments. I was surprised to discover that all of my algorithms that rely on autodiff (through rstan or the Stan math library) ran more than 5 times slower than expected, relative to those algorithms that don’t rely on autodiff.
To elaborate, any algorithm that didn’t use autodiff took roughly 1.5x as long on the supercomputer as on my department computing cluster, whereas anything requiring autodiff took at least 8x as long. I’d rather use the supercomputer because it allows me to run far more experiments simultaneously. In both environments, I am not doing any parallelization or threading.
For those algorithms that used autodiff, swapping out the automatic differentiation for handcoded derivatives fixed the issue entirely. However, I don’t know whether handcoded derivatives can be used in rstan. As of right now, this unexpected time difference means it is much faster for me to simply use a handcoded implementation of Hamiltonian Monte Carlo, without any of Stan’s nice tuning features.
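To make "handcoded" concrete, here is a minimal sketch of HMC with a hand-written gradient and no adaptation. It is illustrative only and not my package code: the standard normal target, the function names (`log_post`, `grad_log_post`, `hmc_step`), and the fixed step size and number of leapfrog steps are all assumptions chosen for the example.

```r
# Illustrative sketch only: HMC with a hand-coded gradient, fixed step size,
# and no adaptation. The target (standard normal) is an assumed toy example.

log_post <- function(theta) -0.5 * sum(theta^2)  # log density up to a constant
grad_log_post <- function(theta) -theta          # hand-coded gradient, no autodiff

hmc_step <- function(theta, eps = 0.1, n_leapfrog = 20) {
  p0 <- rnorm(length(theta))                     # resample momentum
  q <- theta
  p <- p0 + 0.5 * eps * grad_log_post(q)         # initial half step for momentum
  for (i in seq_len(n_leapfrog)) {
    q <- q + eps * p                             # full step for position
    if (i < n_leapfrog) {
      p <- p + eps * grad_log_post(q)            # full step for momentum
    }
  }
  p <- p + 0.5 * eps * grad_log_post(q)          # final half step for momentum

  # Metropolis correction using the change in the Hamiltonian
  log_accept <- (log_post(q) - 0.5 * sum(p^2)) -
    (log_post(theta) - 0.5 * sum(p0^2))
  if (log(runif(1)) < log_accept) q else theta
}

set.seed(1)
n_draws <- 2000
draws <- matrix(NA_real_, nrow = n_draws, ncol = 2)
theta <- c(0, 0)
for (s in seq_len(n_draws)) {
  theta <- hmc_step(theta)
  draws[s, ] <- theta
}
```

The step size and trajectory length here are fixed by hand, which is exactly the tuning that I would prefer to leave to Stan.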
Has anyone encountered such an issue before? I’d love to be able to run rstan on Bridges without it using up all of my computing allocation. Moreover, I worry the discrepancy might grow worse with the scale of the problem.
=====
In case the following is useful, I am using R 3.6 on both the Bridges supercomputer and my department cluster.
For the supercomputer, I am using rstan 2.18.1 and StanHeaders 2.18.0.
For the department cluster, I am using rstan 2.18.2 and StanHeaders 2.18.1.
In both cases, I am not using the latest versions because I encountered installation issues.
Note: I did not realize until now that the two environments are not running the exact same versions. Could this be a possible source of the discrepancy?
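If it helps with diagnosing, below is a small sketch of what I could run in both environments to compare them side by side. `env_report` is a hypothetical helper written for this post, not part of rstan, and checking `~/.R/Makevars` assumes that is where any custom compiler flags would live on each machine.

```r
# Sketch: collect version and build-configuration info for comparison.
# env_report() is a hypothetical helper, not an rstan function.
env_report <- function(pkgs = c("rstan", "StanHeaders", "Rcpp")) {
  versions <- vapply(pkgs, function(p) {
    # Report NA for packages that are not installed instead of failing
    if (nzchar(system.file(package = p))) {
      as.character(utils::packageVersion(p))
    } else {
      NA_character_
    }
  }, character(1))

  # User-level Makevars, if present, holds custom compiler flags (assumed path)
  makevars <- path.expand("~/.R/Makevars")

  list(
    r_version = R.version.string,
    packages  = versions,
    makevars  = if (file.exists(makevars)) readLines(makevars) else character(0)
  )
}

print(env_report())
```

I am happy to post the output from both machines if that would be useful.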
Thank you for your time.