Throughout the late summer and fall, I have been working on a large simulation study evaluating a few candidate hierarchical models for an applied problem (essentially a hierarchical meta-analysis). I have been running my simulations on a cluster at my university's center for high performance computing (HPC). Everything was going well until mid-November, when there was some scheduled cluster downtime. After that downtime, I could no longer fit models using rstan without error. A member of our HPC staff helped me install a version of rstan that no longer produces those errors. However, sampling is now so slow that I am stuck again, even though I have made no substantial changes to my R or Stan code.
For example, fitting one of my models to a simulated dataset on my own laptop, I can run three chains in parallel for 100,000 iterations in about 2 hours. On any core in the HPC environment that my jobs are submitted to, even 12 hours only gets me through 20% of the sampling for one chain. I am wondering if anyone can help me figure out what has changed or is going wrong. I am not a developer, so it is difficult for me to understand why the same R/Stan code would have worked a few months ago and does not now.
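As a rough back-of-the-envelope check on the size of the slowdown, the numbers above can be plugged into a few lines of shell arithmetic. This is only a sketch: it assumes the 100,000 iterations are per chain, that the laptop chains really ran in parallel, and that every iteration costs about the same. The variable names are purely illustrative.

```shell
#!/bin/sh
# Figures taken from the description above; per-chain count is an assumption.
iters_per_chain=100000
laptop_hours=2
hpc_hours=12
hpc_percent_done=20

# Iterations one HPC chain completed in the 12-hour window.
hpc_iters_done=$(( iters_per_chain * hpc_percent_done / 100 ))

# Per-chain throughput on the laptop (iterations per hour).
laptop_rate=$(( iters_per_chain / laptop_hours ))

# Slowdown factor: laptop rate divided by HPC rate, kept in integers.
slowdown=$(( laptop_rate * hpc_hours / hpc_iters_done ))

# Projected wall time for one full chain on the HPC at the observed pace.
hpc_projected_hours=$(( hpc_hours * 100 / hpc_percent_done ))

echo "approximate slowdown factor: ${slowdown}x"
echo "projected hours per chain on HPC: ${hpc_projected_hours}"
```

Under those assumptions, one chain on the cluster is running at roughly one thirtieth of the laptop's per-chain speed.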
The instructions I received from our HPC staff, which I followed, are quoted below. As a starting point, does anyone recognize any issues here?
“I have this working. I installed rstan version 2.21.1 into our gcc-compiled version of R 3.5.1 along with its dependencies and successfully ran both your SLURM scripts last night. Here is how you can install and use it:
- Your $HOME/.R/Makevars file should contain the following:
CXX14FLAGS += -DBOOST_PHOENIX_NO_VARIADIC_EXPRESSION
CXX14FLAGS += -fPIC
LDFLAGS += -fPIC
If you have other settings in that file that you want to keep, either make a copy of the file, or comment them out with a # character.
- You can execute my install script which is ~ID/Incidents/YourName_rstan/install_rstan.sh . That is a bash script that will create the directory R/mropen under your home directory and install rstan with its dependencies there. The installation is performed with our copy of Microsoft R version 3.5.1, which I chose because it is compiled with gcc. I couldn’t get this working with any of our copies of R compiled with the Intel compiler.
- In your csh scripts like Analysis.sh or Simple_Analysis.sh you must add the lines:
module load mropen gcc/4.9.2
setenv R_LIBS_USER $HOME/R/mropen”
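For anyone who wants to reproduce this setup, the Makevars step from the quoted instructions could be scripted roughly as follows. This is a minimal sketch in POSIX sh, not what the HPC staff actually ran: the function name `write_makevars` and the `Makevars.bak` backup name are illustrative assumptions, and the three settings are copied verbatim from the quote.

```shell
#!/bin/sh
# Sketch: write the Makevars settings quoted above into a given directory.
# write_makevars and the ".bak" backup name are hypothetical helpers,
# not part of the HPC staff's instructions.
write_makevars() {
    dir="$1"
    mkdir -p "$dir" || return 1

    # Keep a copy of any existing settings before overwriting them.
    if [ -f "$dir/Makevars" ]; then
        cp "$dir/Makevars" "$dir/Makevars.bak" || return 1
    fi

    # Overwrite Makevars with exactly the three quoted lines.
    cat > "$dir/Makevars" <<'EOF'
CXX14FLAGS += -DBOOST_PHOENIX_NO_VARIADIC_EXPRESSION
CXX14FLAGS += -fPIC
LDFLAGS += -fPIC
EOF
}
```

It would be called as `write_makevars "$HOME/.R"` before running the install script. Note that this replaces the whole file, so any earlier settings survive only in the backup copy.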
If there are no obvious issues with this setup, I can provide my R and Stan code, or any additional detail that might be helpful. I will note that all cores in use are at 100% CPU with low memory use, so the cluster does appear to be working hard. Thanks for any help or suggestions!