From fast to slow sampling on cluster after reset and older rstan version installed

Hello all,

Throughout the late summer and fall, I have been working on a large simulation study evaluating a few candidate hierarchical models for an applied problem (essentially a hierarchical meta-analysis). I have been running my simulations on a cluster: my university’s center for high performance computing (HPC). Everything was going well until mid-November, when there was some scheduled cluster downtime. After that downtime, I was no longer able to fit models using rstan without error. A staff member of our HPC was able to help me install a version of rstan that no longer produces those errors. However, sampling is now so slow that I am stuck again, even though I have not made any substantial changes to my R or Stan code.

On my own laptop, for example, fitting one of my models to a simulated dataset, I can run three chains in parallel for 100,000 iterations in about 2 hours. On any core in the HPC environment that my jobs are submitted to, even 12 hours only gets me through 20% of the sampling for one chain.

I am wondering if anyone can help me figure out what has changed or is going wrong. I am not a developer, so it’s difficult for me to understand why the same R/Stan code worked a few months ago and does not now.

The instructions I received from our HPC staff were the following (I implemented these instructions). As a starting point, does anyone recognize issues here?

“I have this working. I installed rstan version 2.21.1 into our gcc-compiled version of R 3.5.1 along with its dependencies and successfully ran both your SLURM scripts last night. Here is how you can install and use it:

  1. Your $HOME/.R/Makevars file should contain the following:
    CXX14FLAGS += -fPIC
    LDFLAGS += -fPIC

If you have other settings in that file that you want to keep, either make a copy of the file, or comment them out with a # character.

  2. You can execute my install script which is ~ID/Incidents/YourName_rstan/ . That is a bash script that will create the directory R/mropen under your home directory and install rstan with its dependencies there. The installation is performed with our copy of Microsoft R version 3.5.1, which I chose because it is compiled with gcc. I couldn’t get this working with any of our copies of R compiled with the Intel compiler.

  3. In your csh scripts, you must add the lines:
    module load mropen gcc/4.9.2
    setenv R_LIBS_USER $HOME/R/mropen”

If there are no obvious issues with this setup, I can provide my R and Stan code, or of course any additional detail that might be helpful. I will note that all cores being used have CPUs at 100% with low memory use, so the cluster does appear to be working hard. Thanks for any help or suggestions!

What version of RStan was the cluster running before the change and what version is it running now?

While I don’t think it’s responsible for the full slowdown, the Makevars file is omitting compiler optimisations which will result in a slower model. Try updating the CXX14FLAGS to:

CXX14FLAGS += -fPIC -O3 -march=native -mtune=native

Prior to the change, I had to reinstall rstan a few times to keep things working over the course of August-October. The most recent version I had used was 2.21.0, but if there had been updates to rstan during that timeframe, I had also used others prior to 2.21.0 successfully. I believe I was wrong to say that the HPC staff helped me install an older version. We are now running 2.21.1.

If I add these CXX14FLAGS to my Makevars file, I end up with the same issue I was having before the HPC staff helped me reinstall rstan. The short version is that once the chains begin sampling, they produce the error “double free or corruption (out)” and then display pages of output under either “memory map” or “back trace.” Does this help indicate a problem in any way? Thank you for your help.

That is odd. Try with just O3:
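In case it helps, here is a minimal sketch of that change as a shell snippet. It assumes the -fPIC settings from the HPC instructions should stay and that only -O3 is added. The MAKEVARS_DIR variable is a made-up name for this sketch (R itself reads $HOME/.R/Makevars by default); it is there only so the snippet can be tried in a scratch directory first.

```shell
# Directory holding the per-user Makevars file.
# MAKEVARS_DIR is a hypothetical override used only by this sketch.
makevars_dir="${MAKEVARS_DIR:-$HOME/.R}"
makevars="$makevars_dir/Makevars"

mkdir -p "$makevars_dir"

# Keep a copy of any existing settings before overwriting them.
if [ -f "$makevars" ]; then
  cp "$makevars" "$makevars.bak"
fi

# Assumed contents: the two -fPIC lines from the HPC instructions, plus -O3.
cat > "$makevars" <<'EOF'
CXX14FLAGS += -fPIC -O3
LDFLAGS += -fPIC
EOF

# Show the result so it can be checked by eye.
cat "$makevars"
```

As far as I know, these flags only apply at compile time, so any previously compiled model would need to be rebuilt before a speed difference could show up.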


With just O3 the sampling is working without that error. But I am not yet able to tell if things have sped up.

To gauge the speed, maybe try using ezStan, which has a progress bar with ETA.
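A cruder check that needs no extra package, as a sketch: time the same short run (say, a few hundred iterations) on the laptop and on a cluster node, then compare wall-clock seconds. The elapsed_seconds helper and the short_fit.R script name below are hypothetical, not something from this thread.

```shell
# Print the wall-clock seconds a command takes to run.
# The command's own stdout is sent to stderr so that the only thing
# on stdout is the number of seconds.
elapsed_seconds() {
  start=$(date +%s)
  "$@" 1>&2
  end=$(date +%s)
  echo $((end - start))
}

# Hypothetical usage with a short version of the real job:
#   elapsed_seconds Rscript short_fit.R
# Stand-in command so the sketch runs anywhere:
elapsed_seconds sleep 1
```

Running the identical short job in both places gives a rough speed ratio without waiting hours for a full chain.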

Also, have you considered using cmdstanr? If that’s possible in the HPC environment, it’ll give you the latest/greatest speed/features.

Hi everyone. Thanks for taking the time to respond to this post. I will look into cmdstanr as I move forward with my projects. In the meantime, adding -O3 to CXX14FLAGS seems to have fixed the slow sampling issue. It’s probably possible to speed things up further, but with this small fix I am at least moving as quickly as I was back in November. I am a bit dumbfounded but very happy, so thanks again!!