Rstan 2.19.2 slower than 2.18.1

I have recently noticed that my tests take longer to run and it appears to be due to the latest version of rstan (2.19.2). To verify this, I ran some of our Stan code with versions 2.18.1 and 2.19.2 for 10 different seeds, and observed that the latest version was consistently running around 10% slower than the previous version with 2 chains and 200 iterations. To minimise measurement noise I ran the calculations from safe mode in Windows. My R code is below along with two CSV attachments containing data, which should provide a runnable example. Has anyone else noticed a difference in performance?

data <- list(C = 3L, R = 380L, RS = 3040L, V = 13,
             prior_mean = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
             prior_sd = c(5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5))

data$X <- array(c(as.matrix(read.csv("xdata.csv"))), dim = c(3040, 3, 13))
data$Y <- read.csv("ydata.csv")[[1]]
data$S <- rep(8, 380)

options(mc.cores=parallel::detectCores())

library(rstan)
stan.model <- stan_model("hb.stan")
result <- sampling(stan.model, data = data,
                      chains = 2, iter = 200,
                      control = list(max_treedepth = 10,
                                     adapt_delta = 0.8),
                      init = function () list(L_omega = diag(13)))

xdata.csv (328.5 KB) ydata.csv (8.9 KB)hb.stan (1014 Bytes)

2 Likes

I assume this is something that would concern Stan developers and users who care about performance, as the difference in run time could be hours with larger examples?

1 Like

I’m sorry to confirm this report using CmdStan 2.18.1, 2.19.1, and 2.20.0

for CmdStan, I generated two Rdump files - hb.data.R and hb.init.R - I hope that the latter corresponds to the RStan init function.

I ran this on my machine a few times for each release - one chain only, same seed, and got consistent results:

time ./hb random seed=123456 data file=hb.data.R init=hb.init.R sample num_samples=200 num_warmup=200 adapt delta=0.8

2.18.1:

 Elapsed Time: 177.458 seconds (Warm-up)
               56.1166 seconds (Sampling)
               233.575 seconds (Total)

2.20.0

 Elapsed Time: 190.415 seconds (Warm-up)
               60.1722 seconds (Sampling)
               250.588 seconds (Total)

I compared the C++ code for the model - all the calls to the math library are the same - attached is an inventory of calls to mathlib in the model.
has anything changes w/r/t these calculations? cholesky factor code? std_normal?
hb-hpp-stan-math-calls.txt (4.1 KB) hb.data.R (461.4 KB) hb.init.R (559 Bytes)

2 Likes

@seantalts and @syclik - any thoughts on what’s slower?

There were recent changes to the beta function and lgamma was switched from boost:: to std::, the latter one known to cause a performance regression for the threaded case (there is a PR up https://github.com/stan-dev/math/pull/1255). I think it should be between these two.

From my notebook:
on develop:
Elapsed Time: 164.59 seconds (Warm-up)
36.4511 seconds (Sampling)
201.042 seconds (Total)

Stan Math on the bugfix/issue-1250-lgamma branch from @wds15:

Elapsed Time: 140.701 seconds (Warm-up)
38.5668 seconds (Sampling)
179.268 seconds (Total)

I am fairly sure that is it Its more of hunch, but someone should double check, my timing results seem quite noisy after I repeated timing a few times. Anyhow that PR should get merged ASAP anyways to at least remove the threading issue on develop, as Sebastian mentioned in the meeting today.

1 Like

I had the same thought and did as well benchmarks which show (I use only 100 warmup + 100 iterations):

2.18.1
 Elapsed Time: 104.734 seconds (Warm-up)
               85.1479 seconds (Sampling)
               189.882 seconds (Total)

develop:
 Elapsed Time: 107.946 seconds (Warm-up)
               94.6883 seconds (Sampling)
               202.634 seconds (Total)

bugfix/issue-1250-lgamma where I just merged in develop:
 Elapsed Time: 93.0542 seconds (Warm-up)
               83.8694 seconds (Sampling)
               176.924 seconds (Total)

The lgamma PR brings a faster log gamma function and also speeds up the digamma function (almost 2x). This model seems to benefit from these changes; so merging the PR sounds great.

3 Likes

Thanks everyone for looking into this! I’m glad that the issue could be replicated and a fix has been found. Could someone provide an indication of when this fix will be released in rstan?

1 Like

The next release is October 18th 2019 and historically RStan trails Stan releases by about 3 months, so I’d guess in about 6 months. To get access to the fix sooner, once that PR is merged you could use CmdStan or CmdStanPy with the develop branch. We’re hoping to develop a CmdStanR version as well that will enable similar functionality on R.

1 Like

It might also be worth to try installing StanHeaders from GitHub, after the bug fix is in math/develop. This has worked previously for me.

5 Likes

Does lchoose internally call lgamma and hence would be affected by this?

Yes.

lchoose(a, b)
  = lgamma(a + 1) - lgamma(b + 1) - lgamma(a - b + 1);

@bgoodri do you have pointers on how to get that bug fix for a user running Rstan?

Or is there a way to go back to 2.18.1 till the bug is fixed?

@Raghu_Naik, you can use devtools to install older versions from CRAN or GitHub.

Do you have more precise measurements of how much faster the new PR is? I have some code which depends heavily on lchoose and hence am quite excited to get this fix. Fingers crossed the PR gets merged soon! Thanks!