Rstan 2.19.2 slower than 2.18.1

JustinYap · July 23, 2019, 6:39am

I have recently noticed that my tests take longer to run and it appears to be due to the latest version of rstan (2.19.2). To verify this, I ran some of our Stan code with versions 2.18.1 and 2.19.2 for 10 different seeds, and observed that the latest version was consistently running around 10% slower than the previous version with 2 chains and 200 iterations. To minimise measurement noise I ran the calculations from safe mode in Windows. My R code is below along with two CSV attachments containing data, which should provide a runnable example. Has anyone else noticed a difference in performance?

data <- list(C = 3L, R = 380L, RS = 3040L, V = 13,
             prior_mean = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
             prior_sd = c(5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5))

data$X <- array(c(as.matrix(read.csv("xdata.csv"))), dim = c(3040, 3, 13))
data$Y <- read.csv("ydata.csv")[[1]]
data$S <- rep(8, 380)

options(mc.cores=parallel::detectCores())

library(rstan)
stan.model <- stan_model("hb.stan")
result <- sampling(stan.model, data = data,
                      chains = 2, iter = 200,
                      control = list(max_treedepth = 10,
                                     adapt_delta = 0.8),
                      init = function () list(L_omega = diag(13)))

xdata.csv (328.5 KB) ydata.csv (8.9 KB)hb.stan (1014 Bytes)

JustinYap · July 24, 2019, 11:21pm

I assume this is something that would concern Stan developers and users who care about performance, as the difference in run time could be hours with larger examples?

mitzimorris · July 25, 2019, 7:09pm

I’m sorry to confirm this report using CmdStan 2.18.1, 2.19.1, and 2.20.0

for CmdStan, I generated two Rdump files - hb.data.R and hb.init.R - I hope that the latter corresponds to the RStan init function.

I ran this on my machine a few times for each release - one chain only, same seed, and got consistent results:

time ./hb random seed=123456 data file=hb.data.R init=hb.init.R sample num_samples=200 num_warmup=200 adapt delta=0.8

2.18.1:

 Elapsed Time: 177.458 seconds (Warm-up)
               56.1166 seconds (Sampling)
               233.575 seconds (Total)

2.20.0

 Elapsed Time: 190.415 seconds (Warm-up)
               60.1722 seconds (Sampling)
               250.588 seconds (Total)

I compared the C++ code for the model - all the calls to the math library are the same - attached is an inventory of calls to mathlib in the model.
has anything changes w/r/t these calculations? cholesky factor code? std_normal?
hb-hpp-stan-math-calls.txt (4.1 KB) hb.data.R (461.4 KB) hb.init.R (559 Bytes)

mitzimorris · July 25, 2019, 7:16pm

@seantalts and @syclik - any thoughts on what’s slower?

rok_cesnovar · July 25, 2019, 7:38pm

There were recent changes to the beta function and lgamma was switched from boost:: to std::, the latter one known to cause a performance regression for the threaded case (there is a PR up https://github.com/stan-dev/math/pull/1255). I think it should be between these two.

rok_cesnovar · July 25, 2019, 7:59pm

From my notebook:
on develop:
Elapsed Time: 164.59 seconds (Warm-up)
36.4511 seconds (Sampling)
201.042 seconds (Total)

Stan Math on the bugfix/issue-1250-lgamma branch from @wds15:

Elapsed Time: 140.701 seconds (Warm-up)
38.5668 seconds (Sampling)
179.268 seconds (Total)

~~I am fairly sure that is it~~ Its more of hunch, but someone should double check, my timing results seem quite noisy after I repeated timing a few times. Anyhow that PR should get merged ASAP anyways to at least remove the threading issue on develop, as Sebastian mentioned in the meeting today.

wds15 · July 25, 2019, 8:33pm

I had the same thought and did as well benchmarks which show (I use only 100 warmup + 100 iterations):

2.18.1
 Elapsed Time: 104.734 seconds (Warm-up)
               85.1479 seconds (Sampling)
               189.882 seconds (Total)

develop:
 Elapsed Time: 107.946 seconds (Warm-up)
               94.6883 seconds (Sampling)
               202.634 seconds (Total)

bugfix/issue-1250-lgamma where I just merged in develop:
 Elapsed Time: 93.0542 seconds (Warm-up)
               83.8694 seconds (Sampling)
               176.924 seconds (Total)

The lgamma PR brings a faster log gamma function and also speeds up the digamma function (almost 2x). This model seems to benefit from these changes; so merging the PR sounds great.

JustinYap · July 25, 2019, 11:30pm

Thanks everyone for looking into this! I’m glad that the issue could be replicated and a fix has been found. Could someone provide an indication of when this fix will be released in rstan?

seantalts · July 27, 2019, 12:20pm

The next release is October 18th 2019 and historically RStan trails Stan releases by about 3 months, so I’d guess in about 6 months. To get access to the fix sooner, once that PR is merged you could use CmdStan or CmdStanPy with the develop branch. We’re hoping to develop a CmdStanR version as well that will enable similar functionality on R.

Guido_Biele · July 27, 2019, 4:26pm

It might also be worth to try installing StanHeaders from GitHub, after the bug fix is in math/develop. This has worked previously for me.

maxbiostat · July 30, 2019, 4:03am

Does lchoose internally call lgamma and hence would be affected by this?

Bob_Carpenter · July 30, 2019, 8:03pm

Yes.

lchoose(a, b)
  = lgamma(a + 1) - lgamma(b + 1) - lgamma(a - b + 1);

maxbiostat · July 30, 2019, 8:12pm

@bgoodri do you have pointers on how to get that bug fix for a user running Rstan?

Raghu_Naik · July 31, 2019, 12:22am

Or is there a way to go back to 2.18.1 till the bug is fixed?

JustinYap · August 18, 2019, 10:50pm

@Raghu_Naik, you can use devtools to install older versions from CRAN or GitHub.

maxbiostat · August 27, 2019, 1:18am

Do you have more precise measurements of how much faster the new PR is? I have some code which depends heavily on lchoose and hence am quite excited to get this fix. Fingers crossed the PR gets merged soon! Thanks!

Topic		Replies	Views
Speed difference between rstan and cmdstan for a simple model CmdStan rstan , techniques	25	3364	November 7, 2021
Runtime difference between cmdstan v2.28.2 and v2.27.0 CmdStan	7	958	December 10, 2021
Stan 2.17 running slower on a model than Stan 2.15 General	53	4020	November 7, 2017
Significant Differences between `rstan` and `rstanarm` running times Modeling rstanarm	1	1525	August 8, 2020
From fast to slow sampling on cluster after reset and older rstan version installed General	8	672	January 27, 2021

Rstan 2.19.2 slower than 2.18.1

Related topics