Speed issues since upgrading to RStan v2.21.2

andrjohns · August 13, 2020, 4:26pm

In the meantime, could you list the different functions that are called in the model (e.g., log, lgamma, etc.)? Then we could look at how they changed between 2.19 and 2.21.

bbbales2 · August 13, 2020, 6:28pm

The easiest way to check this will be with Cmdstan, so we can control the versions of everything (also, Rstan doesn’t have as many releases as Cmdstan).

You can dump your data (assuming it is in a list called mydata) in a format cmdstan can use with:

stan_rdump(names(mydata), "mydata.dat", envir = list2env(mydata))

Once you have that, I think there are three versions of cmdstan we should test: 2.20, 2.21 and then 2.21 with the 2.20 version of the samplers.

First get a copy of cmdstan:

git clone --recursive https://github.com/stan-dev/cmdstan.git

Make three copies of that, cmdstan_20, cmdstan_21, and cmdstan_middle.

Go into cmdstan_20 and do:

git checkout 7d2b7a2
make stan-update

Go into cmdstan_21 and do:

git checkout e8c0967
make stan-update

Go into cmdstan_middle and do:

git checkout e8c0967
make stan-update
cd stan
git checkout b3add46
cd ../

If make doesn’t work, I think you should try mingw32-make (at some point we needed to start using that on Windows because plain make doesn’t work quite right there).

Once you have this set up, build and run your model with each of these. Given the performance differences are so large, one chain each should be fine. If you have >3 cores, just run them together.

I assume 2.20 will be fast like rstan 2.19 was. We’ll double check cmdstan 2.21 is slow like rstan 2.21 is, and then we can figure out if it’s the sampler or the math changes between 2.20 and 2.21 that are driving the difference.
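
For anyone following along, here is a rough sketch of the "build and run one chain in each copy" step. It assumes the model file sits inside each cmdstan copy at a hypothetical path such as mymodel/mymodel.stan, and that the data file is given as an absolute path; the function name run_one and the run.log file are made up for illustration. Timing is wall-clock seconds only, so treat the numbers as a coarse comparison.

```shell
# Build and run one chain in a single CmdStan checkout, then print elapsed seconds.
# $1 = CmdStan directory (e.g. cmdstan_20)
# $2 = model path relative to that directory, without ".stan"
#      (hypothetical example: mymodel/mymodel)
# $3 = absolute path to the data file (e.g. /path/to/mydata.dat)
run_one() {
  (
    cd "$1" || exit 1
    # Set MAKE=mingw32-make if plain make does not work on Windows.
    "${MAKE:-make}" "$2" || exit 1
    start=$(date +%s)
    "./$2" sample data file="$3" output file=output.csv > run.log || exit 1
    end=$(date +%s)
    echo "$1: $((end - start)) s"
  )
}
```

With enough cores, the three runs can go in parallel, for example: for d in cmdstan_20 cmdstan_21 cmdstan_middle; do run_one "$d" mymodel/mymodel "$PWD/mydata.dat" & done; wait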

willte · August 13, 2020, 6:30pm

Yes, the parameters seem to have very similar ESS across the two models. In the new model, however, there are a couple of parameters with uncharacteristically low ESS. These parameters are actually doing nothing: they are not being regressed against any data, so I guess they are effectively priors only. It sounds stupid to even have them in, but it makes things a lot more convenient when it comes to for loops and interpretability if they are there.

willte · August 13, 2020, 6:32pm

Yes I am and I will look into this.

bbbales2 · August 13, 2020, 6:33pm

Oo, can you make sure to put priors on them? Like normal(0, 1)s. Just so they don’t go crazy.

I didn’t realize the model changed between 2.19 and 2.21. Do you see the performance difference with the same model between 2.19 and 2.21? We only want to do the performance regression thing if so.

willte · August 13, 2020, 6:33pm

exp()
sum()
normal()

willte · August 13, 2020, 6:35pm

Yup there are priors on them.

Sorry if I didn’t make this clear, but the models are the same between both versions of rstan. It’s just the ‘data-less’ parameters had low ESS in the new model and high ESS in the old model.

willte · August 13, 2020, 6:36pm

bbbales2:

The easiest way to check this will be Cmdstan so we can control the versions of stuff (also Rstan doesn’t have as many releases as Cmdstan).

I worry this sounds a bit out of my depth, but sounds like a great learning exercise for me. I will be able to have a go in next day or two.

bbbales2 · August 13, 2020, 6:45pm

Hmm, interesting. Well there was a change to how the U-turns in the sampler were computed between 2.20 and 2.21.

The checks in 2.21 are more aggressive (they include all the 2.20 checks plus some), so I wouldn’t expect this behavior, but that’s basically what we’re testing with the cmdstan experiments.

willte:

but sounds like a great learning exercise

Well it can be a bit of a headache lol, but ask if something breaks

willte · August 14, 2020, 7:40am

I’ve tried running the model again, just to watch things more closely, and it actually seems the warmup is now slower in the new model, not the sampling.

wds15 · August 14, 2020, 8:49am

I don’t think that the change to boost warrants a 2x/3x slowdown, but in any case, you can try to add the compile flag

-DBOOST_MATH_PROMOTE_DOUBLE_POLICY=false

to the CXX14FLAGS in your Makevars in order to lower the precision of lgamma from boost (the higher precision is not needed, to my knowledge). This will speed up lgamma by some margin.

EDIT: Though it could be that RStan already slaps this on… don’t know for sure.
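
In case it helps, a minimal sketch of adding that flag from a shell. It assumes the Makevars file lives at the usual per-user location (commonly ~/.R/Makevars, or ~/.R/Makevars.win on Windows); the function name add_boost_flag is made up for illustration. It only appends the line if the flag is not already present.

```shell
# Append the flag to a Makevars file unless it is already there.
# $1 = path to the Makevars file (assumed: ~/.R/Makevars or ~/.R/Makevars.win)
add_boost_flag() {
  flag='-DBOOST_MATH_PROMOTE_DOUBLE_POLICY=false'
  mkdir -p "$(dirname "$1")"
  # -F: fixed string, -q: quiet, -s: stay silent if the file does not exist yet
  grep -qsF -- "$flag" "$1" || echo "CXX14FLAGS += $flag" >> "$1"
}
```

Usage would be something like add_boost_flag ~/.R/Makevars. Since the flag is applied at compile time, the model has to be recompiled afterwards for it to have any effect.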

willte · August 29, 2020, 2:10pm

Hi Ben, the first git checkout worked but the second one I got:

error: pathspec 'e8c0967' did not match any file(s) known to git

Have I done something wrong?

bbbales2 · August 29, 2020, 2:15pm

Let me double check. I looked it up and cmdstan 2.21 is ee2a3b4, so I’m not sure why I put e8c0967 there especially if it doesn’t exist lol.

bbbales2 · August 29, 2020, 2:28pm

Had a look!

e8c0967 is the 2.21 version of the samplers (stan-dev/stan). I screwed up putting it there (meant to put the tag for 2.21 cmdstan [which is different from the stan-dev/stan repo]).

Your new directions are:

Go into cmdstan_20 and do:

git checkout 7d2b7a2
make stan-update

Go into cmdstan_21 and do:

git checkout ee2a3b4
make stan-update

Go into cmdstan_middle and do:

git checkout ee2a3b4
make stan-update
cd stan
git checkout b3add46
cd ../
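
To double check that a copy ended up where intended, one option is to print the commit each directory is on. This is only a sketch; the function name show_heads is made up, and git may print more than 7 characters if a shorter hash would be ambiguous.

```shell
# Print the short commit hash of a CmdStan checkout and of its stan submodule.
# $1 = path to the CmdStan directory (e.g. cmdstan_middle)
show_heads() {
  echo "cmdstan: $(git -C "$1" rev-parse --short=7 HEAD)"
  echo "stan:    $(git -C "$1/stan" rev-parse --short=7 HEAD)"
}
```

Going by the directions above, show_heads cmdstan_middle should report ee2a3b4 for cmdstan and b3add46 for stan.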

willte · August 29, 2020, 2:57pm

Thanks, after running that I got the following message. Hope that’s as expected:

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

git checkout -b <new-branch-name>

HEAD is now at ee2a3b4 release/v2.21.0: updating version numbers
M stan

bbbales2 · August 29, 2020, 3:00pm

Yeah that looks fine. Make sure to run make stan-update afterwards (for the 2.21 version), or make stan-update && cd stan && git checkout b3add46 && cd ../ (for cmdstan_middle).

willte · August 30, 2020, 8:52am

Thanks Ben, I’ve now run all of them and there are no noticeable speed differences between the versions. Sigh. I will have to keep digging to see what might be causing the models to take so much longer.

Just to make sure I’ve done the installations right, is there a way I can check what versions of stan I have installed in each directory? For example running a command like:

stan --version

?

If it proves I’ve installed the versions correctly, is there an even earlier version of Stan worth trying? I’m now worried I hadn’t updated Stan on my computer for 18-24 months.

rok_cesnovar · August 30, 2020, 8:57am

You can check the version in the CSV that is produced as the result of sampling.

If you didn’t specify a file, it’s usually output.csv.

The CSV files start with

# stan_version_major = 2
# stan_version_minor = 24
# stan_version_patch = 1
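
A small sketch for pulling those three lines out of a CSV and printing them as one version string. It assumes the header lines have exactly the "# name = value" shape shown above; the function name stan_csv_version is made up for illustration.

```shell
# Print the Stan version recorded in a CmdStan output CSV as "major.minor.patch".
# $1 = path to the CSV file (e.g. cmdstan_20/output.csv)
stan_csv_version() {
  # Header lines look like: "# stan_version_major = 2"
  awk -F' = ' '
    /^# *stan_version_major/ { major = $2 }
    /^# *stan_version_minor/ { minor = $2 }
    /^# *stan_version_patch/ { patch = $2 }
    END { print major "." minor "." patch }
  ' "$1"
}
```

For example, stan_csv_version cmdstan_20/output.csv could be run once per directory to compare the three installs.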

willte · August 30, 2020, 10:57am

Thanks @rok_cesnovar,

For cmdstan_20 I have 2.20.0.

For cmdstan_21 I have 2.21.0.

For cmdstan_middle I have 2.20.0.

Seems like I missed something in the installation for cmdstan_middle? But it doesn’t really matter, because 2.20.0 and 2.21.0 aren’t showing any meaningful difference in speed?

bbbales2 · August 30, 2020, 3:24pm

And these are all the slow speeds, right? So this means something changed between 2.19 and 2.20, I guess.

Can you confirm that cmdstan 2.19 is fast for you?

You can clone a copy of cmdstan and do:

git checkout 77a398e
make stan-update

or download a copy from here: https://github.com/stan-dev/cmdstan/releases/tag/v2.19.1
