Speed issues since upgrading to RStan v2.21.2

In the meantime, could you list the different functions that are called in the model (e.g., log, lgamma, etc.)? Then we could look at how they changed between 2.19 and 2.21.
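One rough way to pull that list out of a model file is to grep for identifiers followed by an opening parenthesis. A minimal sketch, not a real parser: `list_stan_functions` and `mymodel.stan` are hypothetical names, it only strips `//` comments (not `/* */`), it drops the `for`/`while`/`if` keywords, and it will also list any user-defined functions.

```shell
#!/bin/sh
# list_stan_functions FILE: rough, de-duplicated list of names called as name(...)
# in a Stan program. Heuristic only: block comments are not removed.
list_stan_functions() {
  sed 's://.*$::' "$1" |                                 # drop // line comments
    grep -oE '[A-Za-z_][A-Za-z0-9_]*[[:space:]]*\(' |    # name followed by "("
    sed -E 's/[[:space:]]*\($//' |                       # keep just the name
    grep -vxE 'for|while|if' |                           # control keywords, not calls
    sort -u
}
```

Usage: `list_stan_functions mymodel.stan`.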

The easiest way to check this will be with CmdStan, so we can control the versions of everything (also, RStan doesn't have as many releases as CmdStan).

You can dump your data (assuming it is in a list called mydata) in a format cmdstan can use with:

stan_rdump(names(mydata), "mydata.dat", envir = list2env(mydata))
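Before handing the file to CmdStan, it can be worth checking that it defines every variable the model's data block expects. A minimal sketch, assuming each variable in the dump starts on its own line as `name <-` (the usual R dump layout); `list_rdump_names` is a hypothetical helper name.

```shell
#!/bin/sh
# list_rdump_names FILE: print the variable names defined in an R dump file,
# assuming each definition starts at the beginning of a line as "name <-".
list_rdump_names() {
  grep -oE '^[A-Za-z.][A-Za-z0-9._]*[[:space:]]*<-' "$1" |
    sed -E 's/[[:space:]]*<-$//'
}
```

Usage: `list_rdump_names mydata.dat`, then compare the names against the model's `data` block.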

Once you have that, I think there are three versions of cmdstan we should test: 2.20, 2.21 and then 2.21 with the 2.20 version of the samplers.

First, get a copy of CmdStan:

git clone --recursive https://github.com/stan-dev/cmdstan.git

Make three copies of that directory: cmdstan_20, cmdstan_21, and cmdstan_middle.

Go into cmdstan_20 and do:

git checkout 7d2b7a2
make stan-update

Go into cmdstan_21 and do:

git checkout e8c0967
make stan-update

Go into cmdstan_middle and do:

git checkout e8c0967
make stan-update
cd stan
git checkout b3add46
cd ../

If make doesn't work, try mingw32-make instead (at some point we had to start using that make on Windows because plain make doesn't work quite right there).
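If you end up scripting these steps, one option is to pick the make program once. A small sketch under my own assumption that preferring mingw32-make whenever it is on the PATH is acceptable (it normally only is on a Windows toolchain); `pick_make` is a hypothetical name.

```shell
#!/bin/sh
# pick_make: print "mingw32-make" if it is on the PATH (typically Windows),
# otherwise fall back to plain "make".
pick_make() {
  if command -v mingw32-make >/dev/null 2>&1; then
    echo mingw32-make
  else
    echo make
  fi
}
```

Usage: `"$(pick_make)" stan-update` inside a CmdStan directory.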

Once you have this set up, build and run your model with each of these. Given the performance differences are so large, one chain each should be fine. If you have >3 cores, just run them together.
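A sketch of what "build and run with each" could look like. Assumptions: the model is in `mymodel.stan` (hypothetical name) next to `mydata.dat` from `stan_rdump`, and the three trees sit in the current directory. Each tree gets its own copy of the model so the executables don't overwrite each other; `DRY_RUN=1` only prints the commands.

```shell
#!/bin/sh
# Build one model against each CmdStan tree and run a single chain per tree.
# mymodel.stan is a hypothetical name; mydata.dat is the stan_rdump output.
set -eu

MODEL=${MODEL:-mymodel}
DATA=${DATA:-mydata.dat}
TREES="cmdstan_20 cmdstan_21 cmdstan_middle"

# run: echo a command, then execute it unless DRY_RUN=1.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-0}" != 1 ]; then "$@"; fi
}

# Copy the model into each tree and build it there.
# On Windows use mingw32-make, and the executable target ends in .exe.
build_all() {
  for tree in $TREES; do
    run mkdir -p "$tree/$MODEL"
    run cp "$MODEL.stan" "$tree/$MODEL/$MODEL.stan"
    run make -C "$tree" "$MODEL/$MODEL"
  done
}

# One chain per tree, all in parallel; each writes its own CSV and console log.
run_all() {
  for tree in $TREES; do
    run "$tree/$MODEL/$MODEL" sample data file="$DATA" \
      output file="$tree/output.csv" > "$tree/run.log" 2>&1 &
  done
  wait
}
```

Usage: `build_all && run_all`. Then `grep -A2 'Elapsed Time' cmdstan_*/output.csv` shows warm-up, sampling, and total time for each version side by side.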

I assume 2.20 will be fast like rstan 2.19 was. We'll double-check that cmdstan 2.21 is slow like rstan 2.21 is, and then we can figure out whether it's the sampler changes or the math changes between 2.20 and 2.21 that are driving the difference.

Yes, the parameters seem to have very similar ESS across the two models. In the new model, however, there are a couple of parameters with uncharacteristically low ESS. These parameters are actually doing nothing: they are not being regressed against any data, so I guess they are effectively priors only. It sounds stupid to even have them in, but having them there makes things a lot more convenient for the for loops and for interpretability.

Yes I am and I will look into this.

Oo, can you make sure to put priors on them? Like normal(0, 1)s, just so they don't go crazy.

I didn’t realize the model changed between 2.19 and 2.21. Do you see the performance difference with the same model between 2.19 and 2.21? We only want to do the performance regression thing if so.

exp()
sum()
normal()

Yup, there are priors on them.

Sorry if I didn't make this clear, but the models are the same in both versions of rstan. It's just that the 'data-less' parameters had low ESS with the new version and high ESS with the old one.

I worry this is a bit out of my depth, but it sounds like a great learning exercise for me. I'll be able to have a go in the next day or two.

Hmm, interesting. Well there was a change to how the U-turns in the sampler were computed between 2.20 and 2.21.

The checks in 2.21 are more aggressive (they include all the 2.20 checks plus some), so I wouldn’t expect this behavior, but that’s basically what we’re testing with the cmdstan experiments.
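For anyone following along, here is a rough sketch of the basic check being discussed, as I understand the generalized no-U-turn criterion (the notation is mine, loosely after Betancourt's HMC review, not lifted from the Stan source). For a trajectory or subtrajectory whose end states have momenta $p_-$ and $p_+$, with $M$ the mass matrix and the sum running over every state $n$ in it, the tree keeps doubling only while both dot products stay positive:

$$
\rho = \sum_{n} p_n,
\qquad
p^{\sharp}_{-} = M^{-1} p_{-},
\qquad
p^{\sharp}_{+} = M^{-1} p_{+},
\qquad
\text{continue while}\quad
\rho^{\top} p^{\sharp}_{-} > 0
\;\text{ and }\;
\rho^{\top} p^{\sharp}_{+} > 0
$$

My understanding of the 2.21 change is that it keeps this test and also applies it to extra sub-trajectories that straddle the join between the two subtrees being merged, which is what makes it more aggressive.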

but sounds like a great learning exercise

Well, it can be a bit of a headache lol, but ask if something breaks.

I’ve tried running the model again, just to watch things more closely, and it actually seems the warmup is now slower in the new model, not the sampling.

I don't think that the change to Boost warrants a 2x/3x slowdown, but in any case, you can try adding the compile flag

-DBOOST_MATH_PROMOTE_DOUBLE_POLICY=false

to the CXX14FLAGS in your Makevars in order to lower the precision of lgamma from Boost (to my knowledge, the higher precision is not needed). This will speed up lgamma by some margin.

EDIT: Though it could be that RStan already slaps this on… don’t know for sure.
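If you'd rather not edit the file by hand, a small sketch that appends the flag once. Assumptions: the user Makevars lives at `~/.R/Makevars` (I believe it's `~/.R/Makevars.win` on Windows), and `+=` appends to the CXX14FLAGS R already sets, which relies on GNU make. `add_boost_flag` is a hypothetical helper name.

```shell
#!/bin/sh
# add_boost_flag FILE: append the Boost lgamma precision flag to an R Makevars
# file, creating it if needed and doing nothing if the flag is already present.
add_boost_flag() {
  flag='-DBOOST_MATH_PROMOTE_DOUBLE_POLICY=false'
  mkdir -p "$(dirname "$1")"
  touch "$1"
  if ! grep -qF -- "$flag" "$1"; then
    printf 'CXX14FLAGS += %s\n' "$flag" >> "$1"
  fi
}
```

Usage: `add_boost_flag "$HOME/.R/Makevars"`, then recompile the model in RStan so the flag actually takes effect.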

Hi Ben, the first git checkout worked, but for the second one I got:

error: pathspec ‘e8c0967’ did not match any file(s) known to git

Have I done something wrong?

Let me double-check. I looked it up and cmdstan 2.21 is ee2a3b4, so I'm not sure why I put e8c0967 there, especially if it doesn't exist lol.

Had a look!

e8c0967 is the 2.21 version of the samplers (in stan-dev/stan). I screwed up putting it there; I meant to put the commit for cmdstan 2.21, which lives in a different repository from stan-dev/stan.

Your new directions are:

Go into cmdstan_20 and do:

git checkout 7d2b7a2
make stan-update

Go into cmdstan_21 and do:

git checkout ee2a3b4
make stan-update

Go into cmdstan_middle and do:

git checkout ee2a3b4
make stan-update
cd stan
git checkout b3add46
cd ../
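For reference, the same directions as a script. This is a sketch, not an official setup procedure: it reuses the commit hashes from this thread, assumes it runs in the directory that should hold the four trees, and `DRY_RUN=1` prints the commands instead of running them. On Windows you may need mingw32-make in place of make.

```shell
#!/bin/sh
# Prepare the three CmdStan trees for the version comparison.
set -eu

CMDSTAN_20=7d2b7a2        # cmdstan 2.20
CMDSTAN_21=ee2a3b4        # cmdstan 2.21
STAN_SAMPLERS_20=b3add46  # stan-dev/stan commit with the 2.20 samplers

# run: echo a command, then execute it unless DRY_RUN=1.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-0}" != 1 ]; then "$@"; fi
}

# checkout_tree DIR CMDSTAN_COMMIT [STAN_COMMIT]
# Check DIR out at a cmdstan commit, update its submodules to match, and
# optionally move only the stan submodule to another commit afterwards.
checkout_tree() {
  run git -C "$1" checkout "$2"
  run make -C "$1" stan-update
  if [ -n "${3:-}" ]; then
    run git -C "$1/stan" checkout "$3"
  fi
}

setup_all() {
  if [ ! -d cmdstan ]; then
    run git clone --recursive https://github.com/stan-dev/cmdstan.git
  fi
  for d in cmdstan_20 cmdstan_21 cmdstan_middle; do
    [ -d "$d" ] || run cp -R cmdstan "$d"
  done
  checkout_tree cmdstan_20 "$CMDSTAN_20"
  checkout_tree cmdstan_21 "$CMDSTAN_21"
  # cmdstan 2.21 with the stan submodule moved back to the 2.20 samplers
  checkout_tree cmdstan_middle "$CMDSTAN_21" "$STAN_SAMPLERS_20"
}
```

Usage: `DRY_RUN=1 setup_all` to preview, then `setup_all`.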

Thanks. After running that I got the following message; I hope that's expected:

You are in ‘detached HEAD’ state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

git checkout -b <new-branch-name>

HEAD is now at ee2a3b4 release/v2.21.0: updating version numbers
M stan

Yeah, that looks fine. Make sure to run make stan-update afterwards (if doing the 2.21 version), or make stan-update && cd stan && git checkout b3add46 && cd ../ (if doing the last one).
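To confirm where each tree ended up, you can ask git which commits the cmdstan checkout and its stan submodule are on. A small sketch; `show_commits` is a hypothetical helper name, and git may print abbreviated hashes longer than seven characters, so compare the leading characters.

```shell
#!/bin/sh
# show_commits DIR: print the commits checked out in a CmdStan tree and in its
# stan submodule.
show_commits() {
  printf '%s: cmdstan %s, stan %s\n' "$1" \
    "$(git -C "$1" rev-parse --short HEAD)" \
    "$(git -C "$1/stan" rev-parse --short HEAD)"
}
```

Usage: `for d in cmdstan_20 cmdstan_21 cmdstan_middle; do show_commits "$d"; done`. For cmdstan_middle the cmdstan hash should start with ee2a3b4 and the stan hash with b3add46.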

Thanks Ben, I've now run all of them and there are no noticeable speed differences between the versions. Sigh. I will have to keep digging to see what might be causing the models to take so much longer.

Just to make sure I've done the installations right, is there a way I can check which version of Stan is installed in each directory? For example, by running a command like:

stan --version

?

If that confirms I've installed the versions correctly, is there an even earlier version of Stan worth trying? I'm now worried I hadn't updated Stan on my computer in 18-24 months.

You can check the version in the CSV that is produced as the result of sampling.

If you didn't specify a file, it's usually output.csv.

The CSV files start with

# stan_version_major = 2
# stan_version_minor = 24
# stan_version_patch = 1
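To check all three trees at once, you could pull those header lines out of each CSV. A minimal sketch assuming the header format shown above; `stan_version_from_csv` is a hypothetical helper name.

```shell
#!/bin/sh
# stan_version_from_csv FILE: print "major.minor.patch" from the header of a
# CmdStan output CSV. Stops reading once the patch line is found.
stan_version_from_csv() {
  awk -F' = ' '
    /^# stan_version_major/ { major = $2 }
    /^# stan_version_minor/ { minor = $2 }
    /^# stan_version_patch/ { patch = $2; exit }
    END { print major "." minor "." patch }
  ' "$1"
}
```

Usage: `for d in cmdstan_20 cmdstan_21 cmdstan_middle; do echo "$d: $(stan_version_from_csv "$d/output.csv")"; done`, assuming each run wrote its CSV into its own tree.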

Thanks @rok_cesnovar,

For cmdstan_20 I have 2.20.0.

For cmdstan_21 I have 2.21.0.

For cmdstan_middle I have 2.20.0.

It seems like I missed something when installing cmdstan_middle? But that doesn't really matter, because 2.20.0 and 2.21.0 aren't showing any meaningful difference in speed?

And these are all the slow speeds, right? So I guess this means something changed between 2.19 and 2.20.

Can you confirm that cmdstan 2.19 is fast for you?

You can clone a copy of cmdstan and do:

git checkout 77a398e
make stan-update

or download a copy from here: https://github.com/stan-dev/cmdstan/releases/tag/v2.19.1
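If you go the git route, here is a sketch of setting up a fourth tree next to the others, using the commit from this post; `setup_cmdstan_19` and the directory name `cmdstan_19` are hypothetical, and `DRY_RUN=1` only prints the commands.

```shell
#!/bin/sh
# Clone cmdstan into DIR (default cmdstan_19) and check out the 2.19 commit.
set -eu

# run: echo a command, then execute it unless DRY_RUN=1.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-0}" != 1 ]; then "$@"; fi
}

setup_cmdstan_19() {
  dir=${1:-cmdstan_19}
  if [ ! -d "$dir" ]; then
    run git clone --recursive https://github.com/stan-dev/cmdstan.git "$dir"
  fi
  run git -C "$dir" checkout 77a398e
  run make -C "$dir" stan-update
}
```

Usage: `setup_cmdstan_19`, then build and time the model with it the same way as with the other trees.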
