In the meantime, could you list the different functions that are called in the model (e.g., log, lgamma, etc.)? Then we could look at how they changed between 2.19 and 2.21.
The easiest way to check this will be Cmdstan so we can control the versions of stuff (also Rstan doesn't have as many releases as Cmdstan).
You can dump your data (assuming it is in a list called mydata) in a format cmdstan can use with:
stan_rdump(names(mydata), "mydata.dat", envir = list2env(mydata))
Once you have that, I think there are three versions of cmdstan we should test: 2.20, 2.21 and then 2.21 with the 2.20 version of the samplers.
First get a copy of CmdStan:
git clone --recursive https://github.com/stan-dev/cmdstan.git
Make three copies of that: cmdstan_20, cmdstan_21, and cmdstan_middle.
Go into cmdstan_20 and do:
git checkout 7d2b7a2
make stan-update
Go into cmdstan_21 and do:
git checkout e8c0967
make stan-update
Go into cmdstan_middle and do:
git checkout e8c0967
make stan-update
cd stan
git checkout b3add46
cd ../
If make doesn't work, try mingw32-make (at some point we needed to start using that make on Windows because make doesn't work totally right).
Once you have this set up, build and run your model with each of these. Given the performance differences are so large, one chain each should be fine. If you have >3 cores, just run them together.
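A minimal sketch of that build-and-run step, assuming the model and data were copied into a mymodel/ folder inside each CmdStan directory as mymodel.stan and mydata.dat. Those names, the run helper, and the DRY_RUN and MAKE variables are placeholders for illustration, not part of CmdStan. By default it only prints the commands; set DRY_RUN=0 to actually run them.

```shell
# Dry-run by default (prints commands); set DRY_RUN=0 to execute them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
}

# build_and_sample CMDSTAN_DIR
# Assumes mymodel.stan and mydata.dat live in CMDSTAN_DIR/mymodel/ (hypothetical layout).
build_and_sample() {
  dir=$1
  # CmdStan builds the executable next to the .stan file
  # (on Windows the target and executable need a .exe suffix).
  run "${MAKE:-make}" -C "$dir" mymodel/mymodel
  # One chain, with draws and the version header written to output.csv.
  run "$dir/mymodel/mymodel" sample \
    data file="$dir/mymodel/mydata.dat" \
    output file="$dir/mymodel/output.csv"
}

for d in cmdstan_20 cmdstan_21 cmdstan_middle; do
  build_and_sample "$d"
done
```

With MAKE=mingw32-make the same sketch uses the Windows make mentioned above. To run the three chains side by side, the sampling commands can be started in the background with & and followed by wait.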
I assume 2.20 will be fast like rstan 2.19 was. We'll double check cmdstan 2.21 is slow like rstan 2.21 is, and then we can figure out if it's the sampler or the math changes between 2.20 and 2.21 that are driving the difference.
Yes, the parameters seem to have very similar ESS across the two models. In the new model, however, there are a couple of parameters with uncharacteristically low ESS. These parameters are actually doing nothing: they are not being regressed against any data, so I guess they are effectively priors only. It sounds stupid to even have them in, but it makes things a lot more convenient when it comes to for loops and interpretability if they are there.
Yes I am and I will look into this.
Oo, can you make sure and put priors on them? Like normal(0, 1)s. Just so they don't go crazy.
I didn't realize the model changed between 2.19 and 2.21. Do you see the performance difference with the same model between 2.19 and 2.21? We only want to do the performance regression thing if so.
exp()
sum()
normal()
Yup there are priors on them.
Sorry if I didn't make this clear, but the models are the same between both versions of rstan. It's just the "data-less" parameters had low ESS in the new model and high ESS in the old model.
I worry this sounds a bit out of my depth, but it sounds like a great learning exercise for me. I will be able to have a go in the next day or two.
Hmm, interesting. Well there was a change to how the U-turns in the sampler were computed between 2.20 and 2.21.
The checks in 2.21 are more aggressive (they include all the 2.20 checks plus some), so I wouldn't expect this behavior, but that's basically what we're testing with the cmdstan experiments.
but sounds like a great learning exercise
Well it can be a bit of a headache lol, but ask if something breaks
I've tried running the model again, just to watch things more closely, and it actually seems the warmup is now slower in the new model, not the sampling.
I don't think that the change to boost warrants a 2x/3x slowdown, but in any case, you can try to add the compile flag -DBOOST_MATH_PROMOTE_DOUBLE_POLICY=false to the CXX14FLAGS in your Makevars in order to lower the precision of lgamma from boost (the higher precision is not needed to my knowledge). This will speed up lgamma by some margin.
EDIT: Though it could be that RStan already slaps this on… don't know for sure.
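One way to add that flag, as a rough sketch: the helper below appends it to a Makevars file unless it is already there. The function name add_boost_flag is made up for this example, and the default path ~/.R/Makevars is an assumption (on Windows the user file is typically Makevars.win), so pass the path explicitly if yours differs.

```shell
# add_boost_flag [MAKEVARS_PATH]
# Hypothetical helper: appends the flag to CXX14FLAGS in the given Makevars file.
# The default path is an assumption; adjust it for your setup.
add_boost_flag() {
  makevars=${1:-"$HOME/.R/Makevars"}
  flag='-DBOOST_MATH_PROMOTE_DOUBLE_POLICY=false'
  # Do nothing if the flag is already present, so repeated calls do not duplicate it.
  if [ -f "$makevars" ] && grep -qF -- "$flag" "$makevars"; then
    return 0
  fi
  mkdir -p "$(dirname "$makevars")"
  echo "CXX14FLAGS += $flag" >> "$makevars"
}
```

Call it as add_boost_flag ~/.R/Makevars and then recompile the model so the new flag takes effect.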
Hi Ben, the first git checkout worked but the second one I got:
error: pathspec 'e8c0967' did not match any file(s) known to git
Have I done something wrong?
Let me double check. I looked it up and cmdstan 2.21 is ee2a3b4, so I'm not sure why I put e8c0967 there, especially if it doesn't exist lol.
Had a look!
e8c0967 is the 2.21 version of the samplers (stan-dev/stan). I screwed up putting it there (meant to put the tag for 2.21 cmdstan [which is different from the stan-dev/stan repo]).
Your new directions are:
Go into cmdstan_20 and do:
git checkout 7d2b7a2
make stan-update
Go into cmdstan_21 and do:
git checkout ee2a3b4
make stan-update
Go into cmdstan_middle and do:
git checkout ee2a3b4
make stan-update
cd stan
git checkout b3add46
cd ../
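The same directions can be sketched as one script, assuming the three copies sit next to each other in the current directory. The run and checkout_cmdstan helpers and the DRY_RUN and MAKE variables are made up for this example. It prints the commands by default so you can check them first; set DRY_RUN=0 to execute.

```shell
# Dry-run by default (prints commands); set DRY_RUN=0 to execute them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
}

# checkout_cmdstan DIR CMDSTAN_COMMIT [STAN_COMMIT]
# Checks out a cmdstan commit, syncs the stan submodule, and optionally
# pins the stan submodule (the samplers) to a different commit afterwards.
checkout_cmdstan() {
  dir=$1
  cmdstan_commit=$2
  stan_commit=${3:-}
  run git -C "$dir" checkout "$cmdstan_commit"
  # Set MAKE=mingw32-make if plain make does not work on Windows.
  run "${MAKE:-make}" -C "$dir" stan-update
  if [ -n "$stan_commit" ]; then
    run git -C "$dir/stan" checkout "$stan_commit"
  fi
}

checkout_cmdstan cmdstan_20 7d2b7a2
checkout_cmdstan cmdstan_21 ee2a3b4
checkout_cmdstan cmdstan_middle ee2a3b4 b3add46
```

Using git -C and make -C avoids the cd in and out of each directory; otherwise the commands match the ones listed above.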
Thanks, after running that I got the following message, hope that's as expected:
You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b <new-branch-name>

HEAD is now at ee2a3b4 release/v2.21.0: updating version numbers
M stan
Yeah that looks fine. Make sure and do the make stan-update afterwards (if doing the 2.21 version) or make stan-update && cd stan && git checkout b3add46 && cd ../ (if doing the last one).
Thanks Ben, I've now run all of them and there are no noticeable speed differences between the versions. Sigh. I will have to keep digging to see what might be causing the models to take so much longer.
Just to make sure I've done the installations right, is there a way I can check what versions of stan I have installed in each directory? For example, by running a command like stan --version?
If it proves I've installed the versions correctly, is there an even earlier version of Stan worth trying? I'm now worried I hadn't updated Stan on my computer for 18-24 months.
You can check the version in the CSV that is produced as the result of sampling.
If you didn't specify a file, it's usually output.csv.
The CSV files start with:
# stan_version_major = 2
# stan_version_minor = 24
# stan_version_patch = 1
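To read the version out of each output file without opening it, something like the following should work. It is only a sketch: stan_csv_version is a made-up helper name, and it assumes the header lines look exactly like the ones shown above.

```shell
# stan_csv_version FILE
# Prints the Stan version recorded in a CmdStan output CSV as MAJOR.MINOR.PATCH.
# Assumes header lines of the form "# stan_version_major = 2".
stan_csv_version() {
  awk -F'= *' '
    { gsub(/\r/, "") }                      # tolerate Windows line endings
    /stan_version_major/ { major = $2 }
    /stan_version_minor/ { minor = $2 }
    /stan_version_patch/ { patch = $2 }
    END { print major "." minor "." patch }
  ' "$1"
}
```

For example, stan_csv_version cmdstan_20/output.csv (with whatever path your output file actually has) prints a single line such as 2.20.0.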
Thanks @rok_cesnovar,
For cmdstan_20 I have 2.20.0.
For cmdstan_21 I have 2.21.0.
For cmdstan_middle I have 2.20.0.
Seems like I missed something in the installation for cmdstan_middle? But it doesn't really matter, because 2.20.0 and 2.21.0 aren't showing any meaningful difference in speed?
And these are all the slow speeds, right? So this means something changed between 2.19 and 2.20, I guess.
Can you confirm that cmdstan 2.19 is fast for you?
You can clone a copy of cmdstan and do:
git checkout 77a398e
make stan-update
or download a copy from here: https://github.com/stan-dev/cmdstan/releases/tag/v2.19.1