I thought @bbbales2 is on Linux, where it’s apparently not needed. Someone mentioned that the downside is somewhat longer compilation times. I can time a compilation with and without the switch on my macOS. If the compile times are not too far apart, my suggestion would be to simply turn it on by default for macOS.
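A minimal sketch of how the with/without timing could be scripted. It rests on several assumptions. It is run from the CmdStan root. The switch can be passed on the `make` command line; it is shown as `STAN_CPP_OPTIMS=true`, the name proposed later in this thread, where at this point it was still `STAN_COMPILER_OPTIMS`. Deleting the listed example outputs is enough to force a rebuild. `time_build` and `clean_example` are hypothetical helpers, not part of CmdStan.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command, discard its stdout, and print the
# wall-clock time it took in whole seconds (bash's SECONDS counter).
# Returns 1 without printing anything if the command fails.
time_build() {
  local start=$SECONDS
  "$@" > /dev/null || return 1
  echo $((SECONDS - start))
}

# Hypothetical helper: remove the (assumed) Bernoulli example outputs so the
# next make invocation has to recompile the model.
clean_example() {
  rm -f examples/bernoulli/bernoulli examples/bernoulli/bernoulli.hpp examples/bernoulli/bernoulli.o
}

# Usage from the CmdStan root (assumed flag name and example paths):
#   clean_example; time_build make examples/bernoulli/bernoulli
#   clean_example; time_build make STAN_CPP_OPTIMS=true examples/bernoulli/bernoulli
```

Whole-second resolution is coarse, but it should be enough to tell a ~10 s build from a ~20 s one.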
@wds15 I did more benchmarking:
```bash
for i in {1..100}
do
  ./blrm241 sample num_warmup=500 num_samples=500 data file=blrm2.data.R output file=241.$i.csv
  ./blrm24 sample num_warmup=500 num_samples=500 data file=blrm2.data.R output file=24.$i.csv
  ./blrm33 sample num_warmup=500 num_samples=500 data file=blrm2.data.R output file=33.$i.csv
  ./blrm34 sample num_warmup=500 num_samples=500 data file=blrm2.data.R output file=34.$i.csv
done
```
Here 33 is 2.25 with Math 3.3, 34 is 2.25 with Math 3.4, 24 is 2.24.0, and 241 is CmdStan commit #9396d19. The v2.24.1 and v2.24 tags point at the same commit for some reason, so for my v2.24.1 I picked the first working build after Aug 22 (see "Cmdstan 2.24.1 is released", post #9 by rok_cesnovar).
Results (sampling time per leapfrog step in microseconds, 20th and 80th percentiles over the 100 runs):
```
[1] "241"
20% 80%
200 220
[1] "24"
20% 80%
198 219
[1] "33"
20% 80%
198 216
[1] "34"
20% 80%
201 222
```
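The 20% and 80% columns are sample quantiles from R's `quantile()`. Its default method (type 7) linearly interpolates between order statistics at position h = (n - 1)p + 1. The following is a small awk sketch of the same rule, which can be handy for checking numbers outside R. `quantile7` is a hypothetical helper name, and it assumes one number per line on stdin.

```bash
# Hypothetical helper mimicking R's default quantile (type 7):
# h = (n - 1) * p + 1, then interpolate between x[floor(h)] and x[floor(h) + 1].
quantile7() {
  local p=$1
  sort -g | awk -v p="$p" '
    { x[NR] = $1 }
    END {
      if (NR == 0) exit 1
      h = (NR - 1) * p + 1
      lo = int(h)
      hi = (lo < NR) ? lo + 1 : lo   # p = 1 lands exactly on the last value
      printf "%g\n", x[lo] + (h - lo) * (x[hi] - x[lo])
    }'
}
```

For example, `seq 1 10 | quantile7 0.2` prints `2.8`, matching `quantile(1:10, 0.2)` in R.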
I don’t know how to interpret this now. The timings I posted earlier for 33 and 34 came from the same binaries on the same computer, and those were substantially faster than these. The summaries above were computed with:
```r
library(rstan)

# For each build, compute (sampling time) / (total leapfrog steps) per run,
# then report the 20th/80th percentiles in microseconds per leapfrog step.
for (version in c("241", "24", "33", "34")) {
  timings <- sapply(1:100, function(i) {
    fit <- read_stan_csv(paste0(version, ".", i, ".csv"))
    get_elapsed_time(fit)[1, "sample"] / sum(get_num_leapfrog_per_iteration(fit))
  })
  print(version)
  print(round(1e6 * quantile(timings, c(0.20, 0.80)), 0))
}
```
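The same per-run metric can also be pulled straight from a CmdStan output CSV without R. This is a rough awk sketch. It assumes the standard CmdStan CSV layout: comment lines start with `#`, the first non-comment line is the header containing an `n_leapfrog__` column, and the footer has an `Elapsed Time` block with a `... seconds (Sampling)` line. It also assumes warmup draws were not saved. `us_per_leapfrog` is a hypothetical name.

```bash
# Hypothetical helper: print sampling time per leapfrog step, in microseconds,
# for one CmdStan CSV (assumes save_warmup=0, so every draw is a sampling draw).
us_per_leapfrog() {
  awk -F',' '
    /^#/ {
      # Footer line such as "#                0.013 seconds (Sampling)"
      n = split($0, f, " ")
      for (i = 2; i < n; i++)
        if (f[i] == "seconds" && f[i + 1] == "(Sampling)") t = f[i - 1]
      next
    }
    !col {
      # First non-comment line is the header: locate the n_leapfrog__ column.
      for (i = 1; i <= NF; i++) if ($i == "n_leapfrog__") col = i
      next
    }
    { steps += $col }
    END { if (steps > 0 && t != "") printf "%.3f\n", 1e6 * t / steps }
  ' "$1"
}
```

Usage, for example: `for i in $(seq 1 100); do us_per_leapfrog "33.$i.csv"; done`. This prints one value per run, to be summarised afterwards.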
Then maybe this is a platform thing?
I don’t know what’s going on, given that I can’t reproduce the benchmarks I got yesterday. I checked the md5sums, and at least I’m not accidentally running the same binary for all four tests lol.
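A quick way to script that check is to print one checksum per binary and flag any duplicates. This sketch assumes that either GNU `md5sum` (Linux) or BSD `md5` (macOS) is available. `checksums` and `duplicate_hashes` are hypothetical helpers.

```bash
# Hypothetical helper: print "<md5> <file>" for each argument, using md5sum
# on Linux or md5 -r on macOS (both put the hash first).
checksums() {
  if command -v md5sum > /dev/null; then
    md5sum "$@"
  else
    md5 -r "$@"
  fi
}

# Hypothetical helper: print any hash that appears more than once,
# i.e. files with identical contents.
duplicate_hashes() {
  checksums "$@" | awk '{ print $1 }' | sort | uniq -d
}
```

If `duplicate_hashes blrm241 blrm24 blrm33 blrm34` prints nothing, no two of the four binaries are byte-identical.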
I don’t trust myself or my benchmarks :/. I haven’t reproduced your perf stuff, but I also didn’t manage to reproduce my perf results from yesterday morning. Ugh.
2.24.0 and 2.24.1 only differ in stanc3. No changes in Math/Stan/Cmdstan.
But CmdStan 2.24.0 and 2.24.1 must still differ somewhere, because each one has to point at a different stanc3, right?
Right now both tags point to the exact same commit (fa449c5); one is just dated July and the other August on the stan-dev/cmdstan GitHub tags page.
We do not point to stanc3; the release tarballs carry the respective stanc3 binaries.
Oooh, okay, so to get the right stanc3 I need to download the tarballs. Got it.
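For reference, here is a sketch of how to fetch a specific release tarball. It assumes the usual GitHub release asset naming, `cmdstan-<version>.tar.gz` under the `v<version>` tag. `cmdstan_tarball_url` is a hypothetical helper, and the network steps are left as comments.

```bash
# Hypothetical helper: URL of the CmdStan release tarball for a version,
# assuming the GitHub release asset naming cmdstan-<version>.tar.gz.
cmdstan_tarball_url() {
  local version=${1#v}   # accept "2.24.1" or "v2.24.1"
  echo "https://github.com/stan-dev/cmdstan/releases/download/v${version}/cmdstan-${version}.tar.gz"
}

# Usage (needs network access):
#   curl -LO "$(cmdstan_tarball_url 2.24.1)"
#   tar -xzf cmdstan-2.24.1.tar.gz && cd cmdstan-2.24.1 && make build
```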
I ran this on a different computer (Ubuntu 18.04, clang++ 6.0) and got:
```
[1] "33"
20% 80%
121 124
[1] "34"
20% 80%
123 125
```
Maybe this is a platform thing after all, like you said.
Edit: and with 2.24.1 built with the correct stanc3, I got:
```
[1] "241"
20% 80%
122 123
```
It looks like compile times more than double on my macOS. Here is a grep of the log files I created:
```
build-no-optims-2.log:real 1m1.153s
build-no-optims-2.log:real 0m9.486s
build-no-optims-2.log:real 0m7.715s
build-no-optims.log:real 1m0.486s
build-no-optims.log:real 0m9.233s
build-no-optims.log:real 0m7.636s
build-optims-2.log:real 0m58.215s
build-optims-2.log:real 0m21.766s
build-optims-2.log:real 0m20.662s
build-optims.log:real 0m55.267s
build-optims.log:real 0m22.173s
build-optims.log:real 0m20.346s
```
In each log, the first time is the CmdStan build, the second is the first build of the Bernoulli example, and the third is the rebuild after deleting the example. Each case was run twice, and the Bernoulli builds take roughly 20 to 22 s with the optimisations versus 8 to 10 s without. So we should probably not turn it on by default, but document it well?
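The averaging can be scripted. This sketch assumes each log contains exactly three bash `time` reports in the order described above (CmdStan build, first Bernoulli build, Bernoulli rebuild), with `real` lines formatted as in the grep output. `mean_example_build` is a hypothetical helper.

```bash
# Hypothetical helper: mean of the 2nd and 3rd "real" times (the two Bernoulli
# builds) across all given logs, in seconds. Times look like "1m1.153s".
mean_example_build() {
  local log
  for log in "$@"; do
    grep '^real' "$log" | sed -n '2,3p'
  done | awk '
    {
      split($2, t, "m")          # "0m21.766s" -> t[1] = "0", t[2] = "21.766s"
      sub(/s$/, "", t[2])
      sum += t[1] * 60 + t[2]
      n++
    }
    END { if (n) printf "%.1f\n", sum / n }'
}

# Usage:
#   mean_example_build build-optims.log build-optims-2.log
#   mean_example_build build-no-optims.log build-no-optims-2.log
```

Applied to the times grepped above, this gives about 21.2 s with the flag and 8.5 s without, a ratio of roughly 2.5.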
Thanks for the timings!
Yeah, in that case turning this on by default is not the best option. How do we best document this? cmdstan manual + release notes?
I still don’t like the name STAN_COMPILER_OPTIMS, since it makes me think of the stanc optimizations rather than C++ compiler optimizations, but that is probably just me being too nitpicky…
Hi!
Yes, we should document this flag in the user manual and the release notes. Looking at the name, I agree that STAN_CPP_OPTIMS would have been a better name for the flag.
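For the docs, the user-facing step would presumably be a one-line addition to `make/local`. This sketch assumes the flag ends up named `STAN_CPP_OPTIMS`, as proposed here, and that setting it to `true` enables it. `enable_cpp_optims` is a hypothetical helper that appends the line only once.

```bash
# Hypothetical helper: enable the (assumed) STAN_CPP_OPTIMS flag in a CmdStan
# make/local file (path given as $1, default make/local), without duplicates.
enable_cpp_optims() {
  local file=${1:-make/local}
  touch "$file"
  grep -qx 'STAN_CPP_OPTIMS=true' "$file" || echo 'STAN_CPP_OPTIMS=true' >> "$file"
}

# Usage from the CmdStan root:
#   enable_cpp_optims && make examples/bernoulli/bernoulli
```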
Who can handle the doc?
Sebastian
Are you going to change the name to STAN_CPP_OPTIMS?
I can help with the doc.
I can handle the cmdstan PR later today.
Docs PR finally ready: https://github.com/stan-dev/docs/pull/279
I kept it simple but can go into more details on the flags if needed.
@ariddell a heads up that the release will happen sometime tomorrow. Unfortunately, due to the bugs discovered during the freeze, we were not able to give you any more advance notice this time.