CmdStan & Stan 2.33 release candidate

Very cool stuff, especially being able to use tuples! Thank you for sharing the release candidate.

I’ve tried 2.33-rc1 on Ubuntu 22.04.3 in a GitHub workflow that compiles about 30 models. I noticed an increase in compile time (12m 57s) compared with 2.32.2 (8m 10s). Is that to be expected? The full runtime configuration can be found here.

The time to draw samples via cmdstanpy also increased slightly (4m 21s compared with 3m 33s). But take that with a big pinch of salt because I’m only running very short chains in CI, the tests aren’t seeded, and I’m also testing a few Python functions that may add runtime variability.

The repo isn’t public at the moment, unfortunately. Let me know if this is not reproducible elsewhere, and I can set up a mock repo.

Thanks again for the great software!

Hmm, compile times might have increased, though an increase of more than 30% seems excessive.

I would be surprised by a runtime increase, as we have some tests running that would have spotted that. But we should investigate both of these.

Thanks for reporting!

Have you noticed the same locally? GitHub’s CI machines can be inconsistent in terms of hardware and load.

This isn’t terribly rigorous, but I tried one model (schools-4). These are all with freshly downloaded and built CmdStan instances, with no make/local or other customization, on Ubuntu 22.04 with gcc 11.4.0.

Compiling (2.31):

	Command being timed: "make schools-4"
	User time (seconds): 21.55
	System time (seconds): 1.04
	Percent of CPU this job got: 99%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:22.60

Compiling (2.32.2):

	Command being timed: "make schools-4"
	User time (seconds): 28.74
	System time (seconds): 1.21
	Percent of CPU this job got: 99%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:29.97

Compiling (2.33-rc1):

	Command being timed: "make schools-4"
	User time (seconds): 36.02
	System time (seconds): 1.48
	Percent of CPU this job got: 99%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:37.51

Sampling (2.31):

 Elapsed Time: 1.123 seconds (Warm-up)
               87.345 seconds (Sampling)
               88.468 seconds (Total)

Sampling (2.32.2):

 Elapsed Time: 1.104 seconds (Warm-up)
               86.014 seconds (Sampling)
               87.118 seconds (Total)

Sampling (2.33-rc1):

 Elapsed Time: 1.097 seconds (Warm-up)
               88.243 seconds (Sampling)
               89.34 seconds (Total)

I had enough other processes running on my machine that I’m not too worried about the 1s differences in the runtimes. It does seem like compiling may have taken a hit in each of the last couple of versions. I suspect the difference between 2.31 and 2.32 could be due to Eigen 3.4, but I’m not sure what the difference would be for this version.
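
To put rough numbers on the compile-time hit, here is a minimal sketch that converts the "Elapsed (wall clock)" values above to seconds and prints the relative change between consecutive versions. The helper names `parse_elapsed` and `percent_change` are made up for illustration; only the three elapsed strings come from the output above.

```python
def parse_elapsed(text):
    """Convert an 'h:mm:ss' or 'm:ss' elapsed string (as printed by GNU time) to seconds."""
    seconds = 0.0
    for part in text.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return 100.0 * (new - old) / old


# Elapsed wall-clock times for "make schools-4", copied from the output above.
elapsed = {"2.31": "0:22.60", "2.32.2": "0:29.97", "2.33-rc1": "0:37.51"}

versions = list(elapsed)
for old, new in zip(versions, versions[1:]):
    change = percent_change(parse_elapsed(elapsed[old]), parse_elapsed(elapsed[new]))
    print(f"{old} -> {new}: {change:+.1f}%")
```

For this single model that works out to roughly a 33% increase from 2.31 to 2.32.2 and a further 25% from 2.32.2 to 2.33-rc1. Applied to the CI totals from the first post (8:10 and 12:57), the same helper gives roughly 59%.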

The stanc-generated code for this model did not change very much. It was even possible to copy the .hpp file from the 2.32 build to the 2.33-rc1 build, and that was only about 1 second faster, so the change seems to be primarily in the Stan C++ codebase.

Interestingly I get faster compile times locally on macOS 13.4 (22F66) with M1 chip and Apple clang version 14.0.3 (clang-1403.0.22.14.1): 5m 7s with 2.33-rc1 and 5m 11s with 2.32.2.

I’ve also rerun this in a Docker container (Debian GNU/Linux 12 (bookworm)) with g++ (Debian 12.2.0-14) 12.2.0: 11m 9s for 2.33.0-rc1 and 7m 50s for 2.32.2. Maybe this is a g++ issue?

Edit: tried this again on GitHub with clang (8m 42s for 2.32.2 and 12m 4s for 2.33.0-rc1). So maybe not a g++ issue after all.

Have you been able to re-create the changes in runtime performance?

Sampling times seem to be unchanged; the previous observation seems to have been a fluke. See below for 20 models and different combinations of compilers and cmdstan versions (all in seconds of wall time).

Each dot is a different model. The first row is a scatter of different compilers for a given cmdstan version. The second row is a scatter of cmdstan versions for a given compiler. Nothing interesting to see here in terms of sampling times.

Clang is faster in terms of compile time irrespective of version. For clang, compile time barely changes between cmdstan versions. With gcc, most models compile more slowly with 2.33.0-rc1, but a few compile faster than with 2.32.2. Those are the three presidential-* models, different versions of the vote share forecasting problem from chapter 15 of BDA3; no idea why.

Full details below.

Timings

compile_time,sample_time,compiler,cmdstan,model
37.56741166114807,3.646681547164917,gcc,2.32.2,bda3:chapter_15:presidential-hierarchical-nc
42.988043785095215,1.5719959735870361,gcc,2.32.2,bda3:chapter_15:presidential-olr
49.55060577392578,4.725171089172363,gcc,2.32.2,bda3:chapter_15:presidential-hierarchical
11.787280797958374,0.03844404220581055,gcc,2.32.2,bda3:chapter_15:schools-nc
10.62388300895691,0.0704963207244873,gcc,2.32.2,bda3:chapter_15:schools
12.169351577758789,0.37670350074768066,gcc,2.32.2,rethinking:chapter_13:m13-6
9.188542127609253,0.07953214645385742,gcc,2.32.2,rethinking:chapter_13:m13-1
9.789063930511475,0.0906522274017334,gcc,2.32.2,rethinking:chapter_13:m13-2
10.428519010543823,0.18976879119873047,gcc,2.32.2,rethinking:chapter_13:m13-5
10.323498725891113,0.2903604507446289,gcc,2.32.2,rethinking:chapter_13:m13-4
10.986624717712402,0.5804274082183838,gcc,2.32.2,rethinking:chapter_13:m13-4nc
12.12843132019043,12.726063966751099,gcc,2.32.2,rethinking:chapter_12:m12-5
12.40358853340149,4.914919137954712,gcc,2.32.2,rethinking:chapter_12:m12-4
9.38602900505066,5.781336784362793,gcc,2.32.2,rethinking:chapter_11:m11-4
11.977131128311157,4.349946975708008,gcc,2.32.2,arm:chapter_14:m1
20.611008882522583,148.68280458450317,gcc,2.32.2,arm:chapter_14:m2
15.607534170150757,102.03421974182129,gcc,2.32.2,arm:chapter_14:m2-nc-no-dc
16.022870779037476,91.20781755447388,gcc,2.32.2,arm:chapter_14:m2-nc
9.525382041931152,0.12505245208740234,gcc,2.32.2,arm:chapter_22:pilots-1-nc
9.983759880065918,0.16334223747253418,gcc,2.32.2,arm:chapter_22:pilots-1
20.32929301261902,3.46524977684021,gcc,2.33.0-rc1,bda3:chapter_15:presidential-hierarchical-nc
24.26268172264099,1.151839017868042,gcc,2.33.0-rc1,bda3:chapter_15:presidential-olr
19.143707990646362,3.0011560916900635,gcc,2.33.0-rc1,bda3:chapter_15:presidential-hierarchical
14.714248418807983,0.08894467353820801,gcc,2.33.0-rc1,bda3:chapter_15:schools-nc
14.492254495620728,0.06848978996276855,gcc,2.33.0-rc1,bda3:chapter_15:schools
17.273163318634033,0.337432861328125,gcc,2.33.0-rc1,rethinking:chapter_13:m13-6
17.005021333694458,0.08601498603820801,gcc,2.33.0-rc1,rethinking:chapter_13:m13-1
16.56342101097107,0.0862576961517334,gcc,2.33.0-rc1,rethinking:chapter_13:m13-2
17.494102954864502,0.16826128959655762,gcc,2.33.0-rc1,rethinking:chapter_13:m13-5
15.638691186904907,0.2662944793701172,gcc,2.33.0-rc1,rethinking:chapter_13:m13-4
19.145731925964355,0.6046392917633057,gcc,2.33.0-rc1,rethinking:chapter_13:m13-4nc
15.927358627319336,11.840963363647461,gcc,2.33.0-rc1,rethinking:chapter_12:m12-5
16.341104984283447,4.811244964599609,gcc,2.33.0-rc1,rethinking:chapter_12:m12-4
16.518432140350342,5.789674997329712,gcc,2.33.0-rc1,rethinking:chapter_11:m11-4
16.186206579208374,4.184360504150391,gcc,2.33.0-rc1,arm:chapter_14:m1
20.196232318878174,155.65951251983643,gcc,2.33.0-rc1,arm:chapter_14:m2
21.241365909576416,102.65987610816956,gcc,2.33.0-rc1,arm:chapter_14:m2-nc-no-dc
23.941023111343384,94.52347326278687,gcc,2.33.0-rc1,arm:chapter_14:m2-nc
15.884380102157593,0.14458346366882324,gcc,2.33.0-rc1,arm:chapter_22:pilots-1-nc
18.080141305923462,0.09834933280944824,gcc,2.33.0-rc1,arm:chapter_22:pilots-1
9.225769519805908,3.482273578643799,clang,2.32.2,bda3:chapter_15:presidential-hierarchical-nc
9.97201919555664,1.2159242630004883,clang,2.32.2,bda3:chapter_15:presidential-olr
9.413137197494507,6.243260145187378,clang,2.32.2,bda3:chapter_15:presidential-hierarchical
6.255299091339111,0.06447005271911621,clang,2.32.2,bda3:chapter_15:schools-nc
5.769744396209717,0.10057592391967773,clang,2.32.2,bda3:chapter_15:schools
7.673097372055054,0.31163716316223145,clang,2.32.2,rethinking:chapter_13:m13-6
5.708240509033203,0.049269914627075195,clang,2.32.2,rethinking:chapter_13:m13-1
6.316137075424194,0.07291889190673828,clang,2.32.2,rethinking:chapter_13:m13-2
7.043802499771118,0.14704036712646484,clang,2.32.2,rethinking:chapter_13:m13-5
7.7119433879852295,0.24457502365112305,clang,2.32.2,rethinking:chapter_13:m13-4
7.486074924468994,0.5741524696350098,clang,2.32.2,rethinking:chapter_13:m13-4nc
7.661090612411499,12.409428358078003,clang,2.32.2,rethinking:chapter_12:m12-5
6.55411171913147,5.183542966842651,clang,2.32.2,rethinking:chapter_12:m12-4
7.109067440032959,6.124364137649536,clang,2.32.2,rethinking:chapter_11:m11-4
7.372790575027466,4.4997618198394775,clang,2.32.2,arm:chapter_14:m1
10.625385522842407,226.80027079582214,clang,2.32.2,arm:chapter_14:m2
10.511707305908203,106.15542674064636,clang,2.32.2,arm:chapter_14:m2-nc-no-dc
11.766937971115112,93.08849596977234,clang,2.32.2,arm:chapter_14:m2-nc
7.643540620803833,0.17764043807983398,clang,2.32.2,arm:chapter_22:pilots-1-nc
7.539230823516846,0.10232782363891602,clang,2.32.2,arm:chapter_22:pilots-1
9.739705562591553,3.6022238731384277,clang,2.33.0-rc1,bda3:chapter_15:presidential-hierarchical-nc
10.89063835144043,1.2008984088897705,clang,2.33.0-rc1,bda3:chapter_15:presidential-olr
9.31385850906372,6.400953054428101,clang,2.33.0-rc1,bda3:chapter_15:presidential-hierarchical
6.204479694366455,0.05354762077331543,clang,2.33.0-rc1,bda3:chapter_15:schools-nc
6.033683776855469,0.08637404441833496,clang,2.33.0-rc1,bda3:chapter_15:schools
8.769347906112671,0.3015925884246826,clang,2.33.0-rc1,rethinking:chapter_13:m13-6
6.163729667663574,0.055722713470458984,clang,2.33.0-rc1,rethinking:chapter_13:m13-1
6.447941541671753,0.10070013999938965,clang,2.33.0-rc1,rethinking:chapter_13:m13-2
7.830121040344238,0.16068267822265625,clang,2.33.0-rc1,rethinking:chapter_13:m13-5
7.686686754226685,0.3231377601623535,clang,2.33.0-rc1,rethinking:chapter_13:m13-4
7.719829082489014,0.5520870685577393,clang,2.33.0-rc1,rethinking:chapter_13:m13-4nc
8.257025241851807,12.892497777938843,clang,2.33.0-rc1,rethinking:chapter_12:m12-5
7.242663145065308,5.2688515186309814,clang,2.33.0-rc1,rethinking:chapter_12:m12-4
6.931958913803101,6.388111114501953,clang,2.33.0-rc1,rethinking:chapter_11:m11-4
7.124103784561157,4.391737699508667,clang,2.33.0-rc1,arm:chapter_14:m1
11.15772294998169,228.88514065742493,clang,2.33.0-rc1,arm:chapter_14:m2
10.772235870361328,107.44897508621216,clang,2.33.0-rc1,arm:chapter_14:m2-nc-no-dc
10.580979108810425,93.53468704223633,clang,2.33.0-rc1,arm:chapter_14:m2-nc
7.014771223068237,0.10702037811279297,clang,2.33.0-rc1,arm:chapter_22:pilots-1-nc
7.342519283294678,0.10162734985351562,clang,2.33.0-rc1,arm:chapter_22:pilots-1
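
To check the per-model claims against the table, a small sketch like the one below computes the ratio of 2.33.0-rc1 to 2.32.2 compile time for each model and compiler. Only a handful of rows are embedded here to keep it short; `compile_ratios` and `SAMPLE` are illustrative names, and pasting in the full table should work the same way.

```python
import csv
import io

# A few rows copied from the table above; paste the full table to cover all 20 models.
SAMPLE = """\
compile_time,sample_time,compiler,cmdstan,model
42.988043785095215,1.5719959735870361,gcc,2.32.2,bda3:chapter_15:presidential-olr
10.62388300895691,0.0704963207244873,gcc,2.32.2,bda3:chapter_15:schools
24.26268172264099,1.151839017868042,gcc,2.33.0-rc1,bda3:chapter_15:presidential-olr
14.492254495620728,0.06848978996276855,gcc,2.33.0-rc1,bda3:chapter_15:schools
9.97201919555664,1.2159242630004883,clang,2.32.2,bda3:chapter_15:presidential-olr
5.769744396209717,0.10057592391967773,clang,2.32.2,bda3:chapter_15:schools
10.89063835144043,1.2008984088897705,clang,2.33.0-rc1,bda3:chapter_15:presidential-olr
6.033683776855469,0.08637404441833496,clang,2.33.0-rc1,bda3:chapter_15:schools
"""


def compile_ratios(csv_text, compiler, old="2.32.2", new="2.33.0-rc1"):
    """Map each model to new/old compile time for one compiler (below 1 means faster)."""
    times = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["compiler"] == compiler:
            times[(row["model"], row["cmdstan"])] = float(row["compile_time"])
    models = sorted({model for model, _ in times})
    # Keep only models that were timed under both versions.
    return {
        model: times[(model, new)] / times[(model, old)]
        for model in models
        if (model, old) in times and (model, new) in times
    }


if __name__ == "__main__":
    for compiler in ("gcc", "clang"):
        for model, ratio in compile_ratios(SAMPLE, compiler).items():
            print(f"{compiler:5s} {model:40s} {ratio:.2f}")
```

A ratio below 1 means the model compiled faster with the release candidate. In the embedded rows, that is the case for presidential-olr with gcc but not for schools.
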

Setup

Base system is a 2020 MacBook Pro with M1 chip and 16 GB of memory running macOS 13.4 (22F66). All experiments were run with Docker Desktop 4.22.1 (118664) and Docker Engine 24.0.5. Base image was python:3.10 with cmdstanpy==1.1.0. I switched between compilers using CC={clang,gcc} and CXX={clang++,g++} environment variables. Some more details:

$ uname -a
Linux 74e9318d9e12 5.15.49-linuxkit-pr #1 SMP PREEMPT Thu May 25 07:27:39 UTC 2023 aarch64 GNU/Linux

$ gcc --version
gcc (Debian 12.2.0-14) 12.2.0

$ g++ --version
g++ (Debian 12.2.0-14) 12.2.0

$ clang --version
Debian clang version 14.0.6
Target: aarch64-unknown-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

$ clang++ --version
Debian clang version 14.0.6
Target: aarch64-unknown-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

For each of the 20 models, I ran two chains with 1000 warmup samples and 500 post-warmup samples. I didn’t check any of the diagnostics such as divergences, E-BFMI, or R_hat. All times are wall times.
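
For anyone who wants to repeat this, a minimal sketch of the timing loop could look like the following. It is not the exact script used for the numbers above. It assumes cmdstanpy and a CmdStan install are available, and `compiler_env`, `wall_time`, and `benchmark_model` are hypothetical helper names rather than part of cmdstanpy.

```python
import os
import time
from contextlib import contextmanager


@contextmanager
def compiler_env(cc, cxx):
    """Temporarily set CC and CXX, restoring the previous values afterwards."""
    saved = {key: os.environ.get(key) for key in ("CC", "CXX")}
    os.environ.update({"CC": cc, "CXX": cxx})
    try:
        yield
    finally:
        for key, value in saved.items():
            if value is None:
                os.environ.pop(key, None)
            else:
                os.environ[key] = value


def wall_time(fn, *args, **kwargs):
    """Return (elapsed wall-clock seconds, result) for a single call."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return time.perf_counter() - start, result


def benchmark_model(stan_file, data, cc, cxx):
    """Time one forced compile and one short sampling run (hypothetical helper)."""
    from cmdstanpy import CmdStanModel  # imported lazily: optional dependency

    with compiler_env(cc, cxx):
        model = CmdStanModel(stan_file=stan_file, compile=False)
        compile_time, _ = wall_time(model.compile, force=True)
    # Two chains, 1000 warmup and 500 post-warmup draws, as described above.
    sample_time, _ = wall_time(
        model.sample, data=data, chains=2, iter_warmup=1000, iter_sampling=500
    )
    return {"compile_time": compile_time, "sample_time": sample_time}
```

Calling something like `benchmark_model("schools.stan", "schools.json", "clang", "clang++")` once per model, compiler, and CmdStan install would produce rows in the same shape as the table; the file names in that call are placeholders.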

Thanks for the thorough look into this.

If nothing else comes up I think we’re still a go for the release next week, but we should take a look at compile times in the post-release period. I think the Pathfinder implementation was pretty heavily templated and that may be to blame.
