Stan 2.20.0 released!

Stan 2.20.0 is out. Thanks to everyone who contributed!

There are a lot of exciting changes that I’ll save for the release notes (Math, Stan, CmdStan), but one thing I want to call out is a huge usability improvement in 2.20.0 - @Bob_Carpenter figured out a way to factor out much of the Stan algorithms and service code and pre-compile it, meaning each Stan model now compiles something like 5x faster (on Bob’s machine I believe it went from 35s down to 7s). This should greatly tighten the feedback loop for iterative model development and speed up testing, and I’m really excited about it.

@wds15 also led a heroic effort, starting in January of this year and going through 6 different iterations (with lots of help from @rok_cesnovar and thorough reviews by @syclik), to revamp how our autodiff handles threading. The result is a very nice speedup when using map_rect with threading, as well as the ability to finally use map_rect with threading on Windows machines.

[edit] I have also discovered a bug in Linux CmdStan for some models using cvodes, but a fix should be available in develop shortly.

Here’s a list of everyone who contributed something via git to this release in CmdStan, Stan, or Math (git log --pretty=format:"%an" v2.19.1..v2.20.0 | sort | uniq):
Vlad Fatenko


So cool to see this out!

There is one thing I wanted to add, as it did not make it into the release notes of math (probably due to the many merges/unmerges/remerges): when you turn on threading in 2.20.0, you will no longer see significant performance degradation on Linux and macOS compared to non-threaded models. In the past, threading carried a ~20% performance penalty on average. In addition, we finally fixed a very nasty bug on Windows where map_rect did not work with the gcc 4.9.3 compiler that we use there. So map_rect-based threading now works just fine on Windows with the gcc 4.9.3 compiler (it seems to be slow, though). It was a major thing to get into Stan-math due to the need to run large-scale performance tests, so thanks to everyone who made this possible.


Ah yeah! How slow is threading on Windows? If it’s still worthwhile I can lift that part up into the top announcement post too :)

I think there are different threading implementations available on Windows, so it depends on the compiler. With gcc 4.9.3 on Windows, the performance benchmarks for the warfarin example gave these runtimes:

  • no STAN_THREADS, single core: 318.86s

  • STAN_THREADS, 1 core: 452.852s
  • STAN_THREADS, 2 cores: 224.1456s
  • STAN_THREADS, 4 cores: 151.7429s

So you still gain with more cores, it’s just that turning on STAN_THREADS on Windows hurts (and it doesn’t hurt on macOS or Linux).
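
To put those numbers in perspective, here is a quick back-of-the-envelope sketch in Python. It assumes the 1/2/4 core runs were all built with STAN_THREADS turned on (which is how I read the list above); the names `baseline`, `threaded`, and `speedup` are just made up for this illustration.

```python
# Runtimes in seconds, copied from the warfarin benchmark quoted above
# (gcc 4.9.3 on Windows).
baseline = 318.86  # no STAN_THREADS, single core
# Assumption: these runs were built with STAN_THREADS enabled.
threaded = {1: 452.852, 2: 224.1456, 4: 151.7429}  # cores -> seconds


def speedup(reference, runtime):
    """Return how many times faster `runtime` is than `reference`."""
    return reference / runtime


for cores, seconds in threaded.items():
    vs_baseline = speedup(baseline, seconds)
    vs_one_thread = speedup(threaded[1], seconds)
    print(f"{cores} core(s): {vs_baseline:.2f}x vs. no threading, "
          f"{vs_one_thread:.2f}x vs. 1 threaded core")

# Relative cost of just switching STAN_THREADS on while staying on one core.
overhead = threaded[1] / baseline - 1
print(f"single-core overhead: {overhead:.0%}")
```

Under that reading, switching on STAN_THREADS costs roughly 42% on a single core, and 4 cores come out at roughly 2.1x faster than the non-threaded build, so threading pays off on Windows once you use 2 or more cores.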

For all the details we have a huge table in the PR:
