CmdStan 2.23 release candidate is available!

I am happy to announce that a release candidate for the next release of CmdStan is now available on GitHub.

You can find it here: https://github.com/stan-dev/cmdstan/releases/tag/2.23-candidate

Why a release candidate?

We want to do a wider and more thorough test of the next version before the official release, planned for next week, to make sure nothing gets missed.

And that is why we need you, the users, to try out the release candidate. Play with the new features or just compile and run your models, and let us know what you think.

How to install?

Download the tar.gz file from the link above, extract it, and use it the way you use any CmdStan release.

If you are using cmdstanpy, make sure you point it to the folder where you extracted the tar.gz file with set_cmdstan_path(os.path.join('path','to','cmdstan')).
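For example (the path below is a placeholder for wherever you extracted the archive):

```python
import os

# Placeholder path: replace with the directory where you extracted the tar.gz
cmdstan_dir = os.path.join('path', 'to', 'cmdstan')

# With cmdstanpy installed, point it at the release candidate:
# from cmdstanpy import set_cmdstan_path
# set_cmdstan_path(cmdstan_dir)

print(cmdstan_dir)
```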

With cmdstanr you can install the release candidate using

install_cmdstan(release_url = "https://github.com/stan-dev/cmdstan/releases/download/2.23-candidate/cmdstan-2.23-rc1.tar.gz", cores = 4)

What is new?

The highlight of this release is a new way of parallelizing your models. Other than that, the focus of this release was mostly bugfixes and stability/consistency for edge cases.

A quick rundown of the most notable changes is listed below:

New features:

  • added reduce_sum and reduce_sum_static functions that provide a new way of parallelizing a single chain across multiple cores. A tutorial on this new feature can be found here (based on the popular map_rect tutorial by @richard_mcelreath). You can also read the pre-release of the new user guide docs here.
  • added OpenCL (GPU) support for GLMs when using the “~” syntax (previously only supported with target +=)
  • upgraded to Sundials 5.2.0
  • the syntax for including files now supports < > brackets
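As a minimal sketch of the new parallelization feature (the model and variable names below are illustrative; see the tutorial and user's guide for the authoritative signature), a normal likelihood split across cores with reduce_sum might look like:

```stan
functions {
  // Partial sum over a slice of the data; reduce_sum supplies the slice
  // plus the 1-based start/end indices of that slice in the full array.
  real partial_sum(real[] y_slice, int start, int end, real mu, real sigma) {
    return normal_lpdf(y_slice | mu, sigma);
  }
}
data {
  int<lower=1> N;
  real y[N];
}
parameters {
  real mu;
  real<lower=0> sigma;
}
model {
  int grainsize = 1;  // 1 lets the scheduler pick slice sizes automatically
  mu ~ normal(0, 1);
  sigma ~ normal(0, 1);
  target += reduce_sum(partial_sum, y, grainsize, mu, sigma);
}
```

Compiled with STAN_THREADS enabled, the sum over y is then evaluated in parallel slices.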

Notable user-facing bugfixes:

  • Fixed problems with vectorizing neg_binomial_* functions that led to wrong answers
  • Fixed compilation of user-defined _rng functions
  • Improved lbeta to be more numerically stable with one large and one small argument
  • Improved numerical stability of the binomial_coefficient_log, neg_binomial_2_lpmf, and neg_binomial_2_log_lpmf computations
  • Fixed wrong gradients for large arguments to log_sum_exp
  • Fixed a bug where normal_id_glm did not work when sigma is not an autodiff type
  • Cleaned up makefile error messages on Windows
  • Better checks for positive-definiteness in mdivide_left_spd
  • More consistent throwing behaviour in the QR functions
  • Clearer messages when using a variable name as a keyword
  • Arbitrary spaces allowed between words in “transformed data” and “transformed parameters”

There are also some ongoing refactors and projects that don’t have a direct user-facing impact at the moment, for example:

  • generalizing functions in the Math library
  • adding complex numbers support
  • more OpenCL supported functions via the kernel generator

These currently do not have an effect on your models, but will in the future (with speedups or new features). We also want to make sure those changes in the background didn’t break any of your models.

Thanks,

Stan Development Team

12 Likes

reduce_sum is super cool. Obviously map_rect has been available for a while, but reduce_sum seems more user-friendly. Makes me wonder if this might be the tipping point for a change in the typical workstation parallelism approach, from sampling chains in parallel across cores to sampling chains serially but with multiple cores for each chain. Previously I only really saw the latter as beneficial for scenarios like models with GPU acceleration and only a single GPU available. But now that I’m thinking about serial chains and within-chain parallelism, maybe another benefit is the ability to check that the first chain is getting reasonable results before bothering with subsequent chains? I know serial chains will break the cool campfire stuff for pooled warmup, but possibly a similar approach where later chains can get info from all earlier chains might be just as good?

2 Likes

Oh, and I wonder if it might make sense to have a version of dot_product that internally uses reduce_sum. Possibly only useful with very large contrast matrices?

I am glad you like reduce_sum! Interesting thought that it may be better to give full resources to a single chain first instead of running chains in parallel and combining their info à la campfire. It will be interesting to evaluate this. Within-chain parallelism can be inefficient for some models compared to running multiple chains. However, combining like campfire comes with compromises and, if I recall right, only happens every now and then, whereas running one chain faster with more cores does not need to make any compromise. My guess is that a combined approach will be best, since doubling the speed of a model with two cores is easy, but with more cores it quickly gets harder to achieve such good scaling.

Running dot_product with reduce_sum should be possible… but I think you can already code up a version now. Doing it in Stan math is also possible and likely a bit more efficient.
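A hand-rolled version along those lines (purely illustrative; the function name is made up) could slice one factor with reduce_sum and index the other with the supplied start/end:

```stan
functions {
  // Hypothetical partial sum for a dot product: reduce_sum slices x,
  // and start/end pick out the matching entries of y.
  real partial_dot(real[] x_slice, int start, int end, vector y) {
    return dot_product(to_vector(x_slice), y[start:end]);
  }
}
```

In the model block this would then be called as something like reduce_sum(partial_dot, x, grainsize, y); summing the slice-wise dot products recovers the full dot product.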

1 Like

Testing the latest changes on the develop branch for reduce_sum, per discussion with @bbbales2 and @wds15 - should we do another RC?

I asked myself the same…so maybe yes?! If it’s not too much burden to do, of course.

Yes, let’s play it safe.

1 Like

Release candidate 2 is up - thanks @rok_cesnovar and @serban-nicusor!

RC2 here: https://github.com/stan-dev/cmdstan/releases/tag/v2.23-rc2

4 Likes

Thanks!

The changes in RC2 are:

  • changed the order of arguments in reduce_sum (the tutorial and user’s guide have also been updated)
  • reduce_sum now works with user-defined _lpdf/_lpmf functions
  • offset/multiplier and lower/upper can now appear in any order
  • fixed runtime initialization error caused by code generation
  • fixed foreach looping over sliced arrays

Thanks @nhuurre @rybern and @bbbales2 for the fixes!

4 Likes

reduce_sum tutorial is now a Stan case study: https://mc-stan.org/users/documentation/case-studies/reduce_sum_tutorial.html

4 Likes

The following links in the tutorial and the case study seem to be broken (404 not found):

  1. Reduce Sum :https://mc-stan.org/docs/2_23/functions-reference/functions-reduce.html
  2. Picking up grainsize : https://mc-stan.org/docs/2_23/stan-users-guide/reduce-sum.html#reduce-sum-grainsize

I apologise if this is not the correct discussion thread to report a bug. I am a beginner trying to parallelize my code by following the tutorials, as my model is stuck while sampling from a large dataset.
I would really appreciate it if anyone could point me to alternate documentation I can refer to.

2 Likes

those links won’t work until the 2.23 release is out and we’ve published the 2.23 docs - coming soon!

1 Like

if you’re OK with raw markdown, the source code for the user’s manual is here: https://github.com/stan-dev/docs/blob/master/src/stan-users-guide/parallelization.Rmd
and the functions reference chapter is here: https://github.com/stan-dev/docs/blob/master/src/functions-reference/higher-order_functions.Rmd

1 Like

thanks so much. I am happy to refer to the raw markdown