CmdStan 2.27.0 release candidate

I am happy to announce that the latest release candidates of CmdStan and Stan are now available on GitHub!

This release cycle brings new ODE solvers, language and syntax improvements, new functions, performance improvements and bug fixes.

You can find the release candidate for CmdStan here. Instructions for installing are given at the bottom of this post.

Please test the release candidate with your models and report if you experience any problems. We also kindly invite you to test the new features and provide feedback. If you feel some of the new features could be improved or should be changed before the release, please do not hesitate to comment.

The Stan development team appreciates your time and help in making Stan more efficient while maintaining a high level of reliability.

If everything goes according to plan, the 2.27 version will be released next Thursday.

Below are some of the highlights of the new release.

Adjoint ODE

We added a new adjoint ODE solver based on CVODES. The adjoint ODE method has favourable performance scaling for large ODE problems with many more parameters than ODE states. Because the adjoint ODE approach is numerically more involved, we are exposing the feature to more advanced users for now, with most of the solver's tuning parameters available. We intend to provide a simplified adjoint ODE solver interface once the community has collected more experience with the solver. For more details, refer for now to the design document: design-docs/0027-adjoint-ode.md at master · stan-dev/design-docs · GitHub
More documentation will be available once we officially release.

A new ODE solver - Cash-Karp

We also added another new solver that solves the ODE system using the Cash-Karp algorithm, a 4th/5th order explicit Runge-Kutta method. The Cash-Karp algorithm should improve numerical integration of ODEs with semi-stiffness and/or rapid oscillations.
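To make the 4th/5th order pairing concrete, here is a minimal Python sketch of one Cash-Karp step for a scalar ODE, using the standard published tableau. It is not Stan's implementation: the names cash_karp_step and integrate are made up for illustration, and the fixed step size is a simplification (an adaptive solver would use the error estimate to choose the step).

```python
import math

# Cash-Karp Butcher tableau: nodes C, stage coefficients A,
# 5th-order weights B5 and embedded 4th-order weights B4.
C = (0.0, 1 / 5, 3 / 10, 3 / 5, 1.0, 7 / 8)
A = (
    (),
    (1 / 5,),
    (3 / 40, 9 / 40),
    (3 / 10, -9 / 10, 6 / 5),
    (-11 / 54, 5 / 2, -70 / 27, 35 / 27),
    (1631 / 55296, 175 / 512, 575 / 13824, 44275 / 110592, 253 / 4096),
)
B5 = (37 / 378, 0.0, 250 / 621, 125 / 594, 0.0, 512 / 1771)
B4 = (2825 / 27648, 0.0, 18575 / 48384, 13525 / 55296, 277 / 14336, 1 / 4)


def cash_karp_step(f, t, y, h):
    """Take one step of size h for y' = f(t, y) with scalar y.

    Returns the 5th-order solution and |y5 - y4| as a local error estimate.
    """
    k = []
    for c_i, a_i in zip(C, A):
        y_stage = y + h * sum(a * k_j for a, k_j in zip(a_i, k))
        k.append(f(t + c_i * h, y_stage))
    y5 = y + h * sum(b * k_j for b, k_j in zip(B5, k))
    y4 = y + h * sum(b * k_j for b, k_j in zip(B4, k))
    return y5, abs(y5 - y4)


def integrate(f, t0, y0, t1, n_steps):
    """Fixed-step integration from t0 to t1; also tracks the largest error estimate."""
    h = (t1 - t0) / n_steps
    t, y, max_err = t0, y0, 0.0
    for _ in range(n_steps):
        y, err = cash_karp_step(f, t, y, h)
        max_err = max(max_err, err)
        t += h
    return y, max_err


if __name__ == "__main__":
    # Test problem y' = -y, y(0) = 1, whose exact solution is exp(-t).
    y_end, max_err = integrate(lambda t, y: -y, 0.0, 1.0, 1.0, 10)
    print(y_end, math.exp(-1.0), max_err)
```

The difference between the two solutions comes almost for free, since both reuse the same six function evaluations. That is what makes embedded pairs like this attractive for step size control.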

New array syntax

We silently added support for a new array syntax in the previous release. With this release this new syntax is becoming more prominent and will be used in error messages, documentation and example models. As opposed to the previously used syntax for arrays:

real a[5];
matrix[3,3] b[5];
vector[3] c[5];

we recommend switching to the new equivalent array syntax:

array[5] real a;
array[5] matrix[3,3] b;
array[5] vector[3] c;

The old syntax will be supported for the foreseeable future but we strongly encourage getting used to the new syntax.
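For anyone with many old models to update, the rewrite is mechanical enough to sketch with a regular expression. The Python helper below is a toy, not an official tool: the name to_new_array_syntax is hypothetical, it only handles simple one-line declarations like the three above, and it ignores constrained types (for example real<lower=0>), declarations with initializers, and comments.

```python
import re

# Matches declarations such as "real a[5];" or "matrix[3,3] b[5];":
# indentation, element type (with optional sizes), variable name, array dims.
OLD_ARRAY_DECL = re.compile(
    r"^(?P<indent>\s*)(?P<type>\w+(?:\[[^\]]*\])?)\s+"
    r"(?P<name>\w+)\[(?P<dims>[^\]]*)\]\s*;"
)


def to_new_array_syntax(line: str) -> str:
    """Rewrite one old-style array declaration; return other lines unchanged."""
    m = OLD_ARRAY_DECL.match(line)
    if m is None:
        return line
    new_decl = f"{m['indent']}array[{m['dims']}] {m['type']} {m['name']};"
    return new_decl + line[m.end():]


if __name__ == "__main__":
    for old in ("real a[5];", "matrix[3,3] b[5];", "vector[3] c[5];"):
        print(to_new_array_syntax(old))
```

Always diff the result and recompile the model before trusting a conversion like this.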

Multiple definitions in one declaration statement

We have added support for declaring and defining more than one variable in the same declaration statement, provided they have the same type.
The following is now valid in Stan 2.27+:

data {
  real x, y, z;
  array[5] int i, j;
}
transformed data {
  real a = 5, b = 6;
  real c = 5, d, e = 7;
}

New functions and signatures

We added a new function to compute quantiles in the transformed data or generated quantities blocks. The following signatures are supported:

quantile(data vector, data real) => real
quantile(data row_vector, data real) => real
quantile(data array[] real, data real) => real
quantile(data vector, data array[] real) => array[] real
quantile(data row_vector, data array[] real) => array[] real
quantile(data array[] real, data array[] real) => array[] real
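As a rough illustration of what a sample quantile function computes, here is a small Python sketch of the linear-interpolation rule that R uses by default ("type 7"). That Stan's quantile follows exactly this rule is an assumption here, so please check the function reference before relying on matching values. The name quantile_type7 is made up for this example.

```python
import math
from typing import Sequence


def quantile_type7(x: Sequence[float], p: float) -> float:
    """Sample quantile with linear interpolation between order statistics."""
    if len(x) == 0:
        raise ValueError("x must not be empty")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    xs = sorted(x)
    h = (len(xs) - 1) * p          # fractional position in the sorted sample
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)  # clamp so that p = 1 stays in range
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


if __name__ == "__main__":
    data = [4.0, 1.0, 3.0, 2.0]
    # Scalar probability, like the signatures returning real.
    print(quantile_type7(data, 0.5))
    # Several probabilities, like the signatures returning array[] real.
    print([quantile_type7(data, p) for p in (0.0, 0.25, 1.0)])
```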

The fma() function now supports more vectorized signatures:

fma(real, real, vector) => vector
fma(real, vector, real) => vector
fma(real, vector, vector) => vector
fma(vector, real, real) => vector
fma(vector, real, vector) => vector
fma(vector, vector, real) => vector
fma(vector, vector, vector) => vector

The same signatures are also supported with row_vector or matrix arguments in place of vector.
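The vectorized signatures all follow one pattern: real arguments are broadcast against the container arguments and x * y + z is computed elementwise. A minimal Python sketch of that pattern, using plain lists as stand-ins for vectors, is below. It only illustrates the broadcasting. The helper name is made up, and the product and sum are rounded separately here, so this is not a true fused operation.

```python
def fma_broadcast(x, y, z):
    """Elementwise x * y + z, broadcasting scalars against equal-length lists."""
    lengths = {len(a) for a in (x, y, z) if isinstance(a, list)}
    if len(lengths) > 1:
        raise ValueError("container arguments must have the same size")
    if not lengths:
        return x * y + z  # all-scalar case
    n = lengths.pop()

    def as_list(a):
        return a if isinstance(a, list) else [a] * n

    return [xi * yi + zi for xi, yi, zi in zip(as_list(x), as_list(y), as_list(z))]


if __name__ == "__main__":
    print(fma_broadcast(2.0, [1.0, 2.0], 1.0))                # (real, vector, real)
    print(fma_broadcast([1.0, 2.0], [3.0, 4.0], [5.0, 6.0]))  # (vector, vector, vector)
```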

Performance optimizations

The following functions should have improved performance:

  • diag_pre_multiply
  • diag_post_multiply
  • multi_normal_cholesky (when covariance is data)
  • csr_matrix_times_vector (for cases when the matrix is data and the vector is a parameter)

We also improved the handling of data and transformed data by avoiding an unnecessary copy when using reverse mode.

Improved range checks

We have improved the range checks when accessing elements of a vector/row_vector/matrix or array.
These will now return a more informative error in case of a user's indexing bug, such as an index outside a vector’s range.

These checks can in some cases cause a noticeable performance regression of around 10%. Users can set the STAN_NO_RANGE_CHECKS make flag to skip the range checks in favor of maximum performance.

We advise turning the checks off only once you are sure the indexing in your model is correct.

Miscellaneous improvements:

  • More robust detection of improper posteriors.
  • Standalone generated quantities will also run on warmup draws if supplied.
  • The fixed_param method is run automatically if the model has no parameters.
  • The _cdf functions now also use the vertical bar (|) form, matching the lcdf/lccdf/lpdf/lpmf functions.
  • Fixed an issue with a memory leak with the OpenCL backend.
  • The Stan-to-C++ compiler can now output lists of data, parameter and GQ variables in JSON form.
  • Avoiding copying data/transformed data in the stack allocator when doing reverse mode.

Updated downstream dependencies

The TBB library has been upgraded to the 2020.3 version and the Sundials library has been updated to 5.7.0.

How to install?

Download the tar.gz file from the link above, extract it and use it the way you use any CmdStan release. We also have an online CmdStan guide available at CmdStan User’s Guide

If you are using CmdStanPy, make sure you point to the folder where you extracted the tar.gz file with

set_cmdstan_path(os.path.join('path','to','cmdstan'))

With CmdStanR you can install the release candidate using

cmdstanr::install_cmdstan(version = "2.27.0-rc1", cores = 4)

And then select the RC with

cmdstanr::set_cmdstan_path(file.path(Sys.getenv("HOME"), ".cmdstanr", "cmdstan-2.27.0-rc1"))

Hmm… running my usual performance regression test model shows a considerable slowdown:

> fit_26$time()$total
[1] 20.14739
> fit_27$time()$total
[1] 24.8837
> fit_27b$time()$total
[1] 25.45731
> 

26 is the 2.26.1 cmdstan, 27 is 2.27.0-rc1 cmdstan and 27b is 2.27.0-rc1 with STAN_NO_RANGE_CHECKS set.

The usual model & data:

blrm-exnex-26.stan (28.4 KB)
combo3.data.R (3.9 KB)

The above is run with this cmdstanr call:

iter <- 5000

fit_26 <- model_26$sample(
                       data = "blrm_exnex-combo3.data.R",
                       seed = 123,
                       chains = 1,
                       init=0,
                       parallel_chains = 1,
                       iter_warmup=iter,
                       iter_sampling=iter,
                       refresh = 100
                   )

Thanks, both versions have profiling so should hopefully be easier to get to the bottom of this time. Will take a look.

Let’s continue the discussion here; I opened an issue to get to the bottom of this.

This flag would be used at the step in a workflow where one compiles the model exe, right?

Huh, I actually don’t know what this means! Do you mean simply the diagnose method?


It’s referring to PR#3030. Some improper models could run into an infinite loop (at least in contrived examples; I’m not sure if this ever actually happened to someone).


Yes, the checks are removed at compile time so the flag must be supplied at compile time, which is in make/local when using CmdStan or cpp_options when using the CmdStan wrappers.

The performance regression due to the range checks is more evident in models that use a lot of looping and where the functions in the model are not computationally complex. For example, in a model that uses any sort of decomposition, or runs large matrix multiplications or dot products, the regression should be negligible.
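For plain CmdStan, setting the flag comes down to having the right line in make/local before the model is compiled. A small Python sketch that adds a line only if it is missing could look like the following. The helper name ensure_make_local_line is made up, and the example line is the CXXFLAGS form used elsewhere in this thread. Already compiled models need to be rebuilt for the change to take effect.

```python
import tempfile
from pathlib import Path


def ensure_make_local_line(cmdstan_dir, line):
    """Append `line` to <cmdstan_dir>/make/local unless it is already there.

    Returns True if the file was changed, False otherwise.
    """
    make_local = Path(cmdstan_dir) / "make" / "local"
    make_local.parent.mkdir(parents=True, exist_ok=True)
    existing = make_local.read_text().splitlines() if make_local.exists() else []
    if line in existing:
        return False
    existing.append(line)
    make_local.write_text("\n".join(existing) + "\n")
    return True


if __name__ == "__main__":
    # Demo against a throwaway directory; point this at a real CmdStan folder instead.
    demo_dir = tempfile.mkdtemp()
    print(ensure_make_local_line(demo_dir, "CXXFLAGS+=-DSTAN_NO_RANGE_CHECKS"))
    print((Path(demo_dir) / "make" / "local").read_text())
```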

Cool. I’ll add a step in my workflow whereby at compile time I make separate debug and performance exes (the latter using STAN_NO_RANGE_CHECKS while also stripping out GQs). Then at runtime with real data, I run the debug exe for one sampling iteration on a single chain to verify there are no errors, before switching to num_chains with the performance exe.

For all optimisations, consider STAN_CPP_OPTIMS=TRUE as well.


Thanks! I literally just saw that in the regression issue discussion and made a note to check it out (and also try to self-educate on other possibly-workflow-useful flags)

I’m trying to unpack the rc from the link posted above, but when I run make build I’m getting

make -j20 build
make: *** No rule to make target 'stan/src/stan/model/model_header.hpp', needed by 'stan/src/stan/model/model_header.hpp.gch'.  Stop.
make: *** Waiting for unfinished jobs....
curl -L https://github.com/stan-dev/stanc3/releases/download/nightly/linux-stanc -o bin/stanc --retry 5 --retry-delay 10
g++    -c -fvisibility=hidden -o bin/cmdstan/stansummary.o src/cmdstan/stansummary.cpp

g++    -c -fvisibility=hidden -o bin/cmdstan/print.o src/cmdstan/print.cpp
g++    -c -fvisibility=hidden -o bin/cmdstan/diagnose.o src/cmdstan/diagnose.cpp
--- Compiling the main object file. This might take up to a minute. ---
g++    -c -o src/cmdstan/main.o src/cmdstan/main.cpp
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0src/cmdstan/stansummary.cpp:1:10: fatal error: cmdstan/stansummary_helper.hpp: No such file or directory
    1 | #include <cmdstan/stansummary_helper.hpp>
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
src/cmdstan/diagnose.cpp:1:10: fatal error: cmdstan/stansummary_helper.hpp: No such file or directory
    1 | #include <cmdstan/stansummary_helper.hpp>
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
src/cmdstan/print.cpp:1:10: fatal error: cmdstan/print_helper.hpp: No such file or directory
    1 | #include <cmdstan/print_helper.hpp>
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
src/cmdstan/main.cpp:1:10: fatal error: cmdstan/command.hpp: No such file or directory
    1 | #include <cmdstan/command.hpp>
      |          ^~~~~~~~~~~~~~~~~~~~~
compilation terminated.
make: *** [make/command:5: bin/cmdstan/stansummary.o] Error 1
make: *** [make/command:5: bin/cmdstan/diagnose.o] Error 1
make: *** [make/command:5: bin/cmdstan/print.o] Error 1
make: *** [make/program:14: src/cmdstan/main.o] Error 1
100   619  100   619    0     0   4421      0 --:--:-- --:--:-- --:--:--  4390
^Cmake: *** Deleting file 'bin/stanc'
make: *** [make/stanc:75: bin/stanc] Interrupt

I downloaded the tar and the source code tar but neither of them will compile. Did we change something in the tagged version of the source code?

There’s also a .DS_Store sitting in the tar file

I installed via cmdstanr and had no issues. Not sure what’s up in your case.

This does not seem to be the correct tarball, as it should not download stanc3 binaries for starters. I saw the .DS_Store files; those will be cleaned up for the official release. They definitely should not cause problems.

Either a botched download or the untar failed for some reason.

Yep! Got it sorted out!

@wds15 on my computer I’m not seeing the slowdown

I downloaded the tar.gz files for 2.27-rc1 and 2.26.1 then built both of them with this in my make/local

CXXFLAGS+=-DSTAN_NO_RANGE_CHECKS -O3 -march=native -mtune=native

Then compiled the model and did 10 runs of each


time -p for run in {1..10}; do ./cmdstan-2.26.1/examples/blrm/blrm sample num_warmup=10000 num_samples=10000 data file="./cmdstan-2.26.1/examples/blrm/blrm.data.R" random seed=1234; done


time -p for run in {1..10}; do ./cmdstan-2.27/examples/blrm/blrm sample num_warmup=10000 num_samples=10000 data file="./cmdstan-2.27/examples/blrm/blrm.data.R" random seed=1234; done

I’m getting back

2.26:

real 398.32
user 397.21
sys 0.89

2.27-rc:

real 370.33
user 369.16
sys 0.94

So about 40 seconds per run for 2.26 and 37 seconds per run for 2.27. Did you run the models with a similar cxxflag to the one I used above?
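For reference, the per-run averages and the relative change follow directly from the two real totals above. A quick Python check, where the variable names are only for illustration:

```python
runs = 10
total_26 = 398.32  # seconds, "real" from time -p for 2.26
total_27 = 370.33  # seconds, "real" from time -p for 2.27-rc

per_run_26 = total_26 / runs
per_run_27 = total_27 / runs
rel_change = (total_27 - total_26) / total_26  # negative means 2.27-rc is faster

print(f"2.26:    {per_run_26:.2f} s per run")
print(f"2.27-rc: {per_run_27:.2f} s per run")
print(f"relative change: {rel_change:+.1%}")
```

On these numbers the release candidate is roughly 7% faster, not slower.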


Though something to note, I was getting more of these messages in the 2.27 version

Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Exception: Exception: Exception: binomial_logit_lpmf: Probability parameter[1] is -inf, but must be finite! (in 'examples/blrm/blrm.stan', line 495, column 4 to column 117) (in 'examples/blrm/blrm.stan', line 524, column 6 to line 530, column 46) (in 'examples/blrm/blrm.stan', line 852, column 6 to line 865, column 30)
If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.

I did not turn on march=native or mtune=native … I used STAN_CPP_OPTIMS=true. Turning these options on seems to make the problem less severe, but it’s still there (now with 10k / 10k iterations):

> fit_26$time()$total
[1] 37.25854
> fit_27b$time()$total
[1] 44.7143
> fit_26$profiles()
[[1]]
       name   thread_id total_time forward_time reverse_time chain_stack
1   log-lik 0x11c6ecdc0   24.39250     21.42940     2.963150   443256661
2     prior 0x11c6ecdc0    3.16647      2.97015     0.196314    26208080
3 transform 0x11c6ecdc0    3.21690      2.02554     1.191360    41279868
  no_chain_stack autodiff_calls no_autodiff_calls
1      201152672         327601                 1
2       11138434         327601                 1
3       19657080         327601             10001

> fit_27b$profiles()
[[1]]
       name   thread_id total_time forward_time reverse_time chain_stack
1   log-lik 0x103defdc0   28.76310     24.94580     3.817360   445825278
2     prior 0x103defdc0    3.62940      3.40291     0.226487    26359840
3 transform 0x103defdc0    4.42848      3.14804     1.280440    17794080
  no_chain_stack autodiff_calls no_autodiff_calls
1      202318336         329498                 1
2        9884940         329498                 1
3       47450880         329498             10001

> 
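To see where the extra time goes, the total_time columns of the two profile tables can be compared section by section. A short Python sketch with the values copied from the output above:

```python
# total_time (seconds) per profile section, copied from the two tables above.
profile_26 = {"log-lik": 24.39250, "prior": 3.16647, "transform": 3.21690}
profile_27b = {"log-lik": 28.76310, "prior": 3.62940, "transform": 4.42848}

# Ratio > 1 means the section is slower in 2.27.0-rc1.
ratios = {name: profile_27b[name] / profile_26[name] for name in profile_26}

for name, ratio in ratios.items():
    print(f"{name:>9}: {ratio:.3f}x")
```

All three sections are slower in the release candidate, with the transform section showing the largest relative increase (about 1.38x against about 1.18x for log-lik and 1.15x for prior).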

I am on macOS Catalina

[20:29:58][weberse2@C02XK2AGJHD2:~/.cmdstanr/cmdstan-2.27.0-rc1]$ clang++ --version
Apple clang version 12.0.0 (clang-1200.0.32.29)
Target: x86_64-apple-darwin19.6.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

Can you follow the scheme I used above?

  1. Download the two CmdStan releases, 2.27-rc1 and 2.26.1, and unpack each.

Then do the following in each:

  2. Make a folder with the model / data.

  3. Have only the following in make/local:

CXXFLAGS+=-DSTAN_NO_RANGE_CHECKS -O3 -march=native -mtune=native

  4. Run make build and then make examples/blrm/blrm.

  5. Run them locally with the commands

time -p for run in {1..10}; do ./cmdstan-2.26.1/examples/blrm/blrm sample num_warmup=10000 num_samples=10000 data file="./cmdstan-2.26.1/examples/blrm/blrm.data.R" random seed=1234; done

time -p for run in {1..10}; do ./cmdstan-2.27/examples/blrm/blrm sample num_warmup=10000 num_samples=10000 data file="./cmdstan-2.27/examples/blrm/blrm.data.R" random seed=1234; done

  6. Report the results time returns.

Comparing the results from time with Stan’s internal profiling data is comparing apples and oranges. I think we really want to measure the average time of running a model X times with the same seed and data.
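If it helps to get per-run numbers and not only one total, the same measurement can be scripted. Below is a minimal Python sketch that times repeated runs of an arbitrary command. The function name time_command is made up, and the demo command is only a placeholder that starts the Python interpreter and exits.

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd, repeats):
    """Run `cmd` `repeats` times and return the wall-clock seconds of each run."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        # check=True makes a failing run raise, so it is never silently timed.
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


if __name__ == "__main__":
    cmd = [sys.executable, "-c", "pass"]  # placeholder command
    durations = time_command(cmd, 3)
    print(f"mean {statistics.mean(durations):.3f} s, "
          f"sd {statistics.stdev(durations):.3f} s")
```

For the comparison here, cmd would be the model call from the steps above given as a list, for example ["./cmdstan-2.27/examples/blrm/blrm", "sample", "num_warmup=10000", "num_samples=10000", "data", "file=./cmdstan-2.27/examples/blrm/blrm.data.R", "random", "seed=1234"], run once per CmdStan version.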

Did you add the -DSTAN_NO_RANGE_CHECKS macro?

That’s what I did.

Maybe someone other than me can try this as well?

Oh, where is the profiling information coming from? I’m not seeing any profile statements in the blrm model you posted

Can you post your results from time?