Release candidate of CmdStan 2.28.0

I am happy to announce that the latest release candidates of CmdStan and Stan are now available on GitHub!

This release cycle brings the complex type, a new built-in distribution, better handling of sampling with multiple chains, more efficient algebraic solvers and inv_Phi function, and improved error messages.

You can find the release candidate for CmdStan here. Installation instructions are given at the bottom of this post.

Please test the release candidate with your models and report if you experience any problems. We also kindly invite you to test the new features and provide feedback. If you feel some of the new features could be improved or should be changed before the release, please do not hesitate to comment.

The Stan development team appreciates your time and help in making Stan more efficient while maintaining a high level of reliability.

If everything goes according to plan, the 2.28 version will be released next Tuesday.

Below are some of the highlights of the new release.

Complex type

Stan now supports complex numbers with all of the standard complex functions, including the natural logarithm log(z), natural exponentiation exp(z), and powers pow(z1, z2), as well as all of the trigonometric and hyperbolic functions and their inverses, such as sin(z), acos(z), tanh(z) and asinh(z).

More detailed documentation that will be part of the User’s guide is currently available here. The function reference is available here.
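To get a feel for what these functions return, here is a small sketch using Python's standard cmath module as a stand-in. This is not Stan code, and the sample values z and w are arbitrary choices of mine; it only illustrates the mathematical identities the listed functions satisfy for these inputs.

```python
import cmath

# Arbitrary sample values (assumptions for illustration only).
z = complex(1.0, 2.0)
w = complex(0.5, -0.25)

log_z = cmath.log(z)      # principal branch of the natural logarithm
exp_z = cmath.exp(z)      # natural exponentiation
pow_zw = z ** w           # complex power, the analogue of pow(z1, z2)

sin_z = cmath.sin(z)
acos_z = cmath.acos(z)
tanh_z = cmath.tanh(z)
asinh_z = cmath.asinh(z)

# Identities that tie the functions together for these inputs.
assert abs(cmath.exp(log_z) - z) < 1e-12                 # exp undoes log
assert abs(pow_zw - cmath.exp(w * cmath.log(z))) < 1e-12  # z^w = exp(w log z)
assert abs(cmath.cos(acos_z) - z) < 1e-12                # cos undoes acos
assert abs(cmath.sinh(asinh_z) - z) < 1e-12              # sinh undoes asinh
```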

Skew double exponential distribution

All of the usual distribution functions are now available for the skew double exponential distribution:

skew_double_exponential_lpdf, skew_double_exponential_cdf, skew_double_exponential_lcdf, skew_double_exponential_lccdf and skew_double_exponential_rng.

The functions reference is available here.
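For readers who want to sanity-check values outside of Stan, here is a minimal Python sketch of the log density. It assumes the parameterization I recall from the functions reference, with location mu, scale sigma and skewness tau, and density 2 tau (1 - tau) / sigma * exp(-(2 / sigma) * [(1 - tau) (mu - y) for y < mu, tau (y - mu) otherwise]). Please check it against the reference before relying on it; the restriction to 0 < tau < 1 is my own choice to keep the logarithms finite.

```python
import math

def skew_double_exponential_lpdf(y, mu, sigma, tau):
    """Log density under the parameterization assumed in the lead-in."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if not 0.0 < tau < 1.0:
        raise ValueError("this sketch requires 0 < tau < 1")
    # Asymmetric absolute deviation: the two sides are weighted differently.
    if y < mu:
        deviation = (1.0 - tau) * (mu - y)
    else:
        deviation = tau * (y - mu)
    return (math.log(2.0) + math.log(tau) + math.log1p(-tau)
            - math.log(sigma) - 2.0 * deviation / sigma)

# With tau = 0.5 the density reduces to the ordinary double exponential
# (Laplace) density 1 / (2 sigma) * exp(-|y - mu| / sigma).
y, mu, sigma = 1.3, 0.2, 0.7
laplace_lpdf = -math.log(2.0 * sigma) - abs(y - mu) / sigma
assert abs(skew_double_exponential_lpdf(y, mu, sigma, 0.5) - laplace_lpdf) < 1e-12
```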

More efficient multi-chain sampling with CmdStan using threads in a single executable

It is now possible to run multiple NUTS chains from a single executable, with the chains running in parallel threads. This should be slightly faster in general and also reduces the memory footprint, because all chains share one copy of the input data; previously each chain had its own copy.

When using within-chain parallelization, all chains started within a single executable can share all the available threads. This means that once a chain finishes, its threads are reused by the other chains. This type of resource sharing was previously unavailable.

More details are currently available here.

New num_threads command-line argument

The number of threads used with within-chain parallelization can now be specified as a regular argument to the CmdStan executable.

Previously the number of threads was specified via the environment variable STAN_NUM_THREADS.

Details are available here.
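As a rough sketch of how the two features fit together, the Python helper below assembles a command line that asks one executable for several chains and passes the thread count as a regular argument. The argument layout (num_chains under sample, num_threads at the top level) is how I understand the CmdStan guide, so please confirm it with the executable's own help output. The executable name, file names and helper names are hypothetical, and the model is assumed to have been built with threading enabled.

```python
import os

def build_sample_command(exe, data_file, output_file, num_chains=4, num_threads=4):
    """Return the argument list for a multi-chain run in one executable.

    The layout is an assumption based on my reading of the CmdStan guide.
    """
    return [
        exe,
        "sample", f"num_chains={num_chains}",
        "data", f"file={data_file}",
        "output", f"file={output_file}",
        f"num_threads={num_threads}",
    ]

def legacy_thread_env(num_threads, base_env=None):
    """Return an environment that sets STAN_NUM_THREADS, the pre-2.28 mechanism."""
    env = dict(os.environ if base_env is None else base_env)
    env["STAN_NUM_THREADS"] = str(num_threads)
    return env

# Hypothetical model executable and file names.
cmd = build_sample_command("./bernoulli", "bernoulli.data.json", "output.csv",
                           num_chains=4, num_threads=8)
print(" ".join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

With 4 chains and 8 threads, the threads are shared among the chains rather than each chain getting a fixed allotment, which is the resource sharing described above.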

More efficient Powell & Newton algebraic solvers and the inv_Phi function

The autodifferentiation in the Powell and Newton solvers now efficiently computes cotangents by replacing matrix inversion with a smaller number of matrix solves.
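To make the idea concrete, here is a small pure-Python sketch of the general technique, not of Stan's C++ implementation. For a solution x(theta) of f(x, theta) = 0, the implicit function theorem gives dx/dtheta = -J_x^{-1} J_theta, so the cotangent of theta is -J_theta^T (J_x^{-T} x_bar). That needs a single linear solve with J_x^T instead of an explicit inverse or one solve per parameter. The matrices and the names j_x, j_theta and x_bar are made-up illustration values.

```python
def solve(a, b):
    """Solve a x = b for a small dense system (Gaussian elimination, partial pivoting)."""
    n = len(a)
    m = [[float(v) for v in row] + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def transpose(a):
    return [list(col) for col in zip(*a)]

def matvec(a, v):
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]

def cotangent_one_solve(j_x, j_theta, x_bar):
    """theta_bar = -J_theta^T eta, where J_x^T eta = x_bar (one solve)."""
    eta = solve(transpose(j_x), x_bar)
    return [-t for t in matvec(transpose(j_theta), eta)]

def cotangent_full_jacobian(j_x, j_theta, x_bar):
    """Reference version: one solve per parameter to form dx/dtheta first."""
    columns = [solve(j_x, col) for col in transpose(j_theta)]
    return [-sum(c * xb for c, xb in zip(col, x_bar)) for col in columns]

# Made-up Jacobians: 2 unknowns, 3 parameters.
j_x = [[2.0, 1.0], [0.0, 3.0]]
j_theta = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]
x_bar = [1.0, -1.0]

fast = cotangent_one_solve(j_x, j_theta, x_bar)
slow = cotangent_full_jacobian(j_x, j_theta, x_bar)
assert all(abs(f - s) < 1e-12 for f, s in zip(fast, slow))
```

Both routes give the same cotangent; the first does one solve regardless of how many parameters there are.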

The inv_Phi function should be approximately 2x faster while keeping a precision of 16 digits. The changes are based on the Fortran algorithm described in Wichura, M. J. (1988) Algorithm AS 241: The percentage points of the normal distribution. Applied Statistics, 37, 477–484.
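As a reminder of what inv_Phi computes, here is a short sketch using Python's standard library: statistics.NormalDist().inv_cdf is the standard normal quantile function, i.e. the inverse of Phi. This is a stand-in for checking values, not Stan's implementation. The wrapper name inv_phi is my own, and restricting the input to the open interval (0, 1) is a simplification of this sketch.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def inv_phi(p):
    """Standard normal quantile for 0 < p < 1 (hypothetical helper)."""
    if not 0.0 < p < 1.0:
        raise ValueError("this sketch requires 0 < p < 1")
    return _STD_NORMAL.inv_cdf(p)

# Round trip: Phi(inv_Phi(p)) should recover p.
for p in (0.001, 0.025, 0.5, 0.975, 0.999):
    assert abs(_STD_NORMAL.cdf(inv_phi(p)) - p) < 1e-12
```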

Automatic handling of precompiled headers on Windows with RTools 4.0

With CmdStan 2.28, precompiled headers that speed up model compilation are enabled by default on Windows when using RTools 4.0. Previously, they were only enabled if the user manually specified PRECOMPILED_HEADERS=true in the make/local file.

Improved error messaging/type error explanations in stanc3

The Stan-to-C++ compiler now has improved error messages, particularly better explanations of type mismatches.

How to install?

Download the tar.gz file from the link above, extract it and use it the way you use any CmdStan release. We also have an online guide available at CmdStan User’s Guide.

If you are using CmdStanPy, make sure you point to the folder where you extracted the tar.gz file with


import os
from cmdstanpy import set_cmdstan_path
set_cmdstan_path(os.path.join('path', 'to', 'cmdstan'))

With CmdStanR you can install the release candidate using


cmdstanr::install_cmdstan(version = "2.28.0-rc2", cores = 4)

And then select the RC with


cmdstanr::set_cmdstan_path(file.path(Sys.getenv("HOME"), ".cmdstanr", "cmdstan-2.28.0-rc2"))

15 Likes

Whoa! Amazing feature!

5 Likes

CmdStanPy has some tests for ADVI, and results for one model are very different between the reference CSV file from 2.20 and outputs from 2.28 rc2:

the model is taken from the test models in the Stan repo - stan/eta_should_be_big.stan at develop · stan-dev/stan · GitHub

Stan 2.20 CSV file estimates are (added spaces for readability):

lp__, log_p__, log_g__, mu.1, mu.2
# Stepsize adaptation complete.
# eta = 1
0, 0, 0, 31.0299, 28.8141

which may or may not have been as expected, since eta=1 - is that big?

Stan 2.28-rc2 CSV file outputs are:

lp__, log_p__, log_g__, mu.1, mu.2
# Stepsize adaptation complete.
# eta = 100
0, 0, 0, 7261.34, 3725.82

this time, eta=100 and the estimates are very different.
not sure what this means, other than that the tests on CmdStanPy I/O are extremely brittle.

do we have regression tests for ADVI, or is it considered experimental?
are there changes to ADVI in 2.28?
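For comparing the two files without depending on exact byte-for-byte output, something like the following sketch might help. It pulls the adapted eta out of the comment lines and reads the numeric rows into dictionaries keyed by the header. The function name is hypothetical, and the parsing rules are inferred only from the excerpts quoted above, so real output files may need more care.

```python
def parse_advi_csv(text):
    """Return (eta, rows) from ADVI CSV text shaped like the excerpts above."""
    header, eta, rows = None, None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            body = line.lstrip("#").strip()
            if body.startswith("eta"):
                # Comment of the form "# eta = 100"
                eta = float(body.split("=", 1)[1])
            continue
        fields = [f.strip() for f in line.split(",")]
        if header is None:
            header = fields  # first non-comment line holds the column names
        else:
            rows.append(dict(zip(header, (float(f) for f in fields))))
    return eta, rows

# The 2.28-rc2 excerpt quoted in this post.
sample = """lp__, log_p__, log_g__, mu.1, mu.2
# Stepsize adaptation complete.
# eta = 100
0, 0, 0, 7261.34, 3725.82
"""
eta, rows = parse_advi_csv(sample)
print(eta, rows[0]["mu.1"], rows[0]["mu.2"])
```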

1 Like

The tests for ADVI are here: https://github.com/stan-dev/stan/tree/develop/src/test/unit/variational
A test that uses exactly the model you mention is here: https://github.com/stan-dev/stan/blob/develop/src/test/unit/variational/eta_adapt_big_test.cpp

The EXPECT value at the bottom would suggest the correct value is 100.0? But my confidence here is very low as my understanding of all the ADVI stuff is very basic.

The last meaningful changes (not docs or formatting) to the ADVI files were made in May 2019, so before 2.20.

1 Like

What does diagnose tell you about the gradients?

Diagnose looks good on my end:

TEST GRADIENT MODE

 Log probability=-190015

 param idx           value           model     finite diff           error
         0        -1.59078         7.00048         7.00049    -1.26242e-05
         1       -0.705393         5.00021          5.0002     1.34744e-05

It's similar with 2.27.0.

4 Likes

inv_Phi is looking good: here’s a profiling comparison between 2.28.0-rc2 and 2.27

name          thread_id    total_time  forward_time  reverse_time  chain_stack  no_chain_stack  autodiff_calls  no_autodiff_calls
inv_Phi_2.28  0x10fec5e00  1.75221     1.23417       0.518044      38528000     0               385281          1
inv_Phi_2.27  0x116b41e00  5.0122      4.27536       0.736845      48473000     0               484731          1

on the Stan model below with N = 1000

data {
  int N;  // number of elements
}
parameters {
  vector<lower=0, upper=1>[N] x;  // probabilities passed to inv_Phi
}
model {
  vector[N] y;
  // time only the inv_Phi call
  profile("inv_Phi") {
    y = inv_Phi(x);
  }
}

3 Likes

I’m getting an error when trying to compile from GitHub:

In file included from src/cmdstan/main.cpp:1:
src/cmdstan/command.hpp:27:10: fatal error: stan/callbacks/unique_stream_writer.hpp: No such file or directory
   27 | #include <stan/callbacks/unique_stream_writer.hpp>
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.

Do I need to pull unique_stream_writer.hpp from elsewhere?

Did you do a fresh git clone of cmdstan? Did you clone recursively?

If you are using the cmdstan repo, that’s most likely because your stan repo was not updated.

A fresh git clone solved the issue. But, the build tag is still:
--- CmdStan v2.27.0 built ---

1 Like

That will be updated once the official release is made.

2 Likes