I am very happy to announce that the latest release candidate of CmdStan is now available on GitHub! This release cycle brings some new functions, performance improvements, and profiling. You can find the release candidate for CmdStan here.
Please test the release candidate with your models and report if you experience any problems. We also kindly invite you to test the new features and provide feedback. If you feel some of the new features could be improved or should be changed before the release, please do not hesitate to comment.
The Stan development team appreciates your time and help in making Stan more efficient while maintaining a high level of reliability.
If everything goes according to plan, the 2.26 version will be released next Tuesday.
Below are some of the highlights of the new release.
New functions

chol2inv(L)
This function is still missing proper documentation, but in short, it returns the inverse of the matrix whose Cholesky factor is the input. It follows the R chol2inv function. Its signature is matrix chol2inv(matrix L).

discrete_range distribution (documentation)

generalized_inverse (documentation)

linspaced_int_array (documentation)

svd_U and svd_V (documentation)

symmetrize_from_lower_tri (documentation)
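For intuition, what chol2inv computes can be sketched in NumPy (an illustrative analogue, not Stan code): given the lower-triangular Cholesky factor L of a matrix A, it returns the inverse of A.

```python
import numpy as np

# Symmetric positive-definite test matrix
A = np.array([[4.0, 2.0],
              [2.0, 3.0]])

# Lower-triangular Cholesky factor: A = L @ L.T
L = np.linalg.cholesky(A)

# chol2inv(L) corresponds to inv(L @ L.T), i.e. inv(A).
# Computed here via a triangular solve rather than a direct
# inversion, since (L L^T)^{-1} = L^{-T} L^{-1}.
L_inv = np.linalg.solve(L, np.eye(2))
A_inv = L_inv.T @ L_inv

assert np.allclose(A_inv, np.linalg.inv(A))
```

Passing the Cholesky factor avoids re-factorizing the matrix when a factor is already available, which is the typical reason to prefer this over a plain inverse.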
Profiling
Users can now profile the execution time of their models using profiling statements as shown in this example:
model {
  matrix[N, N] cov;
  matrix[N, N] L_cov;
  profile("building_cov") {
    cov = gp_exp_quad_cov(x, alpha, rho)
          + diag_matrix(rep_vector(sigma, N));
  }
  profile("cholesky_cov") {
    L_cov = cholesky_decompose(cov);
  }
  profile("priors") {
    rho ~ gamma(25, 4);
    alpha ~ normal(0, 2);
    sigma ~ normal(0, 1);
  }
  profile("likelihood") {
    y ~ multi_normal_cholesky(rep_vector(0, N), L_cov);
  }
}
Stan will report the execution time and memory consumption of automatic differentiation (AD) for the statements in each profile. The two main motivations for profiling are to identify bottlenecks in a model and to evaluate different approaches for a section of a model.
Without profiling, users had to have a deep understanding of the underlying AD implementation in Stan Math for all the functions involved in order to optimize a section of a model using different approaches (other functions or reduce_sum, for example). But even with a deep understanding of the AD, it was very hard to measure the effect of a change to a section of the model, as the effect was hard to identify in the model's overall execution time, which can be very noisy due to the randomness of sampling as well as input/output.
See here for a preliminary profiling example. More documentation will be ready for the release.
Backend performance improvements

Improved use of Eigen expressions

All functions in Stan Math prim, as well as user-defined Stan functions, now accept and use Eigen expressions. Where applicable, functions also return Eigen expressions. This change has been in the works for multiple release cycles and is now finalized. Users should see increased performance of generated quantities for statements with multiple vector/row_vector/matrix operations, and increased performance of some distributions in all blocks of a Stan model.
Increased and more robust performance of reduce_sum

We have optimized memory handling and copies when using automatic differentiation with reduce_sum.
Improved error messaging in the compiler and deprecation warnings
All functions and language features that have been deprecated now properly raise a warning during compilation. You can use deprecated features/functions as normal until the next major release of Stan, but we advise you to move to the suggested non-deprecated function/language feature as soon as possible.
The Stan-to-C++ transpiler also now has improved syntax/semantic error reporting.
OpenCL support for additional distributions
In addition to the GLM lpdf/lpmf functions, there are now 32 additional distributions that can utilize OpenCL on the GPU or CPU to speed up execution. The speedups vary between distributions and argument types (whether they are data or parameters). We have observed speedups in the range of 2- to 50-fold compared to sequential execution. We anticipate the next Stan release in April to have fully-fledged GPU-enabled backend support for most Stan functions, and any feedback at this point would be more than welcome.
List of OpenCL-enabled distributions for 2.26:
 bernoulli_lpmf, bernoulli_logit_lpmf, bernoulli_logit_glm_lpmf
 beta_lpdf, beta_proportion_lpdf
 binomial_lpmf
 categorical_logit_glm_lpmf
 cauchy_lpdf
 chi_square_lpdf
 double_exponential_lpdf
 exp_mod_normal_lpdf
 exponential_lpdf
 frechet_lpdf
 gamma_lpdf
 gumbel_lpdf
 inv_chi_square_lpdf
 inv_gamma_lpdf
 logistic_lpdf
 lognormal_lpdf
 neg_binomial_lpmf, neg_binomial_2_lpmf, neg_binomial_2_log_lpmf, neg_binomial_2_log_glm_lpmf
 normal_lpdf, normal_id_glm_lpdf
 ordered_logistic_glm_lpmf
 pareto_lpdf, pareto_type_2_lpdf
 poisson_lpmf, poisson_log_lpmf, poisson_log_glm_lpmf
 rayleigh_lpdf
 scaled_inv_chi_square_lpdf
 skew_normal_lpdf
 std_normal_lpdf
 student_t_lpdf
 uniform_lpdf
 weibull_lpdf
See here for instructions on how to set up and use OpenCL with CmdStan. Official Stan documentation will be available before the release.
We have also upgraded our OpenCL support for CPUs and integrated GPUs, limiting transfers when those devices are used.
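As a sketch, enabling the OpenCL backend typically amounts to adding a flag to the make/local file in the CmdStan directory before building; the platform/device IDs shown are illustrative, and the linked instructions remain the authoritative reference:

```
# make/local in the CmdStan directory
STAN_OPENCL=true
# Optional: select a specific OpenCL platform/device
# (IDs are illustrative; list your devices to find the right ones)
OPENCL_PLATFORM_ID=0
OPENCL_DEVICE_ID=0
```

After editing make/local, models need to be rebuilt for the setting to take effect.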
How to install?
Download the tar.gz file from the link above, extract it, and use it the way you use any CmdStan release. We also have an online guide available at the CmdStan User's Guide.
If you are using cmdstanpy, make sure you point to the folder where you extracted the tar.gz file with
import os
from cmdstanpy import set_cmdstan_path
set_cmdstan_path(os.path.join('path', 'to', 'cmdstan'))
With cmdstanr you can install the release candidate using
install_cmdstan(version = "2.26.0-rc1", cores = 4)
And then select the RC with
set_cmdstan_path(file.path(Sys.getenv("HOME"), ".cmdstanr", "cmdstan-2.26.0-rc1"))
Be advised that cmdstanr/py do not yet handle profiling CSVs automatically.