How to use the Stanc3 compiler optimization flags --O and --O1 in cmdstanr?

Dear Stan community,

I just read the Stan 2.29.0 release notes (Release of CmdStan 2.29 – The Stan Blog) and found the auto-formatter, canonicalizer, and optimizations very interesting. I was not aware of these features before. Is it possible to apply them in cmdstanr, for example through flags when compiling the model? Apart from the nice demo "Stan-to-C++ optimizations" from @rok_cesnovar, is there a way for R or Python users to apply these features automatically? Thank you very much.

Michelle

Hi,

I apologize for the lack of a cmdstanr example. Here you go:

library(cmdstanr)

# Simulate data for a logistic regression with n observations and k predictors
n <- 25000
k <- 10
X <- matrix(rnorm(n * k), ncol = k)
y <- rbinom(n, size = 1, prob = plogis(3 * X[,1] - 2 * X[,2] + 1))
mdata <- list(k = k, n = n, y = y, X = X)

# Compile with the stanc3 O1 optimizations enabled, then sample
mod_opt <- cmdstan_model("lr.stan", stanc_options = list("O1"))
fit_opt <- mod_opt$sample(data = mdata, chains = 1, refresh = 500)

This was used with the following model:

data {
  int<lower=1> k;
  int<lower=0> n;
  matrix[n, k] X;
  array[n] int y;
}
parameters {
  vector[k] beta;
  real alpha;
}
model {
  target += std_normal_lpdf(beta | );
  target += std_normal_lpdf(alpha | );
  target += bernoulli_logit_lpmf(y | alpha + X * beta);
}

Here is also a quick script to compare the speed:

library(cmdstanr)

n <- 25000
k <- 10
X <- matrix(rnorm(n * k), ncol = k)
y <- rbinom(n, size = 1, prob = plogis(3 * X[,1] - 2 * X[,2] + 1))
mdata <- list(k = k, n = n, y = y, X = X)

# Compile the same model twice: once with default settings, once with O1
mod_no_opt <- cmdstan_model("lr.stan", exe_file = "no_opt_lr")
mod_opt <- cmdstan_model("lr.stan", exe_file = "opt_lr", stanc_options = list("O1"))

fit_no_opt <- mod_no_opt$sample(data = mdata, chains = 1, refresh = 500)

fit_opt <- mod_opt$sample(data = mdata, chains = 1, refresh = 500)

print(fit_no_opt$time()$total)
print(fit_opt$time()$total)

The exe_file argument is there so we can build two executables from the same model file.

The results are:
Chain 1 Iteration:    1 / 2000 [  0%]  (Warmup) 
Chain 1 Iteration:  500 / 2000 [ 25%]  (Warmup) 
Chain 1 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
Chain 1 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
Chain 1 Iteration: 1500 / 2000 [ 75%]  (Sampling) 
Chain 1 Iteration: 2000 / 2000 [100%]  (Sampling) 
Chain 1 finished in 17.8 seconds.
Running MCMC with 1 chain...

Chain 1 Iteration:    1 / 2000 [  0%]  (Warmup) 
Chain 1 Iteration:  500 / 2000 [ 25%]  (Warmup) 
Chain 1 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
Chain 1 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
Chain 1 Iteration: 1500 / 2000 [ 75%]  (Sampling) 
Chain 1 Iteration: 2000 / 2000 [100%]  (Sampling) 
Chain 1 finished in 10.6 seconds.
[1] 17.94077
[1] 10.78738

So as not to get everyone’s hopes too high: I would not expect gains this large every time :)

This is amazing. Thank you very much for developing this. I also really like the online demo, which is a nice tool for checking what I could improve or canonicalize in my messy Stan code. It is also like a home tutor for learning some advanced features of Stan based on my own model. :)

May I ask whether I can apply -O1 blindly to all models, or is it better to always compare?

I would recommend comparing if you are able to, especially while the optimizations are still being developed. If you run into any incorrect behavior or compilation problems with O1 that don’t occur without it, we would really love to hear about it.
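
One way such a comparison could look is sketched below. This is only an illustration: `max_mean_diff` is a hypothetical helper written in base R, not a cmdstanr function, and it assumes each input is a summary table with a `variable` column and a `mean` column, as returned by the `$summary()` method of a cmdstanr fit.

```r
# Hypothetical helper (not part of cmdstanr): largest absolute difference
# in posterior means between two summary tables.
# Assumes both tables have a `variable` column and a `mean` column.
max_mean_diff <- function(s_a, s_b) {
  # Match rows by variable name so the row order does not matter
  merged <- merge(s_a, s_b, by = "variable", suffixes = c("_a", "_b"))
  max(abs(merged$mean_a - merged$mean_b))
}
```

With the two fits from the timing script, this could be called as `max_mean_diff(fit_no_opt$summary(), fit_opt$summary())`. Small differences are not necessarily a problem, since the two runs are both Monte Carlo estimates; a large gap would be worth reporting.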

We have found at least one model that segfaults with -O1, so I would not use it blindly yet.

Hi @rok_cesnovar, just some feedback on my experiments with this optimization flag. I could reproduce your example: no_opt took 14.9s, whereas opt took 20.3s on my machine. However, the exe_file option does not seem to work and gives me an error like

Error in self$compile(...) : unused argument (exe_file = "no_opt_lr")

So I have to delete the .exe file and recompile every time I switch between opt and no_opt. Do you have any idea what I am doing wrong? I think exe_file is very convenient and would like to use it. Thanks a lot.

By the way, I also tried it on my hierarchical variate-covariate model with AR(1) residuals: it took 262.8s with opt and 285.3s without. All tests used the same seed.

This was added just recently. Run:

remotes::install_github("stan-dev/cmdstanr")

and then try again.

Thanks for the feedback!

It works. It just seems that I have to specify the .exe extension; otherwise I get: Error in rethrow_call(c_processx_exec, command, c(command, args), pty, : file not found.
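
For a script that should run on more than one operating system, the file name could be built conditionally. A minimal sketch in base R, assuming the extension is only needed on Windows; `exe_name` is a hypothetical helper, not part of cmdstanr:

```r
# Hypothetical helper (not part of cmdstanr): append ".exe" only on Windows.
exe_name <- function(base) {
  if (.Platform$OS.type == "windows") paste0(base, ".exe") else base
}

# Intended use with the timing script (requires cmdstanr and lr.stan):
# mod_opt <- cmdstan_model("lr.stan", exe_file = exe_name("opt_lr"),
#                          stanc_options = list("O1"))
```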