Hah, that was a misleading error message!
Ah, I’ll wait then
I will discuss with @rok_cesnovar - I think there have been enough minor fixes in the past week to justify a 2.30.1
After installing the experimental version of RStan, you can use the nightly stanc3
as follows:
file.remove(system.file("stanc.js", package = "rstan"))
download.file("https://github.com/stan-dev/stanc3/releases/download/nightly/stanc.js", file.path(find.package("rstan"), "stanc.js"))
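To confirm the nightly stanc.js landed where RStan expects it, a quick check (my sketch, using the same path as in the snippet above):

```r
# Verify the nightly stanc.js replaced the bundled one
stancjs <- file.path(find.package("rstan"), "stanc.js")
file.exists(stancjs)      # should be TRUE after the download
file.info(stancjs)$mtime  # should show the download date, not the install date
```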
Sounds good. Thanks!
Should RStan's stan() honour options(stanc.allow_optimizations = TRUE)? I installed the latest experimental version (with your commits today), but running stan() creates the C++ code without the --O1 flag.
It should. What’s the output of the following:
formals(rstan::stanc)
and
getOption("stanc.allow_optimizations")
> getOption("stanc.allow_optimizations")
[1] TRUE
> formals(rstan::stanc)
$file
$model_code
[1] ""
$model_name
[1] "anon_model"
$verbose
[1] FALSE
$obfuscate_model_name
[1] TRUE
$allow_undefined
isTRUE(getOption("stanc.allow_undefined", FALSE))
$allow_optimizations
isTRUE(getOption("stanc.allow_optimizations", FALSE))
$standalone_functions
isTRUE(getOption("stanc.standalone_functions", FALSE))
$use_opencl
isTRUE(getOption("stanc.use_opencl", FALSE))
$warn_pedantic
isTRUE(getOption("stanc.warn_pedantic", FALSE))
$warn_uninitialized
isTRUE(getOption("stanc.warn_uninitialized", FALSE))
$isystem
c(if (!missing(file)) dirname(file), getwd())
How did you know it doesn’t honour the option? Maybe your code isn’t affected by the O1 optimizations? Also, does rstan::stanc(model_code = <stancode>, allow_optimizations = TRUE) generate different C++ code?
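One way to check (a sketch of my own, not from the thread; it assumes the `cppcode` element of the return value) is to compare the generated C++ with and without the flag:

```r
# Compare generated C++ with and without --O1 (sketch)
code <- "parameters { real x; } model { x ~ std_normal(); }"
cpp_plain <- rstan::stanc(model_code = code)$cppcode
cpp_opt   <- rstan::stanc(model_code = code, allow_optimizations = TRUE)$cppcode
identical(cpp_plain, cpp_opt)  # FALSE if the flag is being honoured
```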
The generated code shows
return std::vector<std::string>{"stanc_version = stanc3 v2.30.0", "stancflags = "};
Works with CmdStanR, with a 50% drop in sampling time. The generated code also misses the --O1 in that line I mentioned above.
I would not rely on the "stancflags = " line for stancjs. That line is generated based on the presumed existence of argv from the system, which won’t be populated under stancjs.
Argh
Agreed, this is an oversight we should fix
I was not aware that this is not respected in stanc.js. Sorry for the misleading information.
Don’t worry. It was useful with CmdStanR, and thanks to this it will be fixed.
When will the O1 optimizations be certified Aki-fresh?
Seriously, @avehtari, thanks for going down this rabbit hole and improving the code base!
I now got stan_model()
with allow_optimizations = TRUE
to work and see a 50% speedup. I don’t have time now to investigate further, or to test stan()
or rstanarm compilation.
I would also like to thank @hsbadr - having the experimental branch of RStan working with develop is great for stress testing these features of the stancjs interface and also testing against more complicated models which people in the R ecosystem have been building up for years
Yes, huge thanks to @hsbadr! He’s the hero of this story.
I have now successfully tested that these work:
stan_model(file=file, allow_optimizations = TRUE)
and
options(stanc.allow_optimizations = TRUE)
stan(file=file, data=data)
Two suggestions:
1. With stan_model(., verbose = TRUE) it would be useful to also print out the stanc flags.
2. There is also an issue that RStan tries to be clever avoiding recompilation, but fails when STANCFLAGS change. I first ran stan_model(file=file1, allow_optimizations = FALSE), and after that running stan_model(file=file2, allow_optimizations = TRUE), where file2 was the same as file1 except for the name and initial whitespace, did not recreate the C++ code but recompiled the cached C++ code with optimizations off.
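Until the cache keys on STANCFLAGS, a workaround sketch (my own suggestion, not from the thread; it assumes the cached compiled model is the .rds file written next to the .stan file when auto_write is enabled):

```r
# Workaround sketch: force a fresh translation when changing stanc flags
options(stanc.allow_optimizations = TRUE)

# Either skip writing/reading the cached model for this call ...
m <- rstan::stan_model(file = "model.stan", auto_write = FALSE)

# ... or delete the cached compiled model next to the .stan file first
rds <- sub("\\.stan$", ".rds", "model.stan")
if (file.exists(rds)) file.remove(rds)
```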
I also was able to install the latest rstanarm, but I’m not yet convinced that the --O1 flag got through.
However, I realized I can check whether rstanarm code has anything that can use SoA, and it seems there is not much speedup expected. For example, bernoulli.stan:
> model <- cmdstan_model(stan_file="bernoulli.stan", compile=FALSE);model$check_syntax(stanc_options = list("debug-mem-patterns", "O1"))
vector[z_beta_1dim__] z_beta: AoS
vector[K_smooth] z_beta_smooth: AoS
vector[smooth_sd_raw_1dim__] smooth_sd_raw: AoS
array[vector[K], hs] local: AoS
array[vector[K], mix_1dim__] mix: AoS
vector[q] z_b: AoS
vector[len_z_T] z_T: AoS
vector[len_rho] rho: AoS
vector[len_concentration] zeta: AoS
vector[t] tau: AoS
vector[K] beta: AoS
vector[K_smooth] beta_smooth: AoS
vector[smooth_sd_1dim__] smooth_sd: AoS
vector[q] b: AoS
vector[len_theta_L] theta_L: AoS
vector[0] inline_hs_prior_return_sym109__: AoS
vector[inline_hs_prior_K_sym110__] inline_hs_prior_lambda_sym111__: AoS
vector[inline_hs_prior_K_sym110__] inline_hs_prior_lambda2_sym113__: AoS
vector[inline_hs_prior_K_sym110__] inline_hs_prior_lambda_tilde_sym114__: AoS
vector[0] inline_hs_prior_return_sym102__: AoS
vector[inline_hs_prior_K_sym103__] inline_hs_prior_lambda_sym104__: AoS
vector[inline_hs_prior_K_sym103__] inline_hs_prior_lambda2_sym106__: AoS
vector[inline_hs_prior_K_sym103__] inline_hs_prior_lambda_tilde_sym107__: AoS
vector[0] inline_hsplus_prior_return_sym94__: AoS
vector[inline_hsplus_prior_K_sym95__] inline_hsplus_prior_lambda_sym96__: AoS
vector[inline_hsplus_prior_K_sym95__] inline_hsplus_prior_eta_sym97__: AoS
vector[inline_hsplus_prior_K_sym95__] inline_hsplus_prior_lambda_eta2_sym99__: AoS
vector[inline_hsplus_prior_K_sym95__] inline_hsplus_prior_lambda_tilde_sym100__: AoS
vector[0] inline_hsplus_prior_return_sym86__: AoS
vector[inline_hsplus_prior_K_sym87__] inline_hsplus_prior_lambda_sym88__: AoS
vector[inline_hsplus_prior_K_sym87__] inline_hsplus_prior_eta_sym89__: AoS
vector[inline_hsplus_prior_K_sym87__] inline_hsplus_prior_lambda_eta2_sym91__: AoS
vector[inline_hsplus_prior_K_sym87__] inline_hsplus_prior_lambda_tilde_sym92__: AoS
vector[0] inline_make_theta_L_return_sym126__: AoS
vector[len_theta_L] inline_make_theta_L_theta_L_sym127__: AoS
matrix[inline_make_theta_L_nc_sym132__, inline_make_theta_L_nc_sym132__] inline_make_theta_L_T_i_sym133__: AoS
vector[inline_make_theta_L_nc_sym132__] inline_make_theta_L_pi_sym137__: AoS
vector[inline_make_theta_L_r_sym142__] inline_make_theta_L_T_row_sym139__: AoS
vector[0] inline_make_b_return_sym145__: AoS
vector[rows(z_b)] inline_make_b_b_sym146__: AoS
matrix[inline_make_b_nc_sym149__, inline_make_b_nc_sym149__] inline_make_b_T_i_sym150__: AoS
vector[inline_make_b_nc_sym149__] inline_make_b_temp_sym153__: AoS
vector[(K + K_smooth)] coeff: AoS
vector[N[1]] eta0: AoS
vector[N[2]] eta1: AoS
vector[inline_ll_clogit_lp_J_sym180__] inline_ll_clogit_lp_summands_sym183__: AoS
vector[inline_ll_clogit_lp_N_g_sym185__] inline_ll_clogit_lp_eta_g_sym187__: AoS
vector[0] inline_pw_bern_return_sym159__: AoS
vector[inline_pw_bern_N_sym160__] inline_pw_bern_ll_sym161__: AoS
vector[inline_pw_bern_N_sym160__] inline_pw_bern_pi_sym162__: SoA
vector[0] inline_pw_bern_inline_linkinv_bern_return_sym4___sym163__: AoS
vector[0] inline_pw_bern_return_sym168__: AoS
vector[inline_pw_bern_N_sym169__] inline_pw_bern_ll_sym170__: AoS
vector[inline_pw_bern_N_sym169__] inline_pw_bern_pi_sym171__: SoA
vector[0] inline_pw_bern_inline_linkinv_bern_return_sym4___sym172__: AoS
vector[(p[inline_decov_lp_i_sym197__] - 1)] inline_decov_lp_shape1_sym193__: AoS
vector[(p[inline_decov_lp_i_sym197__] - 1)] inline_decov_lp_shape2_sym194__: AoS
The specific example I’ve been running is a bit sensitive to random initialization and exact prior specifications, but with logistic regression with a regularized horseshoe prior and p=1536, n=54, stan_glm is now an order of magnitude slower than the corresponding brms-generated code. I know it’s probably difficult to change the rstanarm code as it needs to be very flexible, but pinging @bgoodri so that at least he is aware of the experiment.
And thanks @hsbadr for all the help here and for all the work on getting RStan and rstanarm to work with the latest Stan!