I recently got Ubuntu 20.04 running under WSL2 on one of my Windows 10 PCs and have been doing some performance comparisons. It appears that CmdStan 2.27.0 takes 25% to 100% or more longer than 2.25.0 to run the same model on the same data. Here is a sample Stan model:
functions {
  real partial_sum(int[] ind, int start, int end, vector y, matrix X,
                   real a, vector b, real sigma) {
    return normal_id_glm_lpdf(y[start:end] | X[start:end, ], a, b, sigma);
  }
}
data {
  int<lower=0> N;
  int<lower=0> M;
  matrix[N, M] X;
  vector[N] y;
}
transformed data {
  int grainsize = 1;
  int ind[N] = rep_array(1, N);
}
parameters {
  real a;
  real<lower=0> sigma;
  vector[M] b;
}
model {
  a ~ normal(0, 5);
  sigma ~ normal(0, 10);
  b ~ std_normal();
  target += reduce_sum(partial_sum, ind, grainsize, y, X, a, b, sigma);
}
I create fake data for it with the following R function:
fake_multi <- function(N=100000, M=50, alpha=1, sigma=0.25) {
  X <- matrix(rnorm(N*M), N, M)
  beta <- rnorm(M)
  y <- as.vector(alpha + X %*% beta + rnorm(N, sd=sigma))
  data_list <- list(N=N, M=M, X=X, y=y)
  list(data_list=data_list, beta=beta)
}
I have the release source distributions of cmdstan-2.25.0 and cmdstan-2.27.0, and both were built with the same compiler flags, specifically STAN_THREADS=true and STAN_CPP_OPTIMS=true.
I ran the models using cmdstanr with, for example,
time_t_25 <- system.time(reg_t_25 <- multireg2t$sample(reg_dat, chains=4, parallel_chains=4, threads_per_chain=4, seed=987654L))
then deleted the executable and recompiled with cmdstan-2.27.0. Here are some time comparisons:
> reg_t_25$time()
$total
[1] 79.65344

$chains
  chain_id warmup sampling  total
1        1 33.480   25.456 58.936
2        2 43.502   23.439 66.941
3        3 33.337   25.780 59.117
4        4 59.370   18.864 78.234

> reg_t_27$time()
$total
[1] 100.9266

$chains
  chain_id warmup sampling  total
1        1 42.651   32.131 74.782
2        2 53.790   30.842 84.632
3        3 43.951   33.104 77.055
4        4 72.114   27.417 99.531
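To put a number on the gap for the fake-data runs, the printed times can be compared directly. This is only a small sketch using the figures shown above; the variable names are mine and not part of the original session.

```r
# Wall-clock totals in seconds, copied from the $total entries above.
total_25 <- 79.65344   # CmdStan 2.25.0
total_27 <- 100.9266   # CmdStan 2.27.0

# Per-chain totals for chains 1-4, copied from the $chains tables above.
chains_25 <- c(58.936, 66.941, 59.117, 78.234)
chains_27 <- c(74.782, 84.632, 77.055, 99.531)

# Percent increase in run time going from 2.25.0 to 2.27.0.
pct_total  <- 100 * (total_27 / total_25 - 1)
pct_chains <- 100 * (chains_27 / chains_25 - 1)

print(round(pct_total, 1))
print(round(pct_chains, 1))
```

For this model the overall increase works out to roughly 27%, which is the low end of the range mentioned at the top.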
The difference is even larger for the models I’m actually running at the moment. Here are some timing comparisons from both WSL and Windows on the same PC. The Windows side has Rtools40 installed, and I have gcc 9.3 in WSL. These were also run with cmdstanr and then converted to stanfit objects:
> get_elapsed_time(sfit.146$stanfit)
         warmup  sample
chain:1 2408.44 1791.58
chain:2 2340.84 1795.19
chain:3 2405.28 1810.34
chain:4 2469.86 1814.97
> get_elapsed_time(sfit.146_25$stanfit)
         warmup  sample
chain:1 1143.73 898.871
chain:2 1098.66 900.943
chain:3 1238.32 878.939
chain:4 1218.81 871.352
> get_elapsed_time(sfit.146_win$stanfit)
         warmup  sample
chain:1 4590.94 3262.52
chain:2 3844.77 5569.48
chain:3 3885.73 2995.80
chain:4 3846.82 5108.44
> get_elapsed_time(sfit.146_win_25$stanfit)
         warmup  sample
chain:1 1369.74 1093.47
chain:2 1475.25 1065.62
chain:3 1377.00 1671.71
chain:4 1401.22 1062.80
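The same kind of comparison for the larger model can be sketched from the elapsed times above. The labels here rest on an assumption the post does not spell out: that the `_25` suffix marks the CmdStan 2.25.0 build (so the unsuffixed objects are 2.27.0) and `_win` marks the Windows runs. The list name `elapsed` and its element names are mine.

```r
# Warmup and sampling times in seconds per chain, copied from the output above.
# ASSUMPTION: "_25" = CmdStan 2.25.0, no suffix = 2.27.0, "_win" = Windows.
elapsed <- list(
  wsl_227 = cbind(warmup = c(2408.44, 2340.84, 2405.28, 2469.86),
                  sample = c(1791.58, 1795.19, 1810.34, 1814.97)),
  wsl_225 = cbind(warmup = c(1143.73, 1098.66, 1238.32, 1218.81),
                  sample = c(898.871, 900.943, 878.939, 871.352)),
  win_227 = cbind(warmup = c(4590.94, 3844.77, 3885.73, 3846.82),
                  sample = c(3262.52, 5569.48, 2995.80, 5108.44)),
  win_225 = cbind(warmup = c(1369.74, 1475.25, 1377.00, 1401.22),
                  sample = c(1093.47, 1065.62, 1671.71, 1062.80))
)

# Mean per-chain run time (warmup + sampling) for each configuration.
mean_total <- sapply(elapsed, function(m) mean(rowSums(m)))

# Slowdown of 2.27.0 relative to 2.25.0 on each platform.
ratio_wsl <- mean_total[["wsl_227"]] / mean_total[["wsl_225"]]
ratio_win <- mean_total[["win_227"]] / mean_total[["win_225"]]

print(round(mean_total, 1))
print(round(c(wsl = ratio_wsl, win = ratio_win), 2))
```

Under that labelling, the 2.27.0 runs take roughly twice as long as 2.25.0 in WSL and roughly three times as long in Windows.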
sessionInfo() for the fake data runs:
> sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

Random number generation:
RNG: Mersenne-Twister
Normal: Inversion
Sample: Rounding

locale:
[1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
[4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
[7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] cmdstanr_0.4.0.9000 zernike_3.7.1

loaded via a namespace (and not attached):
[1] rstan_2.21.2 tidyselect_1.1.1 xfun_0.24
[4] purrr_0.3.4 V8_3.4.2 colorspace_2.0-2
[7] vctrs_0.3.8 generics_0.1.0 stats4_4.1.0
[10] loo_2.4.1 utf8_1.2.2 rlang_0.4.11
[13] pkgbuild_1.2.0 pillar_1.6.1 glue_1.4.2
[16] withr_2.4.2 distributional_0.2.2 matrixStats_0.60.0
[19] lifecycle_1.0.0 posterior_1.0.1 munsell_0.5.0
[22] gtable_0.3.0 codetools_0.2-18 inline_0.3.19
[25] knitr_1.33 callr_3.7.0 ps_1.6.0
[28] curl_4.3.2 parallel_4.1.0 fansi_0.5.0
[31] Rcpp_1.0.7 scales_1.1.1 backports_1.2.1
[34] checkmate_2.0.0 RcppParallel_5.1.4 StanHeaders_2.21.0-7
[37] jsonlite_1.7.2 abind_1.4-5 farver_2.1.0
[40] gridExtra_2.3 tensorA_0.36.2 ggplot2_3.3.5
[43] processx_3.5.2 dplyr_1.0.7 grid_4.1.0
[46] cli_3.0.1 tools_4.1.0 magrittr_2.0.1
[49] tibble_3.1.3 crayon_1.4.1 pkgconfig_2.0.3
[52] ellipsis_0.3.2 data.table_1.14.0 prettyunits_1.1.1
[55] R6_2.5.0 compiler_4.1.0
I have to say I was a little skeptical that R and Stan would perform better in WSL than in Windows on the same hardware, but I’m happy to have proved myself wrong. Now that I also have an X server working I have no reason to go back to R on Windows.