Hi, I’m trying to use CmdStan with a map_rect example. I followed this tutorial exactly, using its code as well: https://github.com/rmcelreath/cmdstan_map_rect_tutorial . The model compiles and runs, but I don’t see the speed-up that the tutorial claims.
I set the -DSTAN_THREADS compiler argument and set STAN_NUM_THREADS=-1 (my CPU has 4 cores). However, Activity Monitor shows only one process with one thread, even with these flags and the map_rect code, so I suspect that multithreading is not being enabled for some reason. I also tried this on a second macOS machine in case some local configuration was interfering, but I didn’t see any speed-up there either.
Operating System: macOS Catalina
CmdStan Version: 2.23
Compiler/Toolkit: Clang
wds15
May 2, 2020, 8:47pm
Can you post the log from CmdStan? It sounds as if the STAN_NUM_THREADS variable is not set.
By log do you mean the stdout log?
method = sample (Default)
sample
num_samples = 1000 (Default)
num_warmup = 1000 (Default)
save_warmup = 0 (Default)
thin = 1 (Default)
adapt
engaged = 1 (Default)
gamma = 0.050000000000000003 (Default)
delta = 0.80000000000000004 (Default)
kappa = 0.75 (Default)
t0 = 10 (Default)
init_buffer = 75 (Default)
term_buffer = 50 (Default)
window = 25 (Default)
algorithm = hmc (Default)
hmc
engine = nuts (Default)
nuts
max_depth = 10 (Default)
metric = diag_e (Default)
metric_file = (Default)
stepsize = 1 (Default)
stepsize_jitter = 0 (Default)
id = 0 (Default)
data
file = redcard_input.R
init = 2 (Default)
random
seed = 3730562925 (Default)
output
file = output.csv (Default)
diagnostic_file = (Default)
refresh = 100 (Default)
Gradient evaluation took 0.012445 seconds
1000 transitions using 10 leapfrog steps per transition would take 124.45 seconds.
Adjust your expectations accordingly!
Iteration: 1 / 2000 [ 0%] (Warmup)
Iteration: 100 / 2000 [ 5%] (Warmup)
Iteration: 200 / 2000 [ 10%] (Warmup)
Iteration: 300 / 2000 [ 15%] (Warmup)
Iteration: 400 / 2000 [ 20%] (Warmup)
Iteration: 500 / 2000 [ 25%] (Warmup)
Iteration: 600 / 2000 [ 30%] (Warmup)
Iteration: 700 / 2000 [ 35%] (Warmup)
Iteration: 800 / 2000 [ 40%] (Warmup)
Iteration: 900 / 2000 [ 45%] (Warmup)
Iteration: 1000 / 2000 [ 50%] (Warmup)
Iteration: 1001 / 2000 [ 50%] (Sampling)
Iteration: 1100 / 2000 [ 55%] (Sampling)
Iteration: 1200 / 2000 [ 60%] (Sampling)
Iteration: 1300 / 2000 [ 65%] (Sampling)
Iteration: 1400 / 2000 [ 70%] (Sampling)
Iteration: 1500 / 2000 [ 75%] (Sampling)
Iteration: 1600 / 2000 [ 80%] (Sampling)
Iteration: 1700 / 2000 [ 85%] (Sampling)
Iteration: 1800 / 2000 [ 90%] (Sampling)
Iteration: 1900 / 2000 [ 95%] (Sampling)
Iteration: 2000 / 2000 [100%] (Sampling)
Elapsed Time: 72.9085 seconds (Warm-up)
62.8549 seconds (Sampling)
135.763 seconds (Total)
real 2m17.931s
user 2m15.366s
sys 0m0.788s
If I run echo $STAN_NUM_THREADS, I get -1.
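One thing echo cannot show is whether the variable is exported: echo also prints plain shell variables, but a program started from the shell only sees exported ones. A minimal check, assuming a POSIX-style shell (is_exported is a hypothetical helper name, not part of CmdStan):

```shell
# Hypothetical helper: succeeds only if the named variable is visible to
# child processes, i.e. it appears in the environment printed by `env`.
is_exported() {
  env | grep -q "^$1="
}

if is_exported STAN_NUM_THREADS; then
  echo "STAN_NUM_THREADS is exported with value: $STAN_NUM_THREADS"
else
  echo "STAN_NUM_THREADS is not exported; run: export STAN_NUM_THREADS=-1"
fi
```

If the check reports that the variable is not exported, exporting it or prefixing it to the command that launches the model should make it visible to the model binary.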
The stansummary output for output.csv is:
Inference for Stan model: logistic1_model
1 chains: each with iter=(1000); warmup=(0); thin=(1); 1000 iterations saved.
Warmup took (73) seconds, 1.2 minutes total
Sampling took (63) seconds, 1.0 minutes total
Mean MCSE StdDev 5% 50% 95% N_Eff N_Eff/s R_hat
lp__ -7864 4.9e-02 1.0e+00 -7866 -7863 -7863 4.4e+02 7.0e+00 1.0e+00
accept_stat__ 0.90 4.3e-03 1.4e-01 0.60 0.95 1.0 1.0e+03 1.6e+01 1.0e+00
stepsize__ 0.64 nan 1.3e-15 0.64 0.64 0.64 nan nan nan
treedepth__ 1.9 1.7e-02 5.4e-01 1.0 2.0 3.0 9.9e+02 1.6e+01 1.0e+00
n_leapfrog__ 4.3 7.3e-02 2.1e+00 1.0 3.0 7.0 8.1e+02 1.3e+01 1.0e+00
divergent__ 0.00 nan 0.0e+00 0.00 0.00 0.00 nan nan nan
energy__ 7865 7.1e-02 1.5e+00 7863 7864 7867 4.2e+02 6.7e+00 1.0e+00
beta[1] -5.5 2.0e-03 3.5e-02 -5.6 -5.5 -5.5 3.2e+02 5.0e+00 1.0e+00
beta[2] 0.28 4.5e-03 8.3e-02 0.15 0.28 0.41 3.4e+02 5.4e+00 1.0e+00
Samples were drawn using hmc with nuts.
For each parameter, N_Eff is a crude measure of effective sample size,
and R_hat is the potential scale reduction factor on split chains (at
convergence, R_hat=1).
wds15
May 3, 2020, 8:27am
It looks to me as if CmdStan was not built with threading support from the start. So please do the following:
make clean-all
ensure that STAN_THREADS=true is in make/local
make build
remove the model binary and rebuild it
Then you should see a message indicating the number of threads being used by Stan.
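The steps above can be sketched as two shell functions. This is only a sketch under assumptions: the CMDSTAN default path and the function names enable_stan_threads and rebuild_with_threads are hypothetical, and the model path must be absolute and given without the .stan extension.

```shell
# Hypothetical location of the CmdStan checkout; adjust to your setup.
CMDSTAN="${CMDSTAN:-$HOME/cmdstan}"

# Append STAN_THREADS=true to make/local unless it is already there.
# $1 = CmdStan directory.
enable_stan_threads() {
  mkdir -p "$1/make"
  touch "$1/make/local"
  grep -q '^STAN_THREADS *= *true' "$1/make/local" ||
    echo 'STAN_THREADS=true' >> "$1/make/local"
}

# Rebuild CmdStan from scratch, then remove and rebuild the model binary.
# $1 = CmdStan directory, $2 = absolute model path without .stan.
rebuild_with_threads() {
  enable_stan_threads "$1" &&
    ( cd "$1" && make clean-all && make build ) &&
    rm -f "$2" &&
    ( cd "$1" && make "$2" )
}
```

A call might look like rebuild_with_threads "$CMDSTAN" /absolute/path/to/logistic1, where logistic1 is the model name inferred from the stansummary header earlier in the thread. The rebuilt binary could then be started with STAN_NUM_THREADS=-1 ./logistic1 sample data file=redcard_input.R.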