CmdStan map_rect not showing speed-up

Hi, I’m trying to use CmdStan with a map_rect example. I followed this tutorial exactly, including its code: https://github.com/rmcelreath/cmdstan_map_rect_tutorial. The model compiles and runs, but I don’t see the speed-up that the tutorial claims.

I set the -DSTAN_THREADS compiler argument and set STAN_NUM_THREADS=-1 (my CPU has 4 cores). However, Activity Monitor shows only one process with one thread, even with these flags and the map_rect code. I suspect that multithreading is not being enabled for some reason. I also tried this on another macOS machine, in case some local configuration was interfering, but I didn’t see any speed-up there either.

  • Operating System: macOS Catalina
  • CmdStan Version: 2.23
  • Compiler/Toolkit: Clang

Can you post the log from CmdStan? It sounds as if the STAN_NUM_THREADS variable is not being set.

By log do you mean the stdout log?

    method = sample (Default)
      sample
        num_samples = 1000 (Default)
        num_warmup = 1000 (Default)
        save_warmup = 0 (Default)
        thin = 1 (Default)
        adapt
          engaged = 1 (Default)
          gamma = 0.050000000000000003 (Default)
          delta = 0.80000000000000004 (Default)
          kappa = 0.75 (Default)
          t0 = 10 (Default)
          init_buffer = 75 (Default)
          term_buffer = 50 (Default)
          window = 25 (Default)
        algorithm = hmc (Default)
          hmc
            engine = nuts (Default)
              nuts
                max_depth = 10 (Default)
            metric = diag_e (Default)
            metric_file =  (Default)
            stepsize = 1 (Default)
            stepsize_jitter = 0 (Default)
    id = 0 (Default)
    data
      file = redcard_input.R
    init = 2 (Default)
    random
      seed = 3730562925 (Default)
    output
      file = output.csv (Default)
      diagnostic_file =  (Default)
      refresh = 100 (Default)


    Gradient evaluation took 0.012445 seconds
    1000 transitions using 10 leapfrog steps per transition would take 124.45 seconds.
    Adjust your expectations accordingly!


    Iteration:    1 / 2000 [  0%]  (Warmup)
    Iteration:  100 / 2000 [  5%]  (Warmup)
    Iteration:  200 / 2000 [ 10%]  (Warmup)
    Iteration:  300 / 2000 [ 15%]  (Warmup)
    Iteration:  400 / 2000 [ 20%]  (Warmup)
    Iteration:  500 / 2000 [ 25%]  (Warmup)
    Iteration:  600 / 2000 [ 30%]  (Warmup)
    Iteration:  700 / 2000 [ 35%]  (Warmup)
    Iteration:  800 / 2000 [ 40%]  (Warmup)
    Iteration:  900 / 2000 [ 45%]  (Warmup)
    Iteration: 1000 / 2000 [ 50%]  (Warmup)
    Iteration: 1001 / 2000 [ 50%]  (Sampling)
    Iteration: 1100 / 2000 [ 55%]  (Sampling)
    Iteration: 1200 / 2000 [ 60%]  (Sampling)
    Iteration: 1300 / 2000 [ 65%]  (Sampling)
    Iteration: 1400 / 2000 [ 70%]  (Sampling)
    Iteration: 1500 / 2000 [ 75%]  (Sampling)
    Iteration: 1600 / 2000 [ 80%]  (Sampling)
    Iteration: 1700 / 2000 [ 85%]  (Sampling)
    Iteration: 1800 / 2000 [ 90%]  (Sampling)
    Iteration: 1900 / 2000 [ 95%]  (Sampling)
    Iteration: 2000 / 2000 [100%]  (Sampling)

     Elapsed Time: 72.9085 seconds (Warm-up)
                   62.8549 seconds (Sampling)
                   135.763 seconds (Total)


    real	2m17.931s
    user	2m15.366s
    sys	0m0.788s

If I run echo $STAN_NUM_THREADS, I get -1.
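As a side check, echo alone does not show whether a variable is exported to child processes such as the model binary. A minimal sketch, assuming a POSIX shell, that compares what a child process sees before and after export (the names before and after are only for illustration):

```shell
# Start from a clean state so the comparison is meaningful.
unset STAN_NUM_THREADS

# Plain assignment: visible to echo in this shell, but not exported.
STAN_NUM_THREADS=-1
before=$(sh -c 'echo "${STAN_NUM_THREADS:-unset}"')

# Exported: child processes (including a compiled model) inherit it.
export STAN_NUM_THREADS=-1
after=$(sh -c 'echo "${STAN_NUM_THREADS:-unset}"')

echo "child before export: $before"
echo "child after export:  $after"
```

If the first line reports unset in your own session, the variable was set but never exported.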

The stansummary output for output.csv is:

Inference for Stan model: logistic1_model
1 chains: each with iter=(1000); warmup=(0); thin=(1); 1000 iterations saved.

Warmup took (73) seconds, 1.2 minutes total
Sampling took (63) seconds, 1.0 minutes total

                 Mean     MCSE   StdDev     5%    50%    95%    N_Eff  N_Eff/s    R_hat
lp__            -7864  4.9e-02  1.0e+00  -7866  -7863  -7863  4.4e+02  7.0e+00  1.0e+00
accept_stat__    0.90  4.3e-03  1.4e-01   0.60   0.95    1.0  1.0e+03  1.6e+01  1.0e+00
stepsize__       0.64      nan  1.3e-15   0.64   0.64   0.64      nan      nan      nan
treedepth__       1.9  1.7e-02  5.4e-01    1.0    2.0    3.0  9.9e+02  1.6e+01  1.0e+00
n_leapfrog__      4.3  7.3e-02  2.1e+00    1.0    3.0    7.0  8.1e+02  1.3e+01  1.0e+00
divergent__      0.00      nan  0.0e+00   0.00   0.00   0.00      nan      nan      nan
energy__         7865  7.1e-02  1.5e+00   7863   7864   7867  4.2e+02  6.7e+00  1.0e+00
beta[1]          -5.5  2.0e-03  3.5e-02   -5.6   -5.5   -5.5  3.2e+02  5.0e+00  1.0e+00
beta[2]          0.28  4.5e-03  8.3e-02   0.15   0.28   0.41  3.4e+02  5.4e+00  1.0e+00

Samples were drawn using hmc with nuts.
For each parameter, N_Eff is a crude measure of effective sample size,
and R_hat is the potential scale reduction factor on split chains (at
convergence, R_hat=1).

It looks to me as if CmdStan was not built with threading support from the start. So please do the following:

  1. make clean-all
  2. ensure that STAN_THREADS=true is in make/local
  3. make build
  4. remove the model binary and rebuild it

Then you should see a message indicating the number of threads being used by Stan.
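The four steps above can be sketched as a small shell script. This is only a sketch: CMDSTAN and MODEL are placeholder paths that are not from this thread, and the helper names ensure_stan_threads and rebuild_with_threads are made up for illustration. Adjust the paths to your own install and model before calling anything.

```shell
# Placeholder paths (assumptions): point these at your CmdStan checkout and
# at your model, given relative to the CmdStan directory.
CMDSTAN="${CMDSTAN:-$HOME/cmdstan}"
MODEL="${MODEL:-examples/bernoulli/bernoulli}"

# Step 2: add STAN_THREADS=true to make/local unless it is already there.
ensure_stan_threads() {
  mkdir -p "$1/make"
  touch "$1/make/local"
  grep -qx 'STAN_THREADS=true' "$1/make/local" ||
    echo 'STAN_THREADS=true' >> "$1/make/local"
}

# Steps 1-4, run in a subshell so the caller's working directory is unchanged
# and any failing step stops the sequence.
rebuild_with_threads() (
  set -e
  cd "$CMDSTAN"
  make clean-all                   # 1. remove previously built objects
  ensure_stan_threads "$CMDSTAN"   # 2. enable threading in make/local
  make build                       # 3. rebuild CmdStan
  rm -f "$MODEL"                   # 4. remove the old model binary...
  make "$MODEL"                    #    ...and rebuild it
)
```

Nothing runs until you call rebuild_with_threads yourself. The helper is idempotent, so repeated calls do not duplicate the STAN_THREADS line in make/local.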


That fixed it, thanks!