Dear Stan community:
It’s been a while since I played with this. I created a Stan model using map_rect, and I’d like to run it on several cores via TBB. I haven’t tried this with CmdStan yet, but I would prefer to use rstan for seamless model tuning. Details below. Am I doing something wrong, or is this just not supported yet in rstan?
You should only need to add -DSTAN_THREADS -pthread to the compilation flags, and then at runtime set the environment variable STAN_NUM_THREADS to the number of threads you want to use.
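For anyone finding this later, here is a rough sketch of those two steps from a shell. It assumes rstan picks up its compiler flags from the user Makevars file (~/.R/Makevars unless R_MAKEVARS_USER points elsewhere) and that your setup uses CXX14FLAGS; newer toolchains may need a different variable. The thread count of 12 and the script name fit_model.R are just placeholders.

```shell
# Assumption: rstan compiles models with the CXX14FLAGS from the user Makevars.
MAKEVARS="${R_MAKEVARS_USER:-$HOME/.R/Makevars}"
mkdir -p "$(dirname "$MAKEVARS")"

# Append the threading flags only if they are not already present.
grep -q -e '-DSTAN_THREADS' "$MAKEVARS" 2>/dev/null || \
  echo 'CXX14FLAGS += -DSTAN_THREADS -pthread' >> "$MAKEVARS"

# Runtime: tell Stan how many threads map_rect may use (12 is an example).
export STAN_NUM_THREADS=12

# Rscript fit_model.R   # hypothetical script that compiles and samples the model
```

Models compiled before the flags were added have to be recompiled, otherwise map_rect keeps running serially.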
Thanks. num_threads was correct, but I recompiled with threads=TRUE. Still, it is not using the cores well. It could be my model. I want to look at different ways of cutting the data into shards. However, is there another way to verify that it’s using all the cores? When running, it says ‘Running MCMC with 1 chain(s) on 12 core(s)…’, but is that definitely correct?
Thanks so much. I found one issue in my model file, and now it’s working much faster. It’s still not producing 100% load on 12 cores, but it puts a fairly high load on 8 of them. So that’s pretty nice already. Now I can do more tweaking and tuning on my model.
It works quite well, and cmdstanr works like a charm. I think I want to try MPI next, since I have a number of cores to keep busy.
I am not sure what to mark as a solution. For me the real solution is to switch to cmdstan, since it actually uses TBB. Strictly speaking, the question was about rstan, so either using the development version or just using threads would be the answer. Either way, thanks everyone.