Dear Stan community:
It’s been a while since I played with this. I created a Stan model using map_rect, and I’d like to run it on several cores via TBB. I haven’t tried this with CmdStan yet, but I would prefer to use rstan for seamless model tuning. Details below. Am I doing something wrong, or is this just not supported yet in rstan?
Is cmdstanr an option for you? It is an R package, available on the Stan GitHub, that makes it easy to run Stan models from R through CmdStan.
I have tried it two ways:
(1) I downloaded TBB from Intel and copied its include directory into the rstan include directory.
(2) After undoing (1), I used the TBB from Clear Linux’s package manager.
I forgot that for rstan 2.19.x, map_rect is implemented with just C++11 threads, rather than TBB. So, you don’t need any TBB stuff to get that to work.
For rstan 2.21.x, which is on GitHub but was not accepted by CRAN, you do need the RcppParallel package, but you don’t need to include TBB sources in the rstan sources or anything like that.
I’d love to, but now it works again. I have no idea what it was. Maybe something in the .stan file, that still let it compile but didn’t let it run? I’ll keep an eye on it.
Is there a way I can look at the generated C++ file with cmdstanr? I’d like to know whether the -DSTAN_THREADS flag took effect. I still don’t see the sampler using multiple cores.
You should only need to add -DSTAN_THREADS -pthread to the compilation flags and then, at runtime, set the environment variable STAN_NUM_THREADS.
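A minimal sketch of those two steps, in case it helps. The flags and the STAN_NUM_THREADS name come from the reply above; everything else is an assumption: the CXX14FLAGS variable name (what an rstan toolchain typically reads from ~/.R/Makevars), the scratch directory used here in place of the real Makevars location, and the value 12.

```shell
#!/bin/sh
# Sketch only: writes to a scratch directory, not the real ~/.R/Makevars.
set -eu

scratch_dir=$(mktemp -d)          # hypothetical stand-in for ~/.R
makevars="$scratch_dir/Makevars"

# Compile time: enable Stan's threading support.
# CXX14FLAGS is an assumption; use whichever flags variable your setup reads.
printf 'CXX14FLAGS += -DSTAN_THREADS -pthread\n' >> "$makevars"

# Run time: number of threads map_rect may use (12 is just an example value).
STAN_NUM_THREADS=12
export STAN_NUM_THREADS

# Show what was configured.
cat "$makevars"
echo "STAN_NUM_THREADS=$STAN_NUM_THREADS"
```

The exported variable only reaches processes started from this shell, so R (or the compiled model) would have to be launched from the same session, and the model must be recompiled after the flags change.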
Well, the C++ version of stanc will go away some day (hopefully soon), but installing the V8 R package should suffice if you have the underlying library installed from the package manager. Having a dependency on the OCaml libraries would be more onerous than JavaScript from R’s perspective.
Thanks. num_threads was correct, but I recompiled with threads=TRUE. Still, it isn’t using the cores well. It could be my model; I want to try different cuts to create the shards. However, is there another way to verify that it’s using all the cores? When running, it says ‘Running MCMC with 1 chain(s) on 12 core(s)…’, but is that definitely correct?
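One rough way to check from outside the sampler, sketched under the assumption of a Linux system with /proc mounted: count the threads the operating system reports for the running model process. The count_threads helper is hypothetical, and the shell's own PID is used here only as a placeholder for the model's PID.

```shell
#!/bin/sh
# Sketch only: Linux-specific, relies on /proc/<pid>/task.
set -eu

# count_threads PID: print how many threads the kernel lists for PID.
# (Hypothetical helper name; each thread has one entry under task/.)
count_threads() {
    ls "/proc/$1/task" | wc -l | tr -d ' '
}

# Placeholder: replace $$ with the PID of the running model executable.
model_pid=$$
echo "threads in process $model_pid: $(count_threads "$model_pid")"
```

A count well above 1 while sampling suggests the threaded build took effect. It does not show how busy each thread is; a per-thread view such as `top -H -p <pid>` is better for judging load.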
Thanks so much. I found one issue in my model file, and now it’s working much faster. It’s still not producing 100% load on 12 cores, but pretty high load on 8 of them. So that’s pretty nice already. Now I can do more tweaking and tuning on my model.
It works quite well, and cmdstanr works like a charm. I think I want to try MPI next, since I have a number of cores to keep busy.
I am not sure what to mark as a solution. For me the real solution is to switch to CmdStan, since it actually uses TBB. Strictly speaking, the question was about rstan, so either using the development version or just using threads would be the answer. Either way, thanks everyone.