Thanks for the suggestions.
I still have some questions.
- Is putting CXXFLAGS += -DSTAN_THREADS -pthread in make/local the right way to compile CmdStan?
I probably didn’t find the right manual, because most of what I know comes from the Linear, parallell regression thread.
I compiled CmdStan with CXXFLAGS += -DSTAN_THREADS -pthread in make/local, as was suggested in the thread.
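For reference, a minimal sketch of the make/local this corresponds to, assuming the file contains nothing else (the comments are mine; only the CXXFLAGS line comes from the thread):

```make
# make/local, in the top-level CmdStan directory.
# Enables threading support for map_rect, as suggested in the thread.
CXXFLAGS += -DSTAN_THREADS -pthread
```

The flags only take effect for code compiled after the change, so the model has to be rebuilt once make/local is edited.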
Yes, I used map_rect and it works. I was able to run with 10, 15, and 20 shards, and I checked with htop that the corresponding number of threads was actually running. I didn’t use mpirun for those runs, though. The command line I used was:
export STAN_NUM_THREADS=15
time ./NNsigma.2.18.mpi sample …
However, when I tried 100 shards, I found that only the first node was active. All the others were idle.
Naturally, I also tried mpirun with 20 shards. In that case the progress output was repeated 20 times.