@Bob_Carpenter, @wds15, I wonder, under the current map_rect design, what's the best way to define multiple MPI communicators. For example, in a population model, how would one allocate a dedicated communicator to each individual?
What do you mean? What are communicators?
An MPI communicator defines a communication arena within which different computing nodes can pass messages (roughly speaking). What I'm asking is essentially the first step toward inter-communicator communication.
Not sure I get where you are coming from or where you are heading… For a hierarchical model with J subjects, we ask the user to structure the data such that the first dimension of the arrays runs from 1…J; this encodes what is considered a unit. Then those J units are dispatched onto the M nodes (including the root) in equal work chunks of size J/M. That's it. So all we do with MPI is broadcast/scatter/gather commands, with the root node as the source of work and the sink for the results.
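The dispatch described above can be sketched in plain C++. This is an illustration of the chunking arithmetic only, not Stan's actual scheduling code; the function name `assign_units` is hypothetical:

```cpp
#include <cassert>
#include <vector>

// Sketch of the dispatch described above: J units split across M nodes
// in (roughly) equal chunks. When J is not divisible by M, the first
// J % M nodes take one extra unit. Returns, for each unit j in 0..J-1,
// the rank of the node it is sent to.
std::vector<int> assign_units(int J, int M) {
  std::vector<int> node_of_unit(J);
  int chunk = J / M;   // base chunk size C = J/M
  int extra = J % M;   // leftover units spread over the first ranks
  int j = 0;
  for (int rank = 0; rank < M; ++rank) {
    int size = chunk + (rank < extra ? 1 : 0);
    for (int k = 0; k < size; ++k)
      node_of_unit[j++] = rank;
  }
  return node_of_unit;
}
```

For example, with J = 8 subjects and M = 4 nodes, each node (including the root) receives a chunk of C = 2 units.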
Not sure if that answered your question though.
With J subjects in a hierarchical model, suppose that in addition to the M nodes one has another J*N nodes and wants to use N nodes for each of the J subjects. That means there are 1 + J communicators in the run: one for the original M nodes, and another J, one per subject. How would one achieve that using map_rect, or is it possible at all?
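For context on what such a layout would involve: in raw MPI, sub-communicators are carved out of MPI_COMM_WORLD with MPI_Comm_split, where every rank passes a "color" and ranks sharing a color end up in the same sub-communicator. A minimal sketch of the color assignment for the layout asked about here (plain C++, no MPI runtime needed; the `MPI_Comm_split` call itself appears only in a comment, and `color_for_rank` is an illustrative helper, not part of any real API):

```cpp
#include <cassert>

// Hypothetical layout: ranks 0..M-1 form the existing worker pool, and
// the following J*N ranks are grouped N-per-subject. Each group would
// become one sub-communicator via something like
//   MPI_Comm_split(MPI_COMM_WORLD, color_for_rank(rank, M, N),
//                  rank, &subcomm);
// Here we only compute the color: -1 for the original M-node pool,
// 0..J-1 for the per-subject groups. (In real MPI, ranks that should
// not join any group would pass MPI_UNDEFINED rather than -1.)
int color_for_rank(int rank, int M, int N) {
  if (rank < M)
    return -1;            // stays in the shared pool
  return (rank - M) / N;  // subject index for the extra ranks
}
```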
So you mean more than 1 MPI node per subject? No, that is not possible at the moment.
… what I have planned is to combine MPI with threading. So say we have M MPI nodes and the chunk size is J/M = C. Then on a given node the C units can be processed using threads. This should give major speedups, as we limit the use of MPI and combine it with threading.
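On one node, the MPI-plus-threading idea could look roughly like this. This is an illustrative sketch only: `process_unit` is a stand-in for the real per-subject work, and a real implementation would use a thread pool sized to the core count rather than one thread per unit:

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Stand-in for the per-subject computation done on one MPI node.
double process_unit(double data) { return 2.0 * data; }

// Process the C units local to this node in parallel, one thread per
// unit. Each thread writes to its own slot of `out`, so no locking
// is needed.
std::vector<double> process_chunk(const std::vector<double>& units) {
  std::vector<double> out(units.size());
  std::vector<std::thread> threads;
  for (std::size_t i = 0; i < units.size(); ++i)
    threads.emplace_back([&out, &units, i] { out[i] = process_unit(units[i]); });
  for (auto& t : threads)
    t.join();
  return out;
}
```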
I am not sure what applications need more than 1 core for a given subject. In the current design, using more than 1 CPU per subject is probably best done through threading rather than MPI, which should be more efficient anyway.
Yes, that’s what I’m asking about.
Many things, including any large-scale linear algebra.
I understand MPI+threading has been a popular setup, but with MPI-3 this won't be necessary, since MPI-3 already supports SMP with shared memory.
Getting MPI into Stan will be such a relief…what you target sounds like the next gen barrier.
Maybe… but MPI is a huge pain to program, while threading via C++11 facilities is far easier. OK, this argument does not apply (that much) to off-the-shelf libraries which are already written.
So I guess map_rect uses a single MPI_COMM_WORLD. Is that correct?
Probably yes… I work with boost::mpi to escape from the low-level MPI stuff as much as I can (so I am not 100% familiar with all the low-level details).
No problem. Thanks.
I think we may want to document the MPI communication setup (if that's not done yet), as well as print out debug information such as communicator name, size, timings, etc.