I’ve been using Stan for a while to fit hierarchical models, with good success (so thank you!). I recently got access to a computing cluster and have been looking into using MPI and map_rect to run my models on it. The models I use generally take hours to days to run, so the prospective speedup would be very useful. However, as someone new to MPI, I’m not sure I understand how HMC estimation works with it.
My models are generally fit to hierarchical data with relatively little data at the lower levels (I study people’s performance on decision-making tasks, where the first level of data is choices on individual trials and the second level is subjects, with trials nested within subjects), so they benefit greatly from partial pooling. If I fit a model to only a few higher-level observations (i.e., only a few subjects’ data), it fails to converge or runs into other estimation issues.

This makes me worry about splitting the data into small shards for MPI: it seems the model could not converge on only a single shard’s worth of data. From my reading about MPI, though, it sounds like information is shared among nodes during estimation, so perhaps this isn’t actually a problem. Or do I need to make sure each shard contains enough data to estimate the model on its own, or that enough shards are running at once to share sufficient data across the nodes? Any clarification, or direction to (relatively non-technical) resources I could use to learn about this, would be appreciated. Thank you!
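For concreteness, here is a rough sketch of how I understand the map_rect setup would look in my case, with one shard per subject. The model is a toy one (a single intercept per subject on binary choices), and all the variable names are just my own guesses, but my understanding is that the shared parameter vector `phi` is sent to every shard, so each shard only computes a piece of the log likelihood rather than fitting the model on its own:

```stan
functions {
  // Log likelihood contribution of one shard (one subject's trials).
  // phi:   population-level parameters, identical copy sent to every shard
  // theta: this subject's own parameters
  // x_i:   this subject's binary choices
  vector subject_ll(vector phi, vector theta,
                    data array[] real x_r, data array[] int x_i) {
    return [bernoulli_logit_lpmf(x_i | theta[1])]';
  }
}
data {
  int<lower=1> S;                        // number of subjects (= shards)
  int<lower=1> T;                        // trials per subject
  array[S, T] int<lower=0, upper=1> y;   // choices, one row per subject
}
transformed data {
  array[S, 0] real x_r;                  // no real-valued data in this toy model
}
parameters {
  real mu;                               // population mean
  real<lower=0> sigma;                   // population sd
  vector[S] z;                           // per-subject raw effects (non-centered)
}
transformed parameters {
  vector[1] phi = [mu]';                 // shared parameters shipped to each shard
  array[S] vector[1] theta;              // per-subject parameters, one per shard
  for (s in 1:S) theta[s] = [mu + sigma * z[s]]';
}
model {
  mu ~ normal(0, 1);
  sigma ~ normal(0, 1);
  z ~ std_normal();
  // map_rect scatters the shards across MPI processes and sums the pieces back
  target += sum(map_rect(subject_ll, phi, theta, x_r, y));
}
```

If I have that right, the sampler always sees the full summed log likelihood at every HMC step, and the sharding only changes where each piece gets computed. But please correct me if I’ve misunderstood the setup.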