Without CmdStanR, would contrasting 1 chain and 1 thread vs 1 chain and 2 threads, etc. just be looking at model run completion times? I would only have access to the output files, the CPU utilization, and the runtimes for the models.
Isn’t the completion time what you want to change with parallelism?
Yes, correct. I’m just trying to get a sense of what I can do with what I have access to. There are time constraints, so I am trying to tackle this as efficiently as possible and understand what I should be looking for. For example, if I really have to do these single comparisons and judge them by their relative runtimes, I would ideally launch a batch of them at once and see which one wins the race, e.g. trying 1 through 8 threads on 1 chain, then 9 through 16, and so on. If all I have to worry about is runtimes, I can compare all of them later. But that’s all I’ll have access to, so I wanted to make sure I understood what I’m supposed to be doing. I’m also guessing that if I move to 4 simultaneous chains per job, I should multiply whatever the magic number of threads is by 4?
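For the "compare all of them later" step, here is a minimal sketch in Python of one way to tabulate the race, assuming you have noted the wall-clock seconds for each thread count by hand or from your scheduler logs. The function names and the numbers in the example are hypothetical placeholders, not measurements and not part of any Stan tooling.

```python
def summarize_runs(runtimes):
    """Return (threads, seconds, speedup, efficiency) rows, sorted by threads.

    `runtimes` maps threads-per-chain to wall-clock seconds and must
    contain an entry for 1 thread, which serves as the baseline.
    """
    baseline = runtimes[1]
    rows = []
    for threads in sorted(runtimes):
        seconds = runtimes[threads]
        speedup = baseline / seconds      # how many times faster than 1 thread
        efficiency = speedup / threads    # 1.0 would be perfect scaling
        rows.append((threads, seconds, speedup, efficiency))
    return rows


def fastest(runtimes):
    """Return the thread count with the shortest wall-clock time."""
    return min(runtimes, key=runtimes.get)


if __name__ == "__main__":
    # Placeholder numbers for illustration only; substitute your own timings.
    example = {1: 1000.0, 2: 600.0, 4: 400.0, 8: 380.0}
    print(f"{'threads':>7} {'seconds':>10} {'speedup':>8} {'efficiency':>10}")
    for threads, seconds, speedup, efficiency in summarize_runs(example):
        print(f"{threads:>7} {seconds:>10.1f} {speedup:>8.2f} {efficiency:>10.2f}")
    print("fastest thread count:", fastest(example))
```

Looking at efficiency alongside raw runtime may help when cores are scarce: the fastest setting is not necessarily the one that makes the best use of the cores you would need for several chains at once.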
If there are time constraints, why not just estimate the model with a single thread per chain (or just pick a number of threads) instead of doing the comparisons?
Mainly because I don’t have any sense of what the optimal number of threads per chain is. Our model is very computationally intensive, with little room to make it less so, and we’re going to need to run it many times with synthetic data, so optimal runtimes are important. I’m caught in a bit of a balancing act between finding the best efficiency (in terms of parallelization and improving runtimes) and how much time I can devote to doing so.
The main reason to start with the 1 thread vs 2 thread comparison is to see whether the computations per thread are complex enough at all to benefit from parallelism. If you don’t see much improvement from 2 threads, then you can focus on other optimisations.
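One rough way to read the 1 thread vs 2 thread result is through Amdahl's law, which is not mentioned above and is brought in here only as an idealized model: it ignores threading overhead and memory effects, so treat the output as an optimistic guide rather than a prediction. Under that assumption, the two runtimes give an estimate of the fraction of work that parallelizes, and from that an upper bound on the speedup at higher thread counts. The function names are hypothetical.

```python
def parallel_fraction(t1, t2):
    """Estimate the Amdahl parallel fraction p from two wall-clock times.

    Amdahl's law gives t2 / t1 = (1 - p) + p / 2, so p = 2 * (1 - t2 / t1).
    The result is clamped to [0, 1] because timing noise or overhead can
    push the raw estimate outside that range.
    """
    p = 2.0 * (1.0 - t2 / t1)
    return max(0.0, min(1.0, p))


def predicted_speedup(p, threads):
    """Idealized Amdahl speedup for `threads` threads and parallel fraction p."""
    return 1.0 / ((1.0 - p) + p / threads)


if __name__ == "__main__":
    # Placeholder timings in seconds for illustration only, not measurements.
    t1, t2 = 1000.0, 600.0
    p = parallel_fraction(t1, t2)
    for n in (2, 4, 8, 16):
        print(n, round(predicted_speedup(p, n), 2))
```

If the estimated fraction comes out near zero, that matches the advice above: more threads are unlikely to help, and the time is better spent on other optimisations.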
I see. And assuming there is no real benefit from the parallelization, I’m guessing map_rect wouldn’t help any further, since, ostensibly, all that does is give us more CPU power to work with, right?
Yes. If your current parallelization of the likelihood does not see a benefit from 2 threads vs 1, you won’t see a benefit from distributing the work across even more cores.
Alright, thank you for all of your help. It’s good to know that we haven’t actually been using the MPI and multi-node cluster. I’ll have to relearn some things with respect to communicating with the cluster, but I think I have a much better idea now of what I should be looking out for, and how to communicate with the cluster administrator about my needs. Many thanks.