Four chains vs four jobs

Corey.Plate · June 19, 2024, 12:29am

Without CmdStanR, would contrasting 1 chain and 1 thread vs 1 chain and 2 threads, etc. just be looking at model run completion times? I would only have access to the output files, the CPU utilization, and the runtimes for the models.

andrjohns · June 19, 2024, 12:37am

Without CmdStanR, would contrasting 1 chain and 1 thread vs 1 chain and 2 threads, etc. just be looking at model run completion times?

Isn’t the completion time what you want to change with parallelism?

Corey.Plate · June 19, 2024, 12:42am

Yes, correct. I’m just trying to get a sense of what I can do with what I have access to. There are time constraints, so I am trying to tackle this as efficiently as possible and understand what I should be looking out for. For example, if I really have to do these single comparisons based on their relative runtimes, I would ideally launch a bunch of them at once and see which one wins the race: 1, 2, 3, 4, 5, 6, 7, and 8 threads on 1 chain, then 9, 10, 11, 12, 13, 14, 15, and 16, etc. If all I have to worry about is runtimes, I can compare all of them later. But that’s all I’ll have access to, so I wanted to make sure I understood what I’m supposed to be doing. I’m also guessing that if I move to 4 simultaneous chains per job, I should multiply whatever the magic number of threads is by 4?
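A rough sketch of how the "compare all of them later" step might look in Python, working only from the output files. It assumes each CmdStan output CSV ends with comment lines shaped like the sample below; the regex, the helper names, and the runtimes in the dictionary are all illustrative rather than taken from a real run. It is worth cross-checking the reported figure against the wall-clock runtime the scheduler gives for the job.

```python
import re

# Assumed shape of the timing footer in a CmdStan output CSV (comment lines).
# The numbers here are made up for illustration.
SAMPLE_FOOTER = """\
#  Elapsed Time: 410.2 seconds (Warm-up)
#                395.8 seconds (Sampling)
#                806.0 seconds (Total)
"""

# Matches a comment line such as "#   806.0 seconds (Total)".
TOTAL_RE = re.compile(r"^#\s*([0-9]*\.?[0-9]+)\s+seconds\s+\(Total\)", re.MULTILINE)


def total_seconds(csv_text):
    """Return the total elapsed seconds found in the file text, or None if absent."""
    match = TOTAL_RE.search(csv_text)
    return float(match.group(1)) if match else None


def fastest(runtimes):
    """Given {threads_per_chain: seconds}, return the thread count with the lowest runtime."""
    return min(runtimes, key=runtimes.get)


# Hypothetical runtimes (seconds) for one chain at several thread counts.
runtimes = {1: total_seconds(SAMPLE_FOOTER), 2: 455.0, 4: 290.0, 8: 310.0}

print(total_seconds(SAMPLE_FOOTER))
print(fastest(runtimes))
```

In practice the dictionary would be filled by reading each job's output file and passing its text to `total_seconds`.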

andrjohns · June 19, 2024, 12:48am

If there are time constraints, why not just estimate the model with a single thread per chain (or just pick a number of threads) instead of doing the comparisons?

Corey.Plate · June 19, 2024, 12:51am

Just because I don’t have any sense of what the optimal number of threads per chain is. Our model is very computationally intensive, with little room to make it less so, and we’re going to need to run it many times with synthetic data, so optimal runtimes are important. I’m caught in a bit of a balancing act between finding the best efficiency (in terms of parallelization and improving runtimes) and how much time I can devote to doing so.

andrjohns · June 19, 2024, 12:58am

The main reason to start with the 1 thread vs 2 thread comparison is to see whether the computations per thread are complex enough at all to benefit from parallelism. If you don’t see much improvement from 2 threads, then you can focus on other optimisations.
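As a minimal sketch of that 1-thread vs 2-thread check in Python: the function names, the example runtimes, and the 1.3x cutoff are all assumptions made for illustration, not a recommended threshold.

```python
def speedup(t_one_thread, t_n_threads):
    """Ratio of the single-thread runtime to the multi-thread runtime."""
    return t_one_thread / t_n_threads


def parallel_efficiency(t_one_thread, t_n_threads, n_threads):
    """Speedup divided by the number of threads (1.0 would be perfect scaling)."""
    return speedup(t_one_thread, t_n_threads) / n_threads


def worth_scaling(t_one_thread, t_two_threads, min_speedup=1.3):
    """True if 2 threads beat 1 thread by at least min_speedup (an arbitrary cutoff)."""
    return speedup(t_one_thread, t_two_threads) >= min_speedup


# Hypothetical runtimes in seconds for the same model and data.
t1, t2 = 800.0, 500.0
print(speedup(t1, t2))
print(parallel_efficiency(t1, t2, 2))
print(worth_scaling(t1, t2))
```

If `worth_scaling` comes back false for the 2-thread run, that suggests the per-thread work is too small to benefit, and time is probably better spent on other optimisations than on testing higher thread counts.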

Corey.Plate · June 19, 2024, 1:03am

I see. And assuming there is no real benefit to the parallelization, I’m guessing it’s nothing that would be any better helped by map_rect, since, ostensibly, all that does is give us more CPU power to work with, right?

andrjohns · June 19, 2024, 1:07am

all that does is give us more CPU power to work with, right?

Yes. If your current parallelisation of the likelihood shows no benefit from 2 threads vs 1, you won’t see a benefit from distributing the work across even more cores.

Corey.Plate · June 19, 2024, 1:11am

Alright, thank you for all of your help. It’s good to know that we haven’t actually been using the MPI and multi-node cluster. I’ll have to relearn some things with respect to communicating with the cluster, but I think I have a much better idea now of what I should be looking out for, and how to communicate with the cluster administrator about my needs. Many thanks.
