Parallelising multipathfinder runs

ballardtj · January 18, 2024, 4:29am

Hi stan team,

I’m very excited about the work you all are doing with the new Pathfinder implementation. I’ve set up a workflow like the one you describe in the Zhang et al. (2021) paper, where I use the PSIS draws generated by the Pathfinder algorithm to initialise short MCMC chains. The paper notes that Pathfinder is embarrassingly parallelisable, presumably in part because separate runs of multi-path Pathfinder can run simultaneously on separate cores. Is there a way to do this yet? There doesn’t yet seem to be an equivalent of the ‘cores’ argument that the MCMC sampling functions have. I was wondering whether it’s possible to run several instances of single-path Pathfinder (e.g., as separate jobs on a cluster), save the single-path outputs, and then somehow apply PSIS at the end to the combined outputs. Failing that, is it appropriate just to use the final draw of each single-path Pathfinder run to initialise the MCMC chains without applying PSIS?

Cheers,
Tim

StaffanBetner · January 18, 2024, 8:21am

Parallelisation in Pathfinder is built upon the threading structure, so you need to compile the model with the flag stan_threads = TRUE and use the num_threads argument (that’s what it is called in cmdstanr anyway) for specifying the number of parallel pathfinder runs.

ballardtj · January 18, 2024, 11:41pm

Brilliant. That works perfectly. Thanks!

wpetry · January 19, 2024, 2:25pm

I ran into this exact same problem recently with cmdstanr. There are two open issues that, when fixed, should make the handling of parallelization more consistent and warn users who specify num_threads without compiling for multithreading, with something very similar to @StaffanBetner’s solution.

irelamb · March 29, 2024, 2:24pm

Hi @StaffanBetner. Thank you for your explanation. Should num_threads correspond to the number of paths (num_paths)?

StaffanBetner · April 1, 2024, 8:50pm

It should correspond to the number of parallel pathfinder runs.

mathDR · May 8, 2024, 1:54pm

To follow up on this thread: what is the backend doing if we have a reduce_sum implementation of a model and we run multiple num_paths of pathfinder on the model?

Will the num_threads be distributed first to pathfinder num_paths, then to reduce_sum threads?

andrjohns · May 8, 2024, 2:09pm

Like with multichain parallelism, it operates using a central thread pool: each path is first allocated to a thread, and then, if the individual paths have some parallelism (reduce_sum), the path will request additional threads from the thread pool.

This also means that once a path finishes, the thread(s) it was using are then available to the other paths.
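To make the scheduling idea concrete, here is a toy sketch in Python. It is not Stan's actual implementation: `simulate_pool` and the path durations are hypothetical, each path is assumed to need exactly one thread, and the extra threads a reduce_sum model might request are not modelled. It only illustrates how paths queue when num_paths exceeds num_threads, and how a freed thread is picked up by the next waiting path.

```python
import heapq


def simulate_pool(path_durations, num_threads):
    """Toy model of a shared thread pool (hypothetical, not Stan's code).

    Each path occupies one thread for its duration. Paths are started in
    order, each on whichever thread becomes free first.
    Returns (start_times, makespan).
    """
    if num_threads < 1:
        raise ValueError("num_threads must be at least 1")

    # Min-heap holding the time at which each thread next becomes free.
    free_at = [0.0] * num_threads
    heapq.heapify(free_at)

    start_times = []
    for duration in path_durations:
        start = heapq.heappop(free_at)  # earliest available thread
        start_times.append(start)
        heapq.heappush(free_at, start + duration)  # thread busy until then

    # The run ends when the last thread finishes its last path.
    makespan = max(free_at)
    return start_times, makespan


if __name__ == "__main__":
    durations = [3.0, 1.0, 2.0, 2.0]  # four paths with made-up run times
    for threads in (1, 2, 4):
        starts, total = simulate_pool(durations, threads)
        print(f"num_threads={threads}: starts={starts}, total={total}")
```

In this toy model, adding threads beyond the number of paths changes nothing, which matches the advice above that num_threads corresponds to the number of parallel Pathfinder runs (unless the model itself uses reduce_sum).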
