I have stan code with reduce_sum where the multi threading works great with sampling. The optimize method works with the same code, but it reports num_threads = 1. I am using cmdstanr, and I do not see any parameter to specify the number of threads. Can the optimize method use multiple threads? And if so, how do you specify that (preferably in cmdstanr)?
Hi @klattery, welcome to the Stan forums! In theory threading should work with optimization, but unfortunately we just haven't implemented it in the cmdstanr interface yet. This is definitely on our to-do list for cmdstanr, which is still quite new, so some things like this are missing.
I think it should work if you use cmdstan directly without the cmdstanr wrapper, but that's much less convenient. Hopefully we can add the necessary code to cmdstanr soon!
You can actually use it, but you have to set the environment variable yourself.
So
Sys.setenv("STAN_NUM_THREADS" = X)
mod$optimize()
And it should work provided the model was compiled with stan_threads.
This will be nicer once we close that issue as @jonah says.
Good point @rok_cesnovar!
YES!!! You made my day.
I can't thank you guys at Stan enough for adding multi-threading this year. I have a 32 core AMD Threadripper and the difference is amazing, both for sampling and now optimization (which I run for quick model testing). With 1 thread the optimization took 2 hours. I just did it in 10 minutes (same exact computer and code/data). Likewise with sampling, models that took 3 days can now be estimated in 8 hours.
Just to be clear, here is the R code for compiling and calling Stan optimize (I'm running cmdstanr on WSL).
HB_model <- cmdstan_model(file.path(dir_model, "LogDiff_SUR2.2.stan"), quiet = TRUE, cpp_options = list(stan_threads = TRUE))
Sys.setenv("STAN_NUM_THREADS" = 30)
HB_MLE <- HB_model$optimize(modifyList(data_list, data_model), init = .5, seed = 2718,
                            refresh = 5, iter = 1000)
@klattery That's awesome! Really glad to hear you've been able to take advantage of the multithreading.
Just following up here to say that I just merged @rok_cesnovar's PR to add support for threading for optimization and variational inference in cmdstanr. So you can now specify the number of threads to use via the threads argument to the $optimize() method instead of manually specifying the STAN_NUM_THREADS environment variable.
I assume this is all analogous in the cmdstanpy universe? That is, if I compile a reduce_sum model with STAN_THREADS = True and run optimize with os.environ['STAN_NUM_THREADS'] = str(8) or something, I will get a parallel run of MLE?
I wonder if this is a good way to tune grainsize before running full MCMC?
bumping this. @mitzimorris @WardBrian Do either of you know if this works with cmdstanpy optimize? Do I need to set this globally?
Setting the environment variable should work. We don't expose anything for optimization that sets it.
Okay I will try this. If I remember correctly, this didn't work (setting the environment variable).
I will check again. I can also review the PR to cmdstanr to see how you called the cmdstan optimize_args API (I think the argument is threads?) and see if I can't do something similar with a local copy of cmdstanpy.
Confirmed that this "works", in that setting show_console=True in the optimize call for cmdstanpy leads to a line of
num_threads = 8 (Default)
where I set
os.environ['STAN_NUM_THREADS'] = str(8)
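For anyone landing here later, this is a rough sketch of the environment-variable approach from this thread wrapped up for cmdstanpy. The helper names stan_num_threads and optimize_with_threads are made up for illustration and are not part of cmdstanpy. It assumes the model uses reduce_sum, that cmdstanpy and CmdStan are installed, and that the model is compiled with STAN_THREADS as discussed above. The context manager only sets the variable for the duration of the call and then restores whatever was there before.

```python
import os
from contextlib import contextmanager


@contextmanager
def stan_num_threads(n):
    """Temporarily set STAN_NUM_THREADS (hypothetical helper, not a cmdstanpy API)."""
    previous = os.environ.get("STAN_NUM_THREADS")
    os.environ["STAN_NUM_THREADS"] = str(n)
    try:
        yield
    finally:
        # Restore the previous state so later runs are not affected.
        if previous is None:
            os.environ.pop("STAN_NUM_THREADS", None)
        else:
            os.environ["STAN_NUM_THREADS"] = previous


def optimize_with_threads(stan_file, data, n_threads):
    """Compile with threading enabled and run optimize (hypothetical wrapper)."""
    # Imported here so the helper above can be used without cmdstanpy installed.
    from cmdstanpy import CmdStanModel

    model = CmdStanModel(stan_file=stan_file, cpp_options={"STAN_THREADS": True})
    with stan_num_threads(n_threads):
        # show_console=True prints CmdStan's output, including the num_threads line.
        return model.optimize(data=data, show_console=True)
```

Calling optimize_with_threads with a path to your own .stan file, your data, and 8 threads should print a num_threads line in the console output like the one above, which is the quickest way to check that the setting was picked up.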