In a simulation study, I tried to compile the same code 10 times in parallel.
The following error occurred from each of these 10 compilations:
Error in process_initialize(self, private, command, args, stdin, stdout, …:
! Native call to processx_exec failed
Caused by error in chain_call(c_processx_exec, command, c(command, args), pty, pty_options, …:
! cannot start processx process '/local/users/dliang/Fall2022/BCSM/Analysis/Codes/BC2k10_ref_1' (system error 26, Text file busy) @unix/processx.c:611 (processx_exec)
---
Backtrace:
 1. cmdstanr::cmdstan_model("BC2k10_ref_1.stan")
 2. CmdStanModel$new(stan_file = stan_file, exe_file = exe_file, …
 3. local initialize(…)
 4. cmdstanr:::model_compile_info(self$exe_file())
 5. withr::with_path(c(toolchain_PATH_env_var(), tbb_path()), ret <- wsl_compa…
 6. base::force(code)
 7. cmdstanr:::wsl_compatible_run(command = wsl_safe_path(exe_file), args = "info", …
 8. base::do.call(processx::run, run_args)
 9. (function (command = NULL, args = character(), error_on_status = TRUE, …
10. process$new(command, args, echo_cmd = echo_cmd, wd = wd, windows_verbatim_…
11. local initialize(…)
12. processx:::process_initialize(self, private, command, args, stdin, stdout, …
13. processx:::chain_call(c_processx_exec, command, c(command, args), pty, pty_options, …
14. | base::withCallingHandlers(do.call(".Call", list(.NAME, …)), error = function(e…
15. | base::do.call(".Call", list(.NAME, …))
16. | base::.handleSimpleError(function (e) …
17. | local h(simpleError(msg, call))
18. | processx:::throw_error(err, parent = e)
Execution halted
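For context, "system error 26" is ETXTBSY ("Text file busy"): on Linux the kernel refuses to execute a binary while some process still has it open for writing. Here, each parallel cmdstan_model() call writes the same executable while the others try to run it. A minimal self-contained illustration of the error itself (assumes Linux; busy_demo is a made-up file name, not part of the original workflow):

```shell
# Make a small binary we can execute.
cp /bin/true ./busy_demo
chmod +x ./busy_demo

# Hold the binary open for writing (append mode, so it is not truncated)...
exec 3>> ./busy_demo

# ...and try to execute it while the writer is still open:
# on Linux this fails with "Text file busy".
./busy_demo 2> err.txt
echo "exit status: $?"
cat err.txt

# Release the write handle; now it executes fine.
exec 3>&-
./busy_demo && echo "runs after the writer closes"
```

The parallel compilations hit exactly this race: one process is still writing BC2k10_ref_1 while another tries to exec it.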
The code itself is very basic; it is simply run in parallel.
mod_ <- cmdstan_model("BC2k10_ref_1.stan")
- Operating System: Ubuntu
- CmdStan Version: 2.30.1
- Compiler/Toolkit: D.K.
Hello,
A workaround for this problem is to manually copy the Stan code to a unique file name.
## batch_ : batch processing id
file.copy("BC2k10_ref_1.stan", paste0("BC2k10_ref_1", batch_, ".stan"))
mod_ <- cmdstan_model(paste0("BC2k10_ref_1", batch_, ".stan"))
file.remove(paste0("BC2k10_ref_1", batch_, ".stan"))
But this generates many copies of the same compiled program under the working directory.
Thanks for following up with a hint, @Dong_Liang1. I have a few follow-up questions if you don't mind.
- How were you running the cmdstan_model() function call in parallel? Was it using some built-in R functionality, or were you calling an R script in parallel from the shell?
- If you want parallelism, can you just compile the model once and use it in all of the parallel calls?
- Is there some reason the built-in parallelism in cmdstanr isn't enough for what you need?
The parallel run was driven by a shell script named "sim".
#!/bin/sh
echo "Base->" $1
echo "End->" $2
echo "Rep->" $3
for i in $(seq $1 $2);
do
echo batch $i;
R CMD BATCH "--vanilla --slave --args $i $3" BC2k10_ref_1.R sim1_b${i}.Rout &
sleep 1
done
Ten batches were then run, each with 99 replicates.
./sim 1 10 99
The R script “BC2k10_ref_1.R” looks like this.
library(cmdstanr)

args_ <- commandArgs(trailingOnly = TRUE)
batch_ <- as.integer(args_[1])
rep_ <- as.integer(args_[2])
fn_ <- paste0("sim1_b", batch_, ".rData")
cat("Batch", batch_, "R =", rep_, "saving to", fn_, "\n")
## compile the stan code under a batch-specific name
file.copy("BC2k10_ref_1.stan", paste0("BC2k10_ref_1_", batch_, ".stan"))
mod_ <- cmdstan_model(paste0("BC2k10_ref_1_", batch_, ".stan"))
file.remove(paste0("BC2k10_ref_1_", batch_, ".stan"))
I don't know how to compile the program ahead of time and reuse it in the parallel calls, or how to use cmdstanr's built-in parallelism to run multiple data sets.
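One way to avoid the collision is to build the executable once before launching the batches, so the parallel workers only ever execute the finished binary and never write to it. The sketch below uses a stand-in "build" step so it is self-contained; in the real workflow that step would be a single up-front call such as `Rscript -e 'cmdstanr::cmdstan_model("BC2k10_ref_1.stan")'` run before `./sim` (model_exe and build_once are illustrative names, not part of the original scripts):

```shell
#!/bin/sh
# Stand-in for the one-time compilation step. In the real workflow this
# would be the cmdstanr compilation, run exactly once before ./sim.
build_once() {
    cp /bin/true ./model_exe    # pretend "compiler" output
    chmod +x ./model_exe
}
[ -x ./model_exe ] || build_once

# The parallel batches then only *run* the executable; nothing writes
# to it anymore, so ETXTBSY ("Text file busy") cannot occur.
for i in 1 2 3; do
    ./model_exe &
done
wait
echo "all batches finished"
```

The key point is the ordering: the write (compile) finishes before any exec begins, which removes the race entirely.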
Thanks
Dong
To double-check: do you have a single Stan model that you're attempting to pass multiple datasets to, or do you have multiple Stan models and multiple datasets?
I am attempting to pass multiple datasets to the same Stan model. It's a fishery population dynamics model originally coded in ADMB.
Can you try performing the parallelism with the compiled model purely in R? For example:
# Compile the Stan model once
mod <- cmdstanr::cmdstan_model("BC2k10_ref_1.stan")

# Load all datasets as a single list of datasets:
dataset_list <- list(...)

# Use the furrr package for parallel evaluation
library(furrr)
plan(multisession)
parallel_results <- future_map(dataset_list, function(dataset) {
  mod$sample(
    data = dataset
  )
})
Yes, I think that will avoid this “Text file busy” issue.