"Text file busy" Parallel

In a simulation study, I tried to compile the same Stan model 10 times in parallel.

Each of the 10 compilations failed with the following error:

Error in process_initialize(self, private, command, args, stdin, stdout, …:
! Native call to processx_exec failed
Caused by error in chain_call(c_processx_exec, command, c(command, args), pty, pty_options, …:
! cannot start processx process ‘/local/users/dliang/Fall2022/BCSM/Analysis/Codes/BC2k10_ref_1’ (system error 26, Text file busy) @unix/processx.c:611 (processx_exec)

Backtrace:

       1. cmdstanr::cmdstan_model("BC2k10_ref_1.stan")
       2. CmdStanModel$new(stan_file = stan_file, exe_file = exe_file, …
       3. local initialize(…)
       4. cmdstanr:::model_compile_info(self$exe_file())
       5. withr::with_path(c(toolchain_PATH_env_var(), tbb_path()), ret <- wsl_compa…
       6. base::force(code)
       7. cmdstanr:::wsl_compatible_run(command = wsl_safe_path(exe_file), args = "info", …
       8. base::do.call(processx::run, run_args)
       9. (function (command = NULL, args = character(), error_on_status = TRUE, …
      10. process$new(command, args, echo_cmd = echo_cmd, wd = wd, windows_verbatim_…
      11. local initialize(…)
      12. processx:::process_initialize(self, private, command, args, stdin, stdout, …
      13. processx:::chain_call(c_processx_exec, command, c(command, args), pty, pty_options, …
      14. | base::withCallingHandlers(do.call(“.Call”, list(.NAME, …)), error = function(e…
      15. | base::do.call(“.Call”, list(.NAME, …))
      16. | base::.handleSimpleError(function (e) …
      17. | local h(simpleError(msg, call))
      18. | processx:::throw_error(err, parent = e)
      Execution halted

The code is very basic; it is just the following call, run in parallel:

mod_ <- cmdstan_model("BC2k10_ref_1.stan")
  • Operating System: Ubuntu
  • CmdStan Version: 2.30.1
  • Compiler/Toolkit: D.K.

Hello,
A workaround for this problem is to manually copy the Stan code to a unique file name.

## batch_ : batch processing id
file.copy("BC2k10_ref_1.stan", paste0("BC2k10_ref_1", batch_, ".stan"))
mod_ <- cmdstan_model(paste0("BC2k10_ref_1", batch_, ".stan"))
file.remove(paste0("BC2k10_ref_1", batch_, ".stan"))

But this generates many copies of the same compiled program under the working directory.
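
A variant that at least keeps the clutter out of the working directory is to copy the model into the R session's temporary directory instead (using tempdir() here is just my choice; cmdstanr does not require it). Each batch's .stan file and its compiled executable then live in that process's unique temporary folder and are removed when the R session exits.

library(cmdstanr)

## copy the model into this R process's unique tempdir(), so the per-batch
## .stan file and its compiled executable are not left in the working directory
stan_copy_ <- file.path(tempdir(), paste0("BC2k10_ref_1_", batch_, ".stan"))
file.copy("BC2k10_ref_1.stan", stan_copy_)
mod_ <- cmdstan_model(stan_copy_)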

Thanks for following up with a hint, @Dong_Liang1. I have a few follow-up questions, if you don't mind.

  1. How were you running the cmdstan_model() function call in parallel? Were you using some built-in R functionality, or were you calling an R script in parallel from the shell?

  2. If you want parallelism, can you just compile the model once and use it in all of the parallel calls? (A rough sketch of what I mean follows this list.)

  3. Is there some reason the built-in parallelism in cmdstanr isn’t enough for what you need?
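
To make question 2 a bit more concrete, here is a rough sketch of what I have in mind (data_list is a placeholder for whatever data each batch loads, and the chain counts are arbitrary). Once the executable exists and is newer than the .stan file, cmdstan_model() should reuse it instead of recompiling, and mod$sample() can run the chains of a single fit in parallel:

library(cmdstanr)

## one-off setup step, run once before launching any batches
mod <- cmdstan_model("BC2k10_ref_1.stan")

## inside each batch script, the same call finds the up-to-date executable
## and skips recompilation
mod <- cmdstan_model("BC2k10_ref_1.stan")

## built-in parallelism within one fit: run the 4 chains concurrently
fit <- mod$sample(data = data_list, chains = 4, parallel_chains = 4)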

The parallel runs were driven by a shell script named "sim".

#!/bin/sh
echo "Base->" $1
echo "End->" $2
echo "Rep->" $3
for i in $(seq $1 $2);
do
  echo batch $i;
  R CMD BATCH "--vanilla --slave --args $i $3" BC2k10_ref_1.R sim1_b${i}.Rout  &
  sleep 1
done

Ten batches were then run, each with 99 replicates.

./sim 1 10 99

The R script "BC2k10_ref_1.R" looks like this:

library(cmdstanr)

args_ <- commandArgs(trailingOnly = TRUE)
batch_ <- as.integer(args_[1])
rep_ <- as.integer(args_[2])
fn_ <- paste0("sim1_b", batch_, ".rData")
cat("Batch", batch_, "R=", rep_, "saving to ", fn_, '\n')

## compile the stan code
file.copy("BC2k10_ref_1.stan",paste0("BC2k10_ref_1_",batch_,".stan"))
mod_ <- cmdstan_model(paste0("BC2k10_ref_1_",batch_,".stan"))
file.remove(paste0("BC2k10_ref_1_",batch_,".stan"))

I don't know how to compile the program ahead of time and reuse it in the parallel calls, or how to use the built-in parallelism in cmdstanr to run multiple data sets.

Thanks
Dong

To double-check, do you have a single Stan model that you're attempting to pass multiple datasets to, or do you have multiple Stan models and multiple datasets?

I am attempting to pass multiple datasets to the same Stan model. It's a fishery population dynamics model originally coded in ADMB.

Can you try performing the parallelism purely in R, using the compiled model? For example:

# Compile stan model once
mod <- cmdstanr::cmdstan_model("BC2k10_ref_1.stan")

# Load all datasets as a single list:
dataset_list <- list(...)

# Use furrr package for parallel evaluation
library(furrr)
plan(multisession)

parallel_results <- future_map(dataset_list, function(dataset) {
  mod$sample(
    data = dataset  # each parallel worker fits one dataset from the list
  )
})
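
Depending on your core count, you may also want to set the number of workers explicitly, e.g. plan(multisession, workers = 5), and pass parallel_chains to mod$sample() inside the mapped function, so that the total number of concurrently running chains matches the cores you actually have.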

Yes, I think that will avoid this “Text file busy” issue.