I'm not sure if this is just a me issue or if someone will be able to spot what’s going on. I am running a complex model that takes about a day to sample, and it’s in active development, so I would like to be able to spot issues while it’s running. My objective is to read from the sample file while the model is still sampling.
I am running R 3.6.0 on Ubuntu with RStudio Server on a remote box, with Stan 2.19.1 (as reported by rstan::stan_version()).
I run the model with the following script:
    run_mod <- function(dat, mod) {
      library(rstan)
      # Write draws and diagnostics to disk as sampling proceeds
      ret <- rstan::sampling(mod, dat, chains = 2, cores = 2, iter = 2000,
                             control = list(max_treedepth = 5),
                             sample_file = "stan_temp/samples",
                             diagnostic_file = "stan_temp/diagnostics",
                             verbose = TRUE)
      return(ret)
    }
    # Launch the sampler in a separate background R session
    callr::r_bg(run_mod, args = list(dat, mod))
The r_bg call is from the callr package. It runs a new R session in the background. I’ve also tried the same general thing using RStudio’s “jobs” feature, as well as with the ezStan library. This starts a sampler in the background that does not block my current R session. This works as expected.
The issue arises when I go to read the resulting CSV file. This is my current script; it reflects a few things I have tried in order to avoid the sudden-shutdown issue I’ve been having:
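    # Work on a copy rather than the file the sampler is writing to
    file.copy("stan_temp/samples_1.csv", "stan_temp/to_read.csv", overwrite = TRUE)
    # Strip lines containing '#' with grep, then parse the rest as numeric
    c1_samples <- data.table::fread(cmd = paste("grep -v", "'#'",
                                                "stan_temp/to_read.csv"),
                                    colClasses = "numeric")
    file.remove("stan_temp/to_read.csv")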
The fread call simply discards the commented lines and reads in the CSV file. I copy the original file first because the same shutdown issue I describe below was happening when I read the original directly. You can imagine doing the same thing without the copying and removing.
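For comparison, the same read can be done in base R without the copy or the grep call. This is only a minimal sketch, not something I have confirmed avoids the shutdown: read_partial_csv is a hypothetical helper name, the demo file stands in for stan_temp/samples_1.csv, and it assumes that a final line without a trailing newline is a row the sampler has not finished writing.

```r
# Read the complete rows of a Stan CSV that may still be growing.
# Opens the file read-only and never copies or modifies it.
read_partial_csv <- function(path) {
  size <- file.info(path)$size
  if (is.na(size) || size == 0) return(NULL)
  txt <- readChar(path, size, useBytes = TRUE)
  lines <- strsplit(txt, "\n", fixed = TRUE)[[1]]
  # A final line without a trailing newline may be half-written; drop it.
  if (!endsWith(txt, "\n")) lines <- head(lines, -1)
  # Drop blank lines and Stan's comment lines.
  lines <- lines[nzchar(lines) & !startsWith(lines, "#")]
  if (length(lines) < 2) return(NULL)  # header only, or nothing yet
  read.csv(text = paste(lines, collapse = "\n"), check.names = FALSE)
}

# Demo on a small stand-in file with a deliberately truncated last row.
demo <- tempfile(fileext = ".csv")
writeLines(c("# comment", "lp__,mu", "-1.5,0.2", "-1.7,0.3"), demo)
cat("-1.9,0.", file = demo, append = TRUE)
snap <- read_partial_csv(demo)
nrow(snap)  # 2
```

Against the real output this would be called as read_partial_csv("stan_temp/samples_1.csv"); it returns NULL until at least one full draw has been written.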
However, running this script sometimes works as expected, and sometimes just shuts down the sampler. I keep top running in a separate terminal and the R processes running the chains simply disappear. Since the R process is running in the background there is no information about what happens.
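One way to get at least some information out of the background session might be to send its output to log files. This is a minimal sketch, assuming callr's r_bg accepts file paths for its stdout and stderr arguments; failing_job is a hypothetical stand-in for run_mod, and the log paths are placeholders.

```r
library(callr)

# Placeholder log locations; any writable path would do.
out_log <- file.path(tempdir(), "bg_stdout.log")
err_log <- file.path(tempdir(), "bg_stderr.log")

# Hypothetical stand-in for run_mod: emits a message, then fails.
failing_job <- function() {
  message("sampler starting")
  stop("simulated failure")
}

# Keep the returned handle so the process can be queried later.
bg <- callr::r_bg(failing_job, stdout = out_log, stderr = err_log)
bg$wait()

bg$is_alive()                      # FALSE once the job has exited
err_lines <- readLines(err_log)    # whatever the child wrote to stderr
```

With the real model, the call would be callr::r_bg(run_mod, args = list(dat, mod), stdout = ..., stderr = ...), and the log files could be inspected after the chains disappear from top.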
I am wondering if there is some kind of conflict: while I’m copying or reading from the CSV, the sampler can’t write to it, and that causes an error I’m not seeing because it’s running in the background. This would be consistent with the error happening sometimes when I read from the CSV, but not always.
If that is the case, is there any way to “safely” copy the file in such a way that will avoid this conflict? Thank you!