Sampling terminates when reading the sample file

So, I’m not sure if this is just a me issue or if someone will be able to spot what’s going on. I am running a complex model that takes about a day to sample, and it’s in active development, so I would like to be able to spot issues while it’s running. My objective is to read from the sample file while the model is still sampling.

I am running R 3.6.0 on Ubuntu with RStudio Server on a remote box, with Stan version 2.19.1 (as reported by rstan::stan_version()).

I run the model with the following script:

run_mod <- function(dat, mod) {
  library(rstan)
  ret <- rstan::sampling(mod, dat, chains = 2, cores = 2, iter = 2000,
                         control = list(max_treedepth = 5),
                         sample_file = "stan_temp/samples",
                         diagnostic_file = "stan_temp/diagnostics",
                         verbose = TRUE)

  return(ret)
}


r_bg(run_mod, args = list(dat, mod))

The r_bg call is from the callr package. It runs a new R session in the background. I’ve also tried the same general thing using RStudio’s “jobs” feature, as well as with the ezStan library. This will start a sampler in the background that will not block my current R session. This works as expected.

The issue is when I go to read the resulting CSV file. At the moment this is my script, since I have tried a few things to avoid the sudden-shutdown issue I’ve been having:

file.copy("stan_temp/samples_1.csv", "stan_temp/to_read.csv", overwrite = TRUE)

c1_samples <- data.table::fread(cmd = paste("grep -v", "'#'",
                                            "stan_temp/to_read.csv"),
                                colClasses = "numeric")

file.remove("stan_temp/to_read.csv")

The fread call simply discards the commented lines and reads in the CSV file. I copy the original file first because the shutdown issue I describe below also happened when I read the original directly. You can imagine doing the same thing without the copying and removing.
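For the reading side only, here is a minimal base-R sketch of parsing just the complete rows of a file that is still growing. The function name read_partial_stan_csv is made up for illustration, and the sketch assumes the writer ends every finished row with a newline, so an unterminated last line is treated as half-written and dropped. It says nothing about whether reading affects the sampler.

```r
# Hypothetical helper (not part of rstan): parse the complete rows written so far.
read_partial_stan_csv <- function(path) {
  # Read a snapshot of the current file contents in one call.
  txt <- readChar(path, nchars = file.size(path), useBytes = TRUE)
  lines <- strsplit(txt, "\n", fixed = TRUE)[[1]]

  # Assumption: a finished row ends with a newline, so a missing trailing
  # newline means the last line may be truncated. Drop it.
  if (!endsWith(txt, "\n")) {
    lines <- head(lines, -1)
  }

  # Drop blank lines and comment lines (they start with '#').
  lines <- lines[nzchar(lines) & !startsWith(lines, "#")]
  if (length(lines) == 0) {
    return(NULL)  # the column header has not been written yet
  }

  utils::read.csv(text = paste(lines, collapse = "\n"), check.names = FALSE)
}

# Usage with a small fake file whose last row is cut off mid-number.
tmp <- tempfile(fileext = ".csv")
cat("# comment\nlp__,theta\n-1.5,0.2\n# Adaptation terminated\n-1.2,0.3\n-1.",
    file = tmp)
draws <- read_partial_stan_csv(tmp)
print(draws)
```

Reading a single snapshot and discarding an unterminated final line avoids handing a truncated number to the parser, which grep -v alone does not guard against.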

However, running this script sometimes works as expected, and sometimes just shuts down the sampler. I keep top running in a separate terminal, and the R processes running the chains simply disappear. Since the R process is running in the background, there is no information about what happened.
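One way to get some of that information back is to send the background session's output to files and keep the process handle around. The sketch below uses a toy function in place of run_mod, and the log file paths are placeholders; stdout, stderr, wait(), is_alive(), get_exit_status() and get_result() are part of callr's r_bg interface. It is only a sketch of the logging setup, not a diagnosis.

```r
library(callr)

# Toy stand-in for run_mod(): prints some progress, then fails.
toy_job <- function() {
  cat("iteration 1\n")
  stop("simulated failure")
}

# Placeholder log locations; any writable paths would do.
log_out <- tempfile(fileext = ".out")
log_err <- tempfile(fileext = ".err")

# Keep the returned handle so the process can be queried later.
proc <- r_bg(toy_job, stdout = log_out, stderr = log_err)
proc$wait()  # only for this demo; a real run would be polled instead

alive <- proc$is_alive()          # FALSE once the session has exited
status <- proc$get_exit_status()  # exit code of the background session
out_lines <- readLines(log_out)   # whatever the job printed to stdout

# get_result() re-raises an error from the background function, so catch it.
res <- tryCatch(proc$get_result(), error = function(e) conditionMessage(e))

print(out_lines)
print(res)
```

With the output in files, the logs can be inspected with tail -f from a terminal even if the R session that launched the job is busy or gone.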

I am wondering if there is some kind of conflict: while I’m copying or reading the CSV, the sampler can’t write to it, and that causes an error I’m not seeing because the sampler is running in the background. This would be consistent with the error happening sometimes when I read from the CSV, but not always.

If that is the case, is there any way to “safely” copy the file in such a way that will avoid this conflict? Thank you!


Very strange. I run on Ubuntu all the time, and ‘ezStan::watch_stan()’ even uses fread too if I recall correctly, and I’ve never encountered the sampler stopping because the sample file was being read. (Indeed, I’ve occasionally noticed the opposite with ezStan, where failing to watch the samples causes the sampling processes to immediately self-terminate for some reason.) None of your operations open the file in append/r+ mode, right?

Agree. Part of me thinks this is just a “me” problem but I do appreciate the help ¯_(ツ)_/¯

This, I don’t know. Basically the only operation currently touching the file in question is file.copy, and I’m not sure what it’s doing under the hood. I would test with a vanilla cp in the terminal, but I’m 50% of the way through sampling, so I want to let it finish overnight before testing.

Okay, so I actually think this is the same thing happening to me as well. Before going to bed I mustered up the courage to cp a stan file, and it worked fine. The processes stayed up. Then a few minutes later as I was inspecting the samples, the processes died. Just now, I started a sampler in the background with callr::r_bg, and without reading the files or touching them, the sampler just died.

Any thoughts about why the sampler would “just die” in a background process?


RStan 2.21 is on CRAN now and I’m not seeing this odd behaviour now that I’ve updated. Can you update and let me know what you observe?


I have actually mostly moved over to cmdstan(r), so I haven’t had this issue, but now that rstan is updated I may switch back. I will let you know if I have any of the same issues though!
