brms: file does not exist after sampling completes

Hello,

I’m running brms with the cmdstanr backend on a research computing cluster, and the models take a while to run (~25 days). For some of these models, execution halts as soon as sampling finishes because the output CSV file cannot be found.

For example,

Compiling Stan program...
Start sampling
Running MCMC with 1 chain, with 24 thread(s) per chain...

Chain 1 Iteration:    1 / 5000 [  0%]  (Warmup) 
Chain 1 Iteration:  100 / 5000 [  2%]  (Warmup) 
Chain 1 Iteration:  200 / 5000 [  4%]  (Warmup) 
Chain 1 Iteration:  300 / 5000 [  6%]  (Warmup) 
Chain 1 Iteration:  400 / 5000 [  8%]  (Warmup) 
Chain 1 Iteration:  500 / 5000 [ 10%]  (Warmup) 
Chain 1 Iteration:  600 / 5000 [ 12%]  (Warmup) 
Chain 1 Iteration:  700 / 5000 [ 14%]  (Warmup) 
Chain 1 Iteration:  800 / 5000 [ 16%]  (Warmup) 
Chain 1 Iteration:  900 / 5000 [ 18%]  (Warmup) 
Chain 1 Iteration: 1000 / 5000 [ 20%]  (Warmup) 
Chain 1 Iteration: 1100 / 5000 [ 22%]  (Warmup) 
Chain 1 Iteration: 1200 / 5000 [ 24%]  (Warmup) 
Chain 1 Iteration: 1300 / 5000 [ 26%]  (Warmup) 
Chain 1 Iteration: 1400 / 5000 [ 28%]  (Warmup) 
Chain 1 Iteration: 1500 / 5000 [ 30%]  (Warmup) 
Chain 1 Iteration: 1600 / 5000 [ 32%]  (Warmup) 
Chain 1 Iteration: 1700 / 5000 [ 34%]  (Warmup) 
Chain 1 Iteration: 1800 / 5000 [ 36%]  (Warmup) 
Chain 1 Iteration: 1900 / 5000 [ 38%]  (Warmup) 
Chain 1 Iteration: 2000 / 5000 [ 40%]  (Warmup) 
Chain 1 Iteration: 2100 / 5000 [ 42%]  (Warmup) 
Chain 1 Iteration: 2200 / 5000 [ 44%]  (Warmup) 
Chain 1 Iteration: 2300 / 5000 [ 46%]  (Warmup) 
Chain 1 Iteration: 2400 / 5000 [ 48%]  (Warmup) 
Chain 1 Iteration: 2500 / 5000 [ 50%]  (Warmup) 
Chain 1 Iteration: 2501 / 5000 [ 50%]  (Sampling) 
Chain 1 Iteration: 2600 / 5000 [ 52%]  (Sampling) 
Chain 1 Iteration: 2700 / 5000 [ 54%]  (Sampling) 
Chain 1 Iteration: 2800 / 5000 [ 56%]  (Sampling) 
Chain 1 Iteration: 2900 / 5000 [ 58%]  (Sampling) 
Chain 1 Iteration: 3000 / 5000 [ 60%]  (Sampling) 
Chain 1 Iteration: 3100 / 5000 [ 62%]  (Sampling) 
Chain 1 Iteration: 3200 / 5000 [ 64%]  (Sampling) 
Chain 1 Iteration: 3300 / 5000 [ 66%]  (Sampling) 
Chain 1 Iteration: 3400 / 5000 [ 68%]  (Sampling) 
Chain 1 Iteration: 3500 / 5000 [ 70%]  (Sampling) 
Chain 1 Iteration: 3600 / 5000 [ 72%]  (Sampling) 
Chain 1 Iteration: 3700 / 5000 [ 74%]  (Sampling) 
Chain 1 Iteration: 3800 / 5000 [ 76%]  (Sampling) 
Chain 1 Iteration: 3900 / 5000 [ 78%]  (Sampling) 
Chain 1 Iteration: 4000 / 5000 [ 80%]  (Sampling) 
Chain 1 Iteration: 4100 / 5000 [ 82%]  (Sampling) 
Chain 1 Iteration: 4200 / 5000 [ 84%]  (Sampling) 
Chain 1 Iteration: 4300 / 5000 [ 86%]  (Sampling) 
Chain 1 Iteration: 4400 / 5000 [ 88%]  (Sampling) 
Chain 1 Iteration: 4500 / 5000 [ 90%]  (Sampling) 
Chain 1 Iteration: 4600 / 5000 [ 92%]  (Sampling) 
Chain 1 Iteration: 4700 / 5000 [ 94%]  (Sampling) 
Chain 1 Iteration: 4800 / 5000 [ 96%]  (Sampling) 
Chain 1 Iteration: 4900 / 5000 [ 98%]  (Sampling) 
Chain 1 Iteration: 5000 / 5000 [100%]  (Sampling) 
Chain 1 finished in 2204150.0 seconds.
Error in read_cmdstan_csv(self$output_files(), variables = "", sampler_diagnostics = if (!fixed_param) c("treedepth__",  : 
  Assertion on 'files' failed: File does not exist: '/tmp/RtmpykA2YK/file1c9ba304aee56_threads-202202211255-1-49bed2.csv'.
Calls: brm ... read_cmdstan_csv -> <Anonymous> -> makeAssertion -> mstop
Execution halted

For some models, everything proceeds normally, and a file is saved after sampling finishes. For other models (maybe 2 out of 3), I get an error message similar to the one above. I have not been able to find a pattern as to why some models save normally and some do not.

I’d appreciate any advice/thoughts! When I run short models or models with few iterations they save just fine, and I have no issues.

This is just a guess: I’m wondering if the tmp directory is getting deleted/overwritten for some reason? Is there a way I can force R/brms/cmdstanr to use a permanent location instead of a tmp directory?

  • Operating System: Red Hat Enterprise Linux Server 7.4 (Maipo)
  • brms Version: 2.14.4
  • cmdstanr Version: 0.3.0
  • cmdstan Version: 2.26.1

Thank you!
Peter

I would also assume that there is a problem with the tmp directory. As far as I know, it is possible to specify a particular directory and choose a “safer” location.

Any recommendation for how I choose a safer/permanent location? I have not been able to find it in any documentation…

My guess is that I would change the R tmp directory in Renviron to a permanent location.
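
A minimal sketch for checking that guess, assuming base R only. R chooses its per-session temporary directory at startup from the TMPDIR, TMP and TEMP environment variables, so changing them inside a running session has no effect on tempdir(). Setting TMPDIR in the job script before launching R is the route I am sure about; an Renviron entry should behave the same way as far as I know, but it is worth confirming from inside the job. The helper name tmp_report is hypothetical.

```r
# Hypothetical helper: report where this R session keeps its temporary files.
# Run it at the top of the cluster job to confirm that a TMPDIR setting
# (job script or Renviron) was actually picked up when R started.
tmp_report <- function() {
  list(
    session_tempdir = tempdir(),                    # directory R is really using
    TMPDIR = Sys.getenv("TMPDIR", unset = NA),      # NA if the variable is not set
    exists = dir.exists(tempdir())                  # FALSE would mean it was removed
  )
}

str(tmp_report())
```

If session_tempdir still points under /tmp after the change, the setting was not in effect when R started.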

I currently use the file = argument of brms, but it doesn’t change where the model compiles.
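
For reference, a sketch of one possible approach, not something tested on this cluster: cmdstanr's $sample() method has an output_dir argument that controls where CmdStan writes its CSV files, and brm() documents that extra arguments are forwarded to the cmdstanr sampling method. Whether brms 2.14.4 forwards it the same way is an assumption. The helper names and the path in the usage comment are hypothetical.

```r
# Hypothetical helper: create (if needed) a persistent directory for the
# CmdStan CSV files and return its full path.
persistent_output_dir <- function(path) {
  dir.create(path, recursive = TRUE, showWarnings = FALSE)
  stopifnot(dir.exists(path))
  normalizePath(path)
}

# Hypothetical wrapper: fit with brms and ask cmdstanr to write its CSVs to
# out_dir instead of the R session's temporary directory.
# Assumption: brm() passes output_dir through to cmdstanr's $sample().
fit_with_persistent_csv <- function(formula, data, out_dir, ...) {
  brms::brm(
    formula,
    data = data,
    backend = "cmdstanr",
    output_dir = persistent_output_dir(out_dir),
    ...
  )
}

# Usage (not run; the path and data are placeholders):
# fit <- fit_with_persistent_csv(
#   y ~ x, data = my_data,
#   out_dir = "/scratch/peter/stan_csv",
#   chains = 1, threads = brms::threading(24), iter = 5000
# )
```

With the CSVs in a permanent location, the draws should survive even if the R session's tmp folder is cleared, and they could be read back later with cmdstanr's read_cmdstan_csv().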

Are you using some kind of parallel setup or anything else that might affect the R session in a way that clears the tmp folder?
As a starting point, you could try updating brms and cmdstan(r) to the most recent CRAN or even dev versions.

Yes, I’m running the models in parallel. I don’t think that’s the issue (in isolation). I’ve run about a thousand model fits simultaneously using brms for a power analysis, and I never had this issue before. It’s likely a combination of running models in parallel and the models being long ones.

I’ll try updating the versions before I give it another shot.