RDS issues in RStan with parallel sessions


#1

Just wanted to report an issue I encountered while working with RStan on a Linux cluster.
I was running a “job array” in which several jobs use the same Stan model from a file,
and many of the jobs terminated with “readRDS” errors such as

“Error in readRDS(file) : error reading from connection”

The problem went away once I turned off
rstan_options(auto_write = TRUE)

It seems that multiple processes were trying to read and write the same “model.rds” file at the same time.


#2

The better way to do something like this is to

  1. Compile the model once on the log-in node with auto_write = TRUE
  2. Do the sampling on the cluster nodes

This ensures that the cluster nodes only read from the RDS file and never write to it.


#3

Thank you.
How do I load the pre-compiled model from the RDS file on a cluster node so that I can sample from it?


#4

As long as you give the full path to the Stan program when you call stan or sampling on the cluster nodes, it should find the RDS file in the same directory. This can be a bit tricky if the file systems are separate.
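
The two-step workflow from #2 and #4 could be sketched roughly as follows. The path /shared/project/model.stan and the function names compile_once and sample_on_node are hypothetical, and the sketch assumes the directory is visible under the same path on the log-in node and on the cluster nodes.

```r
library(rstan)

# Hypothetical location on a file system shared by all nodes.
stan_file <- "/shared/project/model.stan"

# Step 1: run once on the log-in node.
compile_once <- function(stan_file) {
  rstan_options(auto_write = TRUE)   # cache the compiled model as an RDS file
  stan_model(file = normalizePath(stan_file, mustWork = TRUE))
}

# Step 2: run in every cluster job.
# auto_write is not enabled here, so the jobs should only read the cached file.
sample_on_node <- function(stan_file, data, ...) {
  # Full path, so that rstan looks for the RDS file next to the Stan program.
  stan(file = normalizePath(stan_file, mustWork = TRUE), data = data, ...)
}
```

On the log-in node you would call compile_once(stan_file) a single time; each job then calls sample_on_node(stan_file, data = ...) with the data list for your own model.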


#5

Personally, I prefer not bothering with auto_write and just explicitly compiling the model in advance. Here’s a script that I have to do that:

#!/usr/bin/env Rscript

library(tools)
library(optparse)
library(rstan)

option_list = list(
  make_option(c('--outputRDS', '-o'),
              help = 'Alternate name of output file'),
  make_option(c('--verbose', '-v'),
              help = 'Print intermediate compilation output',
              action = 'store_true',
              default = FALSE)
)

opts_args = parse_args(OptionParser(option_list=option_list,
                                    usage = "%prog [options] Stan-model-file"),
                       positional_arguments=1)

opts = opts_args$options
stanFile = opts_args$args

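# Default the output name to the Stan file's base name with an .rds extension
if (is.null(opts$outputRDS)) {
  outputFile = sprintf("%s.rds", file_path_sans_ext(basename(stanFile)))
} else {
  outputFile = opts$outputRDS
}

# Compile without auto_write, then save the compiled model (with its DSO) explicitly
myModel = stan_model(stanFile, verbose=opts$verbose, auto_write = FALSE, save_dso=TRUE)
saveRDS(myModel, file=outputFile)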

Of course, you’ll need to install the optparse package from CRAN for this script to work as is. On the cluster node, just read in the saved model with the readRDS() function.
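
A minimal sketch of that last step on a cluster node, assuming the model was saved by the script above. The wrapper sample_from_rds, the path /shared/project/model.rds, and the data list are hypothetical placeholders; readRDS() and rstan's sampling() are the only real calls.

```r
library(rstan)

# Hypothetical helper: load a precompiled stanmodel from disk and sample from it.
sample_from_rds <- function(rds_file, data, ...) {
  # Fail early with a clear message if this node cannot see the file.
  if (!file.exists(rds_file)) {
    stop("compiled model not found: ", rds_file)
  }
  model <- readRDS(rds_file)   # the job only reads the file, it never writes it
  sampling(model, data = data, ...)
}

# Example call; the path and the data must match your own model.
# fit <- sample_from_rds("/shared/project/model.rds",
#                        data = list(N = 3, y = c(0.1, 0.5, -0.2)),
#                        chains = 4, iter = 2000)
```

Because every job only calls readRDS(), the jobs in the array do not compete to write the file.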


#6

Thank you, this is very helpful, but I’d also appreciate seeing the last bit,
because reading with the “save” function does not make much sense to me.


#7

It doesn’t make sense because I wrote it incorrectly. Sorry about that. I fixed my post accordingly.


#8

I am new to RStan. I installed R 3.4.2 last week; I had previously installed R 3.2.2 for Windows in 2015, and it is also still in my C directory. I followed the RStan installation guide at https://github.com/stan-dev/rstan/wiki/Installing-RStan-on-Windows and ran the test for Rtools, fx( 2L, 5 ), which should return 10.

I guess the installation is okay? I have also installed RStudio for Windows. Then I ran the 8 schools example from https://github.com/stan-dev/rstan/wiki/RStan-Getting-Started#how-to-use-rstan

The details are below. I get the following message: Error in readRDS(file.rds) : unknown input format

I would appreciate it if you could help me.
Hemantha


#9

Please ignore my post. I figured it out.