Saving state and restarting sampling on a cluster computer with strict walltimes


#1

I’m running lots of rstan jobs on a Linux cluster. As in most HPC environments, the administrators prefer short jobs, i.e., walltimes under 12 hours, and rarely allow longer ones. If you have a potentially long job, the recommended strategy is usually to break it into a series of smaller jobs, each continuing from where the previous one ended. Is there any way in rstan to save the state of the sampler periodically and then restart sampling from a previously saved job? Sorry if I missed this somewhere in the manual and docs; I couldn’t find anything.
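To illustrate what "saving the state of the sampler" has to capture, here is a minimal sketch in Python. It is not rstan and not any rstan API: it uses a toy random-walk Metropolis sampler, and the functions `run`, `save_checkpoint` and `load_checkpoint` are hypothetical names chosen for illustration. Each job pickles the current position, the random-number-generator state and the draws so far, and the next job reloads them, so a chained run reproduces an uninterrupted run exactly. For Stan's NUTS sampler the state would additionally include the adapted step size and inverse metric (and the adaptation state if warmup is still in progress), so this toy only shows the shape of the problem.

```python
import math
import os
import pickle
import random


def log_density(x):
    # Toy target: standard normal log density, up to an additive constant.
    return -0.5 * x * x


def run(n_iter, state=None, seed=1234):
    """Random-walk Metropolis (hypothetical helper).

    Starts fresh if `state` is None, otherwise resumes from a saved state.
    Returns the new state: position, RNG state and all draws so far.
    """
    if state is None:
        rng = random.Random(seed)
        x = 0.0
        draws = []
    else:
        rng = random.Random()
        rng.setstate(state["rng"])  # restore the exact RNG stream
        x = state["x"]
        draws = list(state["draws"])
    for _ in range(n_iter):
        proposal = x + rng.gauss(0.0, 1.0)
        # Accept with probability min(1, p(proposal) / p(x)).
        log_ratio = log_density(proposal) - log_density(x)
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = proposal
        draws.append(x)
    return {"x": x, "rng": rng.getstate(), "draws": draws}


def save_checkpoint(state, path):
    # Write to a temporary file, then rename, so a job killed at its
    # walltime mid-write does not leave a corrupt checkpoint behind.
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp, path)


def load_checkpoint(path):
    # Returns None on the first job, when no checkpoint exists yet.
    if not os.path.exists(path):
        return None
    with open(path, "rb") as f:
        return pickle.load(f)
```

In each cluster job one would call `state = load_checkpoint(path)`, then `state = run(n, state)`, then `save_checkpoint(state, path)`, submitting the jobs one after another. Saving the RNG state (not just the last draw) is what makes the chained run identical to a single long run rather than merely similar.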

BTW, thanks for the wonderful work on Stan and rstan; it’s great!

Operating System: CentOS release 6.3 (Final)
Interface Version: rstan 2.16.2
Output of writeLines(readLines(file.path(Sys.getenv("HOME"), ".R/Makevars"))):
CXXFLAGS=-O3 -mtune=native -march=native -Wno-unused-variable -Wno-unused-function
CXX=clang++ -ftemplate-depth-256
CC=clang

Output of devtools::session_info("rstan"):

devtools::session_info("rstan")
Session info ------------------------------------------------------------------
setting value
version R version 3.3.1 (2016-06-21)
system x86_64, linux-gnu
ui X11
language (EN)
collate en_US.UTF-8
tz
date 2017-09-20

Packages ----------------------------------------------------------------------
package * version date source
BH 1.62.0-1 2016-11-19 CRAN (R 3.3.1)
colorspace 1.2-7 2016-10-11 CRAN (R 3.3.1)
dichromat 2.0-0 2013-01-24 CRAN (R 3.3.1)
digest 0.6.12 2017-01-27 CRAN (R 3.3.1)
ggplot2 2.1.0 2016-03-01 CRAN (R 3.3.1)
graphics * 3.3.1 2016-10-24 local
grDevices * 3.3.1 2016-10-24 local
grid 3.3.1 2016-10-24 local
gridExtra 2.2.1 2016-02-29 CRAN (R 3.3.1)
gtable 0.2.0 2016-02-26 CRAN (R 3.3.1)
inline 0.3.14 2015-04-13 CRAN (R 3.3.1)
labeling 0.3 2014-08-23 CRAN (R 3.3.1)
lattice 0.20-33 2015-07-14 CRAN (R 3.3.1)
magrittr 1.5 2014-11-22 CRAN (R 3.3.1)
MASS 7.3-45 2016-04-21 CRAN (R 3.3.1)
Matrix 1.2-6 2016-05-02 CRAN (R 3.3.1)
methods * 3.3.1 2016-10-24 local
munsell 0.4.3 2016-02-13 CRAN (R 3.3.1)
plyr 1.8.4 2016-06-08 CRAN (R 3.3.1)
RColorBrewer 1.1-2 2014-12-07 CRAN (R 3.3.1)
Rcpp 0.12.7 2016-09-05 CRAN (R 3.3.1)
RcppEigen 0.3.3.3.0 2017-05-01 CRAN (R 3.3.1)
reshape2 1.4.2 2016-10-22 CRAN (R 3.3.1)
rstan 2.16.2 2017-07-03 CRAN (R 3.3.1)
scales 0.4.0 2016-02-26 CRAN (R 3.3.1)
StanHeaders 2.16.0-1 2017-07-03 CRAN (R 3.3.1)
stats * 3.3.1 2016-10-24 local
stats4 3.3.1 2016-10-24 local
stringi 1.1.2 2016-10-01 CRAN (R 3.3.1)
stringr 1.2.0 2017-02-18 CRAN (R 3.3.1)
tools 3.3.1 2016-10-24 local
utils * 3.3.1 2016-10-24 local


#2

It isn’t possible in Stan yet, but I know the developers are working on it. In the meantime, you might look into a more generic process checkpointing solution like:


#3

No


#4

Thanks, aaronjig, interesting idea. Have you tried this with rstan?


#5

Not yet, but it should be available soon.


#6

Thanks, everyone, for the responses. Glad to hear that this feature is in development!