Continue Warmup / Sampling after Interruption

Hello,

I want to run an IRT model on a relatively large dataset (8000 subjects, 130 questions). However, even after many experiments, it still takes 14 hrs for Rstan to run 1000 iterations on a subset of the data (1000 subjects, 80 questions). Then, it may take days to run the model on the whole dataset.

However, due to the time limit imposed by the computing cluster I run on, it is not feasible to let the model run for days. Hence, I wonder whether I can continue warmup/sampling after interruption. For example, is it possible to save the model as an R object after 1000 iterations and call the object later to continue warmup / sampling for another 1000 iterations?

Thank you very much!

This can be done with CmdStan, CmdStanPy (beta on github) and PyStan (current develop on github) interfaces.

Basically you need to run warmup n-iters and and then extract stepsize, inverse metric and location of last warm-up.

I added a small example how to continue sampling with PyStan (so you would need to edit that example to handle only warmup).

Ps. There is currently a small bug with PyStan, so it needs at least 1 sample iter

Thank you very much! I will take a look at PyStan!

There was a small bug with stepsize and adapt_engaged needs to be False. These are now fixed in PyStan master branch.

I created a small example which shows how to use the inv_metric option. This same can be done with CmdStan (e.g. using CmdStanPy).

That was very helpful. I work in R and can’t find where to pass the inverse metric to the new object. Any help on that front is much-appreciated.

Here’s some example code that gets most of the way there.

library(stringr)
library(ggplot2)
library(rstan)

# Functions to get the last samples and adaptation information
extract_final_sample <- function(samples){
  draws <- extract(samples, permuted = FALSE, inc_warmup = TRUE)
  num_draws <- dim(draws)[1]

  list(as.list(draws[num_draws, , ]))
}

create_adaptation_list <- function(samples){
  adapt_string <- get_adaptation_info(samples)[[1]]
  sample_stepsize <- str_extract(adapt_string, '(?<=Step size = )[0-9\\.]+')
  sample_invmetric <- str_extract(adapt_string, '(?<=Diagonal elements of inverse mass matrix:\n# )([0-9\\.]+, )+[0-9\\.]+')

  list(adapt_engaged = FALSE,
       stepsize = as.numeric(sample_stepsize))

  #list(adapt_engaged = FALSE,
  #    stepsize = as.numeric(sample_stepsize),
  #    inv_metric = as.numeric(unlist(str_split(sample_invmetric, ', '))))
}

# Create data and simple model
num_obs <- 100
model_list <- list(N = num_obs,
                   x = rnorm(num_obs))

model_obj <- stan_model(model_code = '
                        data{
                          int<lower = 0> N;
                          vector[N] x;
                        }
                        parameters{
                          real mu;
                          real<lower = 0> sigma;
                        }
                        model{
                        x ~ normal(mu, sigma);
                        }
                        ')

# Draw initial sample with warmup samples
samples <- sampling(model_obj, model_list, iter = 250, warmup = 200, chains = 1)

# Try to draw samples starting from previous values
samples_cont <- sampling(model_obj, model_list, chains = 1, iter = 250, warmup = 0,
                         init = extract_final_sample(samples),
                         control = create_adaptation_list(samples))

# Plot the draws in sequence
mu_samples <- c(extract(samples, par = 'mu', permuted = FALSE, inc_warmup = TRUE),
             extract(samples_cont, par = 'mu', permuted = FALSE, inc_warmup = TRUE))

sample_data <- data.frame(mu = mu_samples,
                             id = ordered(rep(1:2, each = 250)),
                             num = 1:500)

ggplot(sample_data, aes(num, mu, color = id)) +
  geom_line() +
  theme_bw()