Saving & reusing adaptation in cmdstanr

mike-lawrence · November 9, 2020, 2:45pm

It came up in another thread that it should be possible to save the adaptation info from one model fitting session for use in subsequent sessions. Has anyone already worked out the code to do this with cmdstanr?

ahartikainen · November 9, 2020, 4:33pm

Can someone translate these functions?

It should be similar with cmdstanr.

mike-lawrence · November 9, 2020, 4:41pm

Thanks for linking to that code! If no one else chimes in to say they’ve already done it, I can take a crack at it this evening.

rok_cesnovar · November 9, 2020, 5:21pm

Thanks Mike!! Sorry I have been slow with reviewing. Its been a busy 10 days. Should normalize in a few days.

jonah · November 9, 2020, 10:24pm

@mike-lawrence I’ve wanted to try this but I haven’t yet, so I would be very interested to know how it goes if you try it out.

mike-lawrence · January 2, 2021, 4:31pm

Belatedly looking at this now. I thought it was as simple as:

iter_both = 1e3
parallel_chains = 4

#first a typical warmup_and_sampling run:
both = mod$sample(
	data = data_for_stan
	, chains = parallel_chains
	, parallel_chains = parallel_chains
	, refresh = iter_both/10*2 # update every 10%
	
	, seed = 1
	
	, iter_warmup = iter_both
	, save_warmup = T #for inits
	
	, iter_sampling = iter_both
)

#now just warmup
warmup = mod$sample(
	data = data_for_stan
	, chains = parallel_chains
	, parallel_chains = parallel_chains
	, refresh = iter_both/10 # update every 10%

	, seed = 1
	
	, iter_warmup = iter_both
	, save_warmup = T #for inits later
	
	, iter_sampling = 0
)

#and sampling using the prior warmup:
samples = mod$sample(
	data = data_for_stan
	, chains = parallel_chains
	, parallel_chains = parallel_chains
	, refresh = iter_both/10 # update every 10%
	
	, seed = 1

	, iter_warmup = 0
	, adapt_engaged = FALSE
	, inv_metric = warmup$inv_metric(matrix=F)
	, step_size = warmup$metadata()$step_size_adaptation

	, iter_sampling = iter_both
	, init = function(chain_id){
		warmup_draws = warmup$draws(inc_warmup=T)
		final_warmup_value = warmup_draws[iter_both,chain_id,]
		init_list = as.list(final_warmup_value)
		names(init_list) = dimnames(final_warmup_value)[[3]]
		return(init_list)
	}
)

But my sampling ends up with a ton of divergences when the same data+model done with warmup & sampling all in one go has none. I’ve tried with a variety of seeds in all calls to sample() and observe that the “sampling-only” run tends to have substantially higher probability of post-warmup divergences, and a substantially higher proportion of these when present, than the combined warmup+sampling runs.

@rok_cesnovar @jonah can either of you point out what I did wrong?

bbbales2 · January 2, 2021, 5:40pm

Hmm I’m not sure. If you have the inverse metric, the stepsize, and the inits I would expect this to run.

Do the divergences go away if you lower the step size? Like do:

step_size = 0.5 * warmup$metadata()$step_size_adaptation

If you can get something to run without divergences, check and make sure the ESS of all the parameters are comparable between this and the baseline calculation – if the metric or something is messed up these ESS numbers can be way different even if the divergences have gone away.

And if you don’t set the seed in the re-initialized sampling thing – do you always get a bunch of divergences in the output?

mike-lawrence · January 2, 2021, 5:47pm

Neither halving the stepsize nor letting the seed vary from run to run causes the rate or proportion of divergences to decrease.

Lemme see if I can get one of the cmdstanr examples to do the same thing so I don’t have to send my complicated example where I’m seeing this.

bbbales2 · January 2, 2021, 5:48pm

Does the inverse metric from the full calculation look similar to the inverse metric from just the warmup?

mike-lawrence · January 2, 2021, 5:50pm

If started with the same seed they’re identical

mike-lawrence · January 2, 2021, 5:59pm

Ok, I’m getting it with the 8-schools ncp example:

library(cmdstanr)

example_program = 'schools_ncp.stan'
example_data = 'schools_ncp.data.json'
tmp <- file.path(tempdir(), example_program)
if (!file.exists(tmp)) {
  file.copy(system.file(example_program, package = "cmdstanr"), tmp)
}
mod <- cmdstan_model(tmp)
data_file <- system.file(example_data, package = "cmdstanr")


iter_both = 1e3
parallel_chains = 4

both = mod$sample(
  data = data_file
  , chains = parallel_chains
  , parallel_chains = parallel_chains
  , refresh = iter_both/10*2 # update every 10%
  
  , seed = 1
  
  , iter_warmup = iter_both
  , save_warmup = T #for inits
  
  , iter_sampling = iter_both
)


warmup = mod$sample(
  data = data_file
  , chains = parallel_chains
  , parallel_chains = parallel_chains
  , refresh = iter_both/10 # update every 10%
  
  , seed = 1
  
  , iter_warmup = iter_both
  , save_warmup = T #for inits
  , sig_figs = 18
  
  , iter_sampling = 0
)

#checking that all the step_sizes are identical
identical(
  warmup$metadata()$step_size_adaptation
  , both$metadata()$step_size_adaptation
)

#checking that all the inv_metrics are identical
purrr::map2(
  .x = warmup$inv_metric(matrix=F)
  , .y = both$inv_metric(matrix=F)
  , .f = identical
)


samples = mod$sample(
  data = data_file
  , chains = parallel_chains
  , parallel_chains = parallel_chains
  , refresh = iter_both/10 # update every 10%
  
  # , seed = 1
  
  , iter_warmup = 0
  , adapt_engaged = FALSE
  , inv_metric = warmup$inv_metric(matrix=F)
  , step_size = warmup$metadata()$step_size_adaptation
  
  , iter_sampling = iter_both
  , init = function(chain_id){
    warmup_draws = warmup$draws(inc_warmup=T)
    final_warmup_value = warmup_draws[iter_both,chain_id,]
    init_list = as.list(final_warmup_value)
    names(init_list) = dimnames(final_warmup_value)[[3]]
    init_list = init_list[names(init_list)!='lp__']
    return(init_list)
  }
)

bbbales2 · January 2, 2021, 6:04pm

This is sounding like a bug in Stan somewhere.

I’m wondering if Stan is not actually using the inverse metric it’s giving you for sampling.

Are the estimates way off in the thing with divergences?

Can you do the initialized fit with these options:

  iter_warmup = 50,
  adapt_engaged = TRUE,
  init_buffer = 0,
  term_buffer = 50,
  window = 0

That will do 50 samples at the end to find a good stepsize but use the provided inverse metric and initialization.

mike-lawrence · January 2, 2021, 6:08pm

Doesn’t seem to help. Give me a sec and I’ll post a version that puts everything in a loop over a bunch of seeds and spits out the divergence rates (so I’m not relying on my limited sampling perceptions).

bbbales2 · January 2, 2021, 6:14pm

Hmm, keep poking and prodding at this. Work under the assumption there might be a bug though. THis is very weird.

bbbales2 · January 2, 2021, 6:14pm

Check the inits and make sure there’s nothing weird happening with them.

mike-lawrence · January 2, 2021, 7:03pm

Inits seem fine, and I get the same behaviour if I use samples from the sampling period of the combined warmup+sampling run as the inits for the sampling-only run.

Here’s code to loop over lots of seeds (which indeed confirms my observations that divergences are more likely for the sampling-only runs):

library(tidyverse)
library(progress)
library(cmdstanr)

example_program = 'schools_ncp.stan'
example_data = 'schools_ncp.data.json'
tmp <- file.path(tempdir(), example_program)
if (!file.exists(tmp)) {
  file.copy(system.file(example_program, package = "cmdstanr"), tmp)
}
mod <- cmdstan_model(tmp)
data_file <- system.file(example_data, package = "cmdstanr")

get_num_diverged = function(x){
  temp = x$cmdstan_diagnose()$stdout
  temp = strsplit(temp,'Checking sampler transitions for divergences.\n')[[1]][2]
  temp = strsplit(temp,' ')[[1]][1]
  if(temp=='No'){
    return(0)
  }else{
    return(as.numeric(temp))
  }
}
  

run_seed = function(seed){
  sink('/dev/null') #only way to keep things quiet

  iter_both = 1e3
  parallel_chains = 4
  
  both = mod$sample(
    data = data_file
    , chains = parallel_chains
    , parallel_chains = parallel_chains
    , refresh = 0
    , show_messages = F
    , seed = seed
    
    , iter_warmup = iter_both

      , iter_sampling = iter_both
  )
  both_num_diverged = get_num_diverged(both)  
  
  warmup = mod$sample(
    data = data_file
    , chains = parallel_chains
    , parallel_chains = parallel_chains
    , refresh = 0
    , show_messages = F
    
    , seed = seed
    
    , iter_warmup = iter_both
    , save_warmup = T #for inits
    , sig_figs = 18
    
    , iter_sampling = 0
  )

  samples = mod$sample(
    data = data_file
    , chains = parallel_chains
    , parallel_chains = parallel_chains
    , refresh = 0
    , show_messages = F
    
    , seed = seed
    
    , iter_warmup = 0
    , adapt_engaged = FALSE
    , inv_metric = warmup$inv_metric(matrix=F)
    , step_size = warmup$metadata()$step_size_adaptation
    
    , iter_sampling = iter_both
    , init = function(chain_id){
      warmup_draws = warmup$draws(inc_warmup=T)
      final_warmup_value = warmup_draws[iter_both,chain_id,]
      init_list = as.list(final_warmup_value)
      names(init_list) = dimnames(final_warmup_value)[[3]]
      init_list = init_list[names(init_list)!='lp__']
      return(init_list)
    }
  )
  samples_num_diverged = get_num_diverged(samples)  
  
  out = tibble::tibble(
    #checking that all the step_sizes are identical
    identical_step_size = identical(
      warmup$metadata()$step_size_adaptation
      , both$metadata()$step_size_adaptation
    )
    
    #checking that all the inv_metrics are identical
    , identical_inv_metric = all(purrr::map2_lgl(
      .x = warmup$inv_metric(matrix=F)
      , .y = both$inv_metric(matrix=F)
      , .f = identical
    ))
    
    , both_num_diverged = both_num_diverged
    
    , samples_num_diverged = samples_num_diverged
    
  )
  sink(NULL)
  pb$tick()
  return(out)

}

orig_warn = options(warn=-1)
num_seeds = 100
pb = progress::progress_bar$new(
  total = num_seeds
  , format = "[:bar] :percent eta: :eta",
)
out = purrr::map_dfr(1:num_seeds,run_seed)
options(warn=orig_warn)

summary(out)

(
  out
  %>% tidyr::pivot_longer(
    cols = contains('num_diverged')
  )
  %>% dplyr::group_by(
    name
  )
  %>% dplyr::summarise(
    any_diverged = mean(value>0)
    , mean_num_diverged_when_any = mean(value[value>0])
  )
)


(
  out
  %>% tidyr::pivot_longer(
    cols = contains('num_diverged')
  )
  %>% ggplot()
  + geom_histogram(
    aes(
      x = value
      , fill = name
    )
    , alpha = .5
    , position = 'identity'
  )
)

mike-lawrence · January 2, 2021, 7:22pm

I’ve exhausted my expertise in examining this, so I filed a bug report at the cmdstanr repo (in case it’s something internal to cmdstanr rather than cmdstan).

ahartikainen · January 2, 2021, 7:47pm

Please check that all the csv files use the correct stepsize/inv_metric in sampling phase.
There were some problems with this when I implemented the fixes for CmdStanPy.

These work in Python side.

import arviz as az
import pystan


schools_code = """
data {
  int<lower=0> J;         // number of schools
  real y[J];              // estimated treatment effects
  real<lower=0> sigma[J]; // standard error of effect estimates
}
parameters {
  real mu;                // population treatment effect
  real<lower=0> tau;      // standard deviation in treatment effects
  vector[J] eta;          // unscaled deviation from mu by school
}
transformed parameters {
  vector[J] theta = mu + tau * eta;        // school treatment effects
}
model {
  target += normal_lpdf(eta | 0, 1);       // prior log-density
  target += normal_lpdf(y | theta, sigma); // log-likelihood
}
"""

model = pystan.StanModel(model_code=schools_code)


schools_data = {"J": 8,
                "y": [28,  8, -3,  7, -1,  1, 18, 12],
                "sigma": [15, 10, 16, 11,  9, 11, 10, 18]}


fit_warmup = model.sampling(
    iter=20001, 
    warmup=20000, 
    data=schools_data,
    control={
        "adapt_engaged": True,
        "adapt_delta": 0.99,
    },
)

inv_metric = fit_warmup.get_inv_metric(as_dict=True)
init = fit_warmup.get_last_position(warmup=True)
stepsize = fit_warmup.get_stepsize()


fit_samples = model.sampling(
    data=schools_data,
    iter=20000, 
    warmup=0, 
    control={
        "adapt_engaged": False,
        "inv_metric": inv_metric,
        "stepsize": stepsize,
        "adapt_delta": 0.99,
    },
    init=init
)

fit = model.sampling(
    iter=40000, 
    warmup=20000, 
    data=schools_data,
    control={
        "adapt_engaged": True,
        "adapt_delta": 0.99,
    },
)

print(az.summary(fit))
print(az.summary(fit_samples))

and for CmdStanPy

import json

import arviz as az
import cmdstanpy


def get_inits(fit, checkpoint_name="init_checkpoint"):
    params_bool = [not item.endswith("__") for item in fit.column_names]
    params = [item for item in fit.column_names if not item.endswith("__")]
    names = []
    for i, init in enumerate([item[params_bool].tolist() for item in fit.draws(inc_warmup=True)[-1]], 1):
        names.append(checkpoint_name + f"_chain_{i}" + ".json")
        with open(names[-1], "w") as f:
            json.dump(dict(zip(params, init)), f)
    return names


def get_stepsize(fit):
    return fit.stepsize.tolist()


def get_metric(fit, checkpoint_name="metric_checkpoint"):
    names = []
    for i, metric in enumerate([item.tolist() for item in fit.metric], 1):
        names.append(checkpoint_name + f"_chain_{i}" + ".json")
        with open(names[-1], "w") as f:
            json.dump({"inv_metric": metric}, f)
    return names


schools_code = """
data {
  int<lower=0> J;         // number of schools
  real y[J];              // estimated treatment effects
  real<lower=0> sigma[J]; // standard error of effect estimates
}
parameters {
  real mu;                // population treatment effect
  real<lower=0> tau;      // standard deviation in treatment effects
  vector[J] eta;          // unscaled deviation from mu by school
}
transformed parameters {
  vector[J] theta = mu + tau * eta;        // school treatment effects
}
model {
  target += normal_lpdf(eta | 0, 1);       // prior log-density
  target += normal_lpdf(y | theta, sigma); // log-likelihood
}
"""


with open("schools_code.stan", "w") as f:
    print(schools_code, file=f)

model = cmdstanpy.CmdStanModel(stan_file="schools_code.stan")

schools_data = {"J": 8,
                "y": [28,  8, -3,  7, -1,  1, 18, 12],
                "sigma": [15, 10, 16, 11,  9, 11, 10, 18]}

fit_warmup = model.sample(
    iter_warmup=20000,
    iter_sampling=0,
    data=schools_data,
    adapt_delta=0.99,
    save_warmup=True,
)


inits = get_inits(fit_warmup, checkpoint_name="init_checkpoint_round_0")
stepsize = get_stepsize(fit_warmup)
metric = get_metric(fit_warmup, checkpoint_name="metric_checkpoint_round_0")


fit_samples = model.sample(
    data=schools_data,
    inits=inits,
    step_size=stepsize,
    metric=metric,
    iter_warmup=0,
    adapt_engaged=False,
    iter_sampling=20000,
)


fit = model.sample(
    data=schools_data,
    inits=inits,
    step_size=stepsize,
    metric=metric,
    iter_warmup=20000,
    adapt_engaged=True,
    adapt_delta=0.99,
    iter_sampling=20000,
)

print(az.summary(fit))
print(az.summary(fit_samples))

mike-lawrence · January 2, 2021, 8:48pm

Good call. The step_size and inv_metric show up accurately in the csvs, but the init values stored in the init json aren’t the ones I’m expecting given the function supplied to the init argument, so something awry there.

mike-lawrence · January 2, 2021, 8:56pm

oh, no, my mistake, the inits jsons are the right values too

Topic		Replies	Views
Checkpointing with CmdStanPy General	2	541	September 24, 2020
Passing a different mass matrix for each chain in cmdstanr Interfaces	7	704	January 13, 2021
Chkptstanr v0.2.0-alpha: checkpoint brms and cmdstanr sampling Publicity techniques , cmdstanr , brms	6	329	February 29, 2024
Saved mass matrix General	22	2690	December 12, 2022
Which Stan interface supports checkpointing? General cmdstan , rstan , stan , cmdstanr	6	889	May 5, 2021

Saving & reusing adaptation in cmdstanr

Related topics