Minimizing warmup iterations - Error reading step size from CmdStan output

I’m working on a model with a highly multimodal posterior and it seems like the geometry is too complex for Stan (namely, it warns me that it reaches the max treedepth at each iteration). For this reason, I use a Metropolis-Hastings algorithm which combines Stan’s dynamic HMC with heuristic MCMC moves which seem to help explore the posterior.

In that context, I need to sample with Stan frequently. Since the likelihood of the model is expensive to compute, I’m trying to minimize the number of warmup iterations. Hence, the first time I call Stan, I let it perform the full warmup. For the subsequent calls I would like to hint the initial step size and metric using the previous output and reduce the warmup’s length (under the assumption that the geometry of the posterior is not significantly different in different parts of the posterior). I do this in the following manner:

stan_output = model.sample(
    data=data, inits=inits, iter_sampling=sample_size, chains=1,
    iter_warmup=30, adapt_init_phase=0, adapt_metric_window=20, adapt_step_size=10,
    metric=previous_metric, step_size=previous_step_size,
)
previous_metric = [{"inv_metric": metric} for metric in stan_output.metric]
previous_step_size = list(map(float, stan_output.step_size))

When I try to do this however, I get the error:

Traceback (most recent call last):                                                                                                                             
  File "/usr/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap                                                                               
    self.run()                                                                                                                                                 
  File "/usr/lib/python3.11/multiprocessing/process.py", line 108, in run                                                                                      
    self._target(*self._args, **self._kwargs)                                                                                                                  
  File "embed.py", line 161, in <lambda>                                                          
    run_parallel_chains(lambda chain, log_progress: sample_chain(                                                                                              
                                                    ^^^^^^^^^^^^^                                                                                              
  File "embedding/embedding.py", line 119, in sample_chain                                
    parameters, *args = hmc_sample(replace_known_parameters(current_parameters, known_parameters), hmc_sample_size, *args)                                     
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                     
  File "embed.py", line 121, in hmc_sample_s1                                                     
    stan_output = stan_model.sample(**sample_args)                                                                                                             
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                             
  File "lib/python3.11/site-packages/cmdstanpy/model.py", line 1210, in sample                                           
    mcmc = CmdStanMCMC(runset)
           ^^^^^^^^^^^^^^^^^^^
  File "lib/python3.11/site-packages/cmdstanpy/stanfit/mcmc.py", line 103, in __init__
    config = self._validate_csv_files() 
             ^^^^^^^^^^^^^^^^^^^^^^^^^^ 
  File "lib/python3.11/site-packages/cmdstanpy/stanfit/mcmc.py", line 297, in _validate_csv_files
    dzero = check_sampler_csv(
            ^^^^^^^^^^^^^^^^^^
  File "lib/python3.11/site-packages/cmdstanpy/utils/stancsv.py", line 44, in check_sampler_csv
    meta = scan_sampler_csv(path, is_fixed_param)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "lib/python3.11/site-packages/cmdstanpy/utils/stancsv.py", line 103, in scan_sampler_csv
    raise ValueError("Error in reading csv file: " + path) from e
ValueError: Error in reading csv file: tmp_output/model-20230811083634.csv
Process Process-3:
Traceback (most recent call last):
  File "lib/python3.11/site-packages/cmdstanpy/utils/stancsv.py", line 100, in scan_sampler_csv
    lineno = scan_hmc_params(fd, dict, lineno)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "lib/python3.11/site-packages/cmdstanpy/utils/stancsv.py", line 352, in scan_hmc_params
    raise ValueError(
ValueError: line 47: expecting metric, found:
         "# Step size = 0.00195086"

I tried to put together a minimal working example (MWE), but I can’t reproduce the problem with a small, simple model, so I’m sharing the output file from a run that CmdStanPy cannot parse properly.
model-20230811083634_0-stdout.txt (1.8 KB)

The purpose of this post is to resolve the CmdStanPy error. Nevertheless, feel free to discuss or ask questions about the model and my usage of Stan.

Thank you for your help!

Operating System: Arch
Interface Version: cmdstanpy v1.1.0
Compiler/Toolkit: gcc version 13.2.1 20230801 (probably what Stan uses?)

That error is raised when the output files do not contain the line # Adaptation terminated.
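A quick way to check a given output file for this is sketched below. The helper names has_adaptation_marker and write_temp are hypothetical and not part of CmdStanPy, and the two sample files are shortened stand-ins rather than real Stan output. The sketch only illustrates that the marker has to sit on its own line.

```python
import os
import tempfile


def has_adaptation_marker(csv_path):
    """Return True if some line of the file is exactly '# Adaptation terminated'."""
    with open(csv_path) as fd:
        return any(line.strip() == "# Adaptation terminated" for line in fd)


def write_temp(text):
    """Write text to a temporary file and return its path (demo helper only)."""
    with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as fd:
        fd.write(text)
        return fd.name


# Shortened stand-in for a well-formed file: the marker is on its own line.
good = write_temp("lp__,theta\n# Adaptation terminated\n# Step size = 0.1\n")
# Stand-in for a damaged file: the marker is fused onto the end of another line.
bad = write_temp("lp__,theta.# Adaptatio# Adaptation terminated\n# Step size = 0.1\n")

good_ok = has_adaptation_marker(good)
bad_ok = has_adaptation_marker(bad)
print(good_ok)  # True
print(bad_ok)   # False

os.remove(good)
os.remove(bad)
```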

I can’t seem to re-create the issue locally (I added the following lines to cmdstanpy_tutorial.py):

previous_step_size = list(map(float, fit.step_size))
previous_metric = [{"inv_metric": metric} for metric in fit.metric]
fit2 = model.sample(
    data=data,
    iter_warmup=30,
    adapt_init_phase=0,
    adapt_metric_window=20,
    adapt_step_size=10,
    metric=previous_metric,
    step_size=previous_step_size,
    output_dir='.'
)
print(fit2.summary())

Thank you for the answer. The issue is odd, since I can’t reproduce it with a small model (as I mentioned in the original post).

That makes sense, since in the output file # Adaptation terminated is not on a separate line. I now see that I shared the wrong output file. Here is the problematic output file from another run:
model-20230811103203.csv (8.6 KB)

Could it be a race condition, since line 46 ends with theta.# Adaptatio# Adaptation terminated? One thing I forgot to mention is that I handle parallel chains manually in Python with multiprocessing, meaning that two or more Stan programs could be executed (almost?) simultaneously. However, I’m not sure how a race condition on the file would occur unless CmdStanPy’s naming convention is not safe in this use case, in the sense that two programs could end up with the same output file name.

Very possibly, if you have multiple copies running in parallel and don’t take care to ensure that their output files don’t overlap.
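To illustrate how such an overlap could happen, here is a minimal sketch. It assumes that output names have the form "<model>-<timestamp>.csv" with a timestamp resolved to the second, which is what the file name model-20230811083634.csv suggests. The function output_name is a hypothetical stand-in, not CmdStanPy code.

```python
from datetime import datetime

# Assumption: second-resolution timestamp, inferred from "model-20230811083634.csv".
TIME_FMT = "%Y%m%d%H%M%S"


def output_name(model_name, started_at):
    """Hypothetical stand-in for how an output file name might be built."""
    return f"{model_name}-{started_at.strftime(TIME_FMT)}.csv"


# Two processes that start 0.4 s apart, but within the same clock second.
first_start = datetime(2023, 8, 11, 8, 36, 34, 100000)
second_start = datetime(2023, 8, 11, 8, 36, 34, 500000)

name_a = output_name("model", first_start)
name_b = output_name("model", second_start)
print(name_a)            # model-20230811083634.csv
print(name_a == name_b)  # True: both processes would target the same file
```

Under that assumption, any two independently launched samplers that start within the same second in the same output directory would compete for one file.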

I found the issue: it was in fact a race condition. Sometimes two Stan processes write to the same file (probably when they start or end at exactly the same time). The solution was to set the output file names manually with CmdStanPy so that different Stan processes always write to different files. This can be done with the time_fmt argument of CmdStanModel.sample.
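For anyone hitting the same problem, here is a minimal sketch of one way to do this. It assumes that time_fmt is applied as a strftime format when the output file name is built, so literal text in it ends up in the name. The helpers unique_time_fmt and sample_chain are hypothetical names, and model is assumed to be a cmdstanpy.CmdStanModel.

```python
import os
from datetime import datetime


def unique_time_fmt(chain_id):
    """Build a strftime format that embeds the chain index and the process ID."""
    # Only "%" directives are expanded by strftime; the rest stays literal.
    return f"%Y%m%d%H%M%S-chain{chain_id}-pid{os.getpid()}"


def sample_chain(model, data, chain_id, **sample_kwargs):
    """Hypothetical wrapper around CmdStanModel.sample for one manual chain."""
    return model.sample(
        data=data,
        chains=1,
        time_fmt=unique_time_fmt(chain_id),  # distinct per chain and per process
        **sample_kwargs,
    )


# Same instant, different chain indices: the formatted stamps differ.
now = datetime(2023, 8, 11, 8, 36, 34)
stamp_0 = now.strftime(unique_time_fmt(0))
stamp_1 = now.strftime(unique_time_fmt(1))
print(stamp_0 != stamp_1)  # True
```

Giving each process its own output_dir should be an alternative that avoids relying on the file name at all.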