Saving & reusing adaptation in cmdstanr

@ahartikainen could you try with this data & model? I’m getting much more dramatic results with that set.

(Note: I did install cmdstanpy and tried to do it myself, but when I replace all instances of data=schools_data with data='data_for_stan.json', the warmup-alone bit runs fine but the sampling-alone bit yields an error complaining that the value of the data argument needs to be a string or dict, which it obviously already is. Note also that this error goes away if I comment out supplying an init value! Another bug, possibly?)

Yes, that is a bug.

Can you read json to dict?

import json
with open("path/to/file.json") as f:
    schools_data = json.load(f)

I will try later today.

I used this line to convert rds to json

write(toJSON(data, pretty = TRUE, auto_unbox = TRUE, digits=16), file="data_for_stan.json")

How did you transform that rds to json?

On Windows, the first chain got stuck at warmup. Now trying on Ubuntu.

I did run this on Windows. I need to test the code again with Ubuntu.

I think I can create a GitHub repo and use GitHub workflows to test long runs on different OSes.


Ok, looks like it fails with CmdStanPy too.

I will still check pystan.

Edit: pystan had fewer divergences, which is weird.


@mitzimorris could you elaborate on why this isn’t a bug? I don’t understand your abbreviated comment closing my bug report. If it were simply about the first sampling iteration, using the initial values, then what am I doing wrong in setting the initial values and wouldn’t you expect just the first few iterations to be divergent? I observe a very high proportion of samples going divergent in the second data & model example I provided.


first off, apologies for the too terse message. I tried to find a better explanation than what I offered. Perhaps the discussion here would be useful: Current state of checkpointing in Stan

this issue comes up a lot and it is indeed something we need to figure out.

Hm, as I understand it, checkpointing is about capturing and reinstating all rng states, which, if achieved, would yield identical samples between a single run with warmup and sampling together and a pair of runs, one with warmup and one with sampling. However, absent such checkpointing, I expected that capture/reinstatement of at least the inv_metric and step_size, plus use of the final warmup draw as inits, would at least place the sampling run in the typical set with HMC parameters that should yield not-identical-but-roughly-equivalent sampling performance, as measured by the rate of divergences, ESS, Rhat, etc. But what I observe, particularly with the second data/model I posted on the issue, is that while the combined warmup+sampling runs almost never encounter divergences, the split warmup-then-sampling runs nearly always do, and when they do, the majority of their samples are divergent.

Where is my understanding faulty?
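For readers following along, the hand-off described above can be sketched roughly as below. This is a minimal sketch under stated assumptions, not the code from the scripts in this thread: the helper name write_stage_two_inputs, the file names, and the parameter names mu and tau are all made up for illustration. It relies only on CmdStan accepting a JSON metric file whose single entry is named inv_metric and a JSON inits file keyed by parameter name.

```python
import json
import tempfile
from pathlib import Path


def write_stage_two_inputs(out_dir, inv_metric, last_warmup_draw):
    """Write the metric and inits files for a sampling-only run.

    inv_metric: list of floats (a diagonal metric) taken from the warmup run.
    last_warmup_draw: dict of parameter name -> value from the final warmup draw.
    The helper and file names are hypothetical choices for this sketch.
    """
    out_dir = Path(out_dir)
    metric_path = out_dir / "metric.json"
    inits_path = out_dir / "inits.json"
    # Python's json module writes floats with a repr that parses back to the
    # same double, so this step does not lose precision by itself.
    metric_path.write_text(json.dumps({"inv_metric": list(inv_metric)}))
    inits_path.write_text(json.dumps(last_warmup_draw))
    return metric_path, inits_path


# Made-up values standing in for what a warmup-only run would report.
example_metric = [0.81, 1.25, 0.0625]
example_draw = {"mu": 4.375, "tau": 0.9999998712345678}

with tempfile.TemporaryDirectory() as tmp:
    metric_file, inits_file = write_stage_two_inputs(tmp, example_metric, example_draw)
    # Read both files back to confirm nothing changed on the way through.
    metric_back = json.loads(metric_file.read_text())["inv_metric"]
    draw_back = json.loads(inits_file.read_text())
```

The step size is a single number per chain, so it would be passed directly as an argument to the sampling-only run (with adaptation switched off and zero warmup iterations) rather than through a file.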

the sampler takes the initial parameter values and tries to use them, but if for some reason
those inits fail to meet constraints, it will try to use other values, therefore it starts in a bad place.
as you don’t do any further adaptation, it continues to have problems.
perhaps @bbbales2 has more insight as to what’s going on in these examples.

This seems like a bug to me. I looked at the scripts earlier and I definitely think it should work, and it seems to have worked a couple of times earlier in this thread even (here and here). Something is going on. Hopefully @mike-lawrence and @ahartikainen figure it out!

But the inits are coming from the final draw of the warmup, so aren’t those guaranteed to meet the model constraints?

if it sometimes works and sometimes doesn’t, then it sounds like it’s hit a difficult model/data combo. there’s a loss of precision when you dump out the draw and then read it back in again. parameter values very close to one are particularly problematic.
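To make the precision point concrete, here is a small sketch of what a write-then-read round trip does to a value just below one. It assumes the draw is written with six significant digits (which, as far as I know, is what you get when sig_figs is not set) and compares that with 18 digits. The value itself is made up, and round_trip is a hypothetical helper that only mimics the formatting step, not CmdStan's actual writer.

```python
def round_trip(value, sig_figs):
    """Format a double with sig_figs significant digits, then parse it back."""
    return float(f"{value:.{sig_figs}g}")


# Made-up value close to the upper boundary of a constrained parameter.
near_one = 0.99999987

low_precision = round_trip(near_one, 6)
high_precision = round_trip(near_one, 18)

# With 6 digits the value is written as "1", so it lands exactly on the boundary.
print(low_precision == 1.0)
# With 18 digits the parsed value is the same double that was written.
print(high_precision == near_one)
```

If the sampling-only run reads its inits from a draw written at low precision, a value like this one would arrive at exactly 1 instead of just under it.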

for the problematic example, has the chain properly converged during the warmup phase?

I’m using sig_figs=18 during the warmup run

There is a cholesky_factor_corr parameter that, when I look at the summary from a run with warmup and sampling done together, has a constant value of 1 in the [1,1] entry, and when I look at the inits and test whether they are precisely equal to one, the init value for that entry is.

I think so? If it weren’t, I’d expect more issues with the runs where I do warmup and sampling together. Also, I still observe the same behaviour when I bump the warmup iterations up to 1e4.

Here’s a gist containing the stan file, the code to generate data and the code to loop over seeds, generating new data and attempting to sample in the two ways for each seed.

Data and model are a standard hierarchical model where there are 10 subjects each observed with common gaussian measurement noise 10 times in each of 2 conditions, and the subjects’ intercept and condition effect are modelled as multivariate normal.

I just ran it for 100 seeds, and whereas only about 5% of traditional runs encountered divergences, 96% of the two-stage runs had them, and lots of them.


In R I used cmdstanr::write_stan_json()

Tried this and I get the same error (and just updated the cmdstanpy bug report to reflect this).

@ahartikainen you can ignore at least the part of this thread where I had trouble using cmdstanpy; turned out I was still using an old version and the latest version runs my examples fine. Starting a loop to confirm that the core issue of increased divergences manifests in cmdstanpy as it does with cmdstanr…

Yup, same behaviour in cmdstanpy as cmdstanr, so seems to be something core to cmdstan itself.

Ooof, this is very compelling. Definitely want to figure out what is happening here before the 2.26 release.


Any insights so far? I’m happy to help explore but would need a little guidance on what to try.

Ooo, thanks for the reminder. I’ve got a window while some stuff builds. Will go check it out now.
