Bin/diagnose in cmdstan to skip warmup

nerpa · November 19, 2020, 10:52am

Hello all,

I am running my models with the save_warmup=True flag, so the output files also contain the warmup draws. Are these draws taken into account when I run bin/diagnose on the output file? If so, is there an easy way to skip them without editing my output file?

Thank you!

nerpa · November 19, 2020, 11:56am

Ok I just saw that it does include warmup. Is there an easy way to skip these draws?

bbbales2 · November 19, 2020, 12:47pm

I gave this a test with cmdstan and I don’t think it looks at warmup by default. Which interface are you using?

Here is the CSV file I tested: output2.csv (93.3 KB). It is one chain of the Bernoulli example included in cmdstan. There are divergences in warmup, but none in sampling, and diagnose reports no divergences.

nerpa · November 19, 2020, 1:12pm

Thanks for checking! I am using cmdstanpy (the latest release) and my model has 600 warmup and 1000 sampling iterations with thinning=2. So my bin/diagnose output looks like this:

155 of 800 (19%) transitions hit the maximum treedepth limit of 10, or 2^10 leapfrog steps.
Trajectories that are prematurely terminated due to this limit will result in slow exploration.
For optimal performance, increase this limit.

Checking sampler transitions for divergences.
45 of 800 (5.6%) transitions ended with a divergence.

It counts out of a total of 800 transitions [(600+1000)/2], so it includes the warmup as well. I cannot attach my output.csv file because it’s (unfortunately) over 2 GB.

Thank you!
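
For reference, the 800 in that output can be reproduced from the run settings. This is a minimal sketch, not CmdStan code: the function name `saved_draw_counts` is made up for illustration, and rounding up when the iteration count is not a multiple of the thinning interval is an assumption (it makes no difference for the numbers in this post).

```python
import math


def saved_draw_counts(num_warmup, num_samples, thin, save_warmup=True):
    """Return (warmup_rows, sampling_rows) written to the CSV for one chain.

    Assumption: one row is kept per `thin` iterations, rounding up.
    """
    warmup_rows = math.ceil(num_warmup / thin) if save_warmup else 0
    sampling_rows = math.ceil(num_samples / thin)
    return warmup_rows, sampling_rows


warmup_rows, sampling_rows = saved_draw_counts(600, 1000, 2)
print(warmup_rows, sampling_rows, warmup_rows + sampling_rows)  # 300 500 800
```

So 300 of the 800 rows in the file are warmup draws and 500 are sampling draws.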

mitzimorris · November 19, 2020, 2:36pm

CmdStan’s bin/diagnose program uses all the draws in the Stan CSV file.
If you run the sampler with save_warmup=True, then the CSV file will include the warmup draws.

In CmdStanPy, the default value of save_warmup is False.

Because there are so many different use cases for saving the draws
and running the diagnostics, there’s no one right thing to do.
That said, we could probably add logic to CmdStan’s diagnose method
so that it would skip warmup draws, if present.

nerpa · November 19, 2020, 2:46pm

Ok, thanks! So I see no solution at this point but to alter my output.csv file. Thank you both!
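
If editing the file is the route taken, it can be done as a stream so that a multi-gigabyte CSV never has to fit in memory. The sketch below is only an illustration under stated assumptions: the helper names `drop_warmup_rows` and `strip_warmup_file` are hypothetical, the caller supplies the number of saved warmup rows (300 per chain for 600 warmup iterations with thinning 2), and the file is assumed to have `#` comment lines, one column-header line, and then the draws with warmup rows first. It has not been tested against bin/diagnose.

```python
from typing import Iterable, Iterator


def drop_warmup_rows(lines: Iterable[str], n_warmup_rows: int) -> Iterator[str]:
    """Yield Stan CSV lines, skipping the first `n_warmup_rows` draw rows."""
    header_seen = False
    skipped = 0
    for line in lines:
        if line.startswith("#") or not line.strip():
            yield line           # comment or blank line: pass through unchanged
        elif not header_seen:
            header_seen = True   # first non-comment line is the column header
            yield line
        elif skipped < n_warmup_rows:
            skipped += 1         # saved warmup draw: drop it
        else:
            yield line           # sampling draw: keep it


def strip_warmup_file(src_path: str, dst_path: str, n_warmup_rows: int) -> None:
    """Copy src_path to dst_path line by line, without the warmup rows."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(drop_warmup_rows(src, n_warmup_rows))
```

Usage would look like `strip_warmup_file("output.csv", "output_nowarmup.csv", 300)`. Note that the configuration comments in the copy still say save_warmup was on, so any tool that trusts those comments may still treat the leading rows as warmup.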

bbbales2 · November 20, 2020, 7:07pm

I think there is actually a bug. What is happening is that when diagnose reports divergences:

45 of 800 (5.6%) transitions ended with a divergence.

The number on the left is the number of post-warmup divergences and the number on the right is the total number of transitions, counting both warmup and sampling.

Here’s an attached file with 1 divergence (edit: 1 divergence after warmup, but there are a bunch during warmup): output3.csv (93.8 KB). There are 1000 post-warmup draws but it reports:

Checking sampler transitions for divergences.
1 of 2000 (0.05%) transitions ended with a divergence.

I made an issue over here: diagnose counting warmup draws · Issue #948 · stan-dev/cmdstan · GitHub

mitzimorris · November 21, 2020, 12:09am

Confirming that there’s a bug: the message "x of N (%) transitions" is misleading.
x is the number of post-warmup divergences. The diagnose method doesn’t have access to warmup transitions even if they are saved in the CSV file. The fix is to change the message so that N is the number of sampling iterations.
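
To see what the proposed fix changes in practice, here is a small sketch that formats the message with the number of sampling draws as N. It is an illustration only, not the code in CmdStan; `divergence_message` is a hypothetical name, and the 500 assumes the earlier run (1000 sampling iterations, thinning 2) saved exactly 500 sampling draws.

```python
def divergence_message(n_divergent: int, n_sampling: int) -> str:
    """Format the divergence summary with sampling draws as the denominator."""
    pct = 100.0 * n_divergent / n_sampling
    return (f"{n_divergent} of {n_sampling} ({pct:.2g}%) "
            "transitions ended with a divergence.")


# The run from earlier in the thread: reported as 45 of 800 (5.6%).
print(divergence_message(45, 500))
# 45 of 500 (9%) transitions ended with a divergence.

# output3.csv: reported as 1 of 2000 (0.05%).
print(divergence_message(1, 1000))
# 1 of 1000 (0.1%) transitions ended with a divergence.
```

With the warmup rows in the denominator, the reported divergence rate is understated in both cases.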

nerpa · November 27, 2020, 5:59pm

OK, thank you so much for looking into this!!
