Understanding the fit report header and finding warmups

This question is not about a specific model, therefore I’m not posting the model here. I’m confused about some of the details of the reports and results from Stan fits in general.

This is a report header from a recent Stan fit, which was set up to run 4 chains, iterations=50000, warmup=5000:
``` Stan report
4 chains: each with iter=(50000,50000,50000,50000); warmup=(0,0,0,0); thin=(1,1,1,1); 200000 iterations saved.
Warmup took (1192, 1168, 1138, 1155) seconds, 1.3 hours total
Sampling took (10861, 8264, 10847, 6912) seconds, 10 hours total
```
I have a few questions:

  1. The report says 200k iterations were saved, but I see only 50k iterations in the results. Are the results from the different chains averaged, or is something else going on?
  2. Why does the report say warmup=0? I passed in 5k warmup iterations when I started the fit and Stan didn’t complain (i.e. it proceeded to run and completed the fit). And Stan clearly spent about 10% of its time on warmup, which suggests it carried out 5k warmup iterations per 50k iterations, as I asked.
  3. Are warmup iterations saved anywhere? Is it possible to see how the spin-up (this may not be the right phrasing, sorry) distributions look? This is important because I do not see a clear path to determining the proper number of warmups for a given model unless one can compare the warmup results to the “real” iterations.

Thank you

Hi, what interface do you use? And how do you call the sampling?

Hi,

I’m using CmdStan via Matlab. Just upgraded to CmdStan 2.20 from 2.17, but this had no impact on the report headers.

And this is the code that initiates Stan from Matlab:
```matlab
dates = 1907:2015;
niter = 50000;   % no. of MC simulations
nwarmup = 5000;  % no. of warmups for STAN
Matm = size(dates,2); % # of elements in AHist
data = struct('Matm',Matm,'N13',nfdat13,'M13',nYears13,'fDat13',...
    SUM13.cGood,'fDaterr13',SUM13.eGood,'fAgeD13',fAgeD13,'N06noaa',...
    nfdat06noaa,'M06noaa',nYears06noaa,'fDat06noaa',SUM06.cGoodnoaa,...
    'fDaterr06noaa',SUM06.eGoodnoaa,'fAgeD06noaa',fAgeD06noaa,'N06uci',...
    nfdat06uci,'M06uci',nYears06uci,'fDat06uci',SUM06.cGooduci,...
    'fDaterr06uci',SUM06.eGooduci,'fAgeD06uci',fAgeD06uci,'N15',nfdatR15,...
    'M15',nYearsR15,'fDat15',REN15.cGood,'fDaterr15',REN15.eGood,'fAgeD15',fAgeDR15);
mod = StanModel('file','FirnOptimize_COS_SumRen_noAR1_AgeDerr.stan'); % pre-compile
mod.compile();
params = struct('data',data,'chains',4,'warmup',nwarmup,'iter',niter);
fit = mod.sampling(params); % fit model
fit.block();
print(fit);
```

@avehtari do you know about the Matlab interface?

I’m inferring from your response that I should see 200k iterations in my results when I start 4 chains with iter=50k. Am I correct? And if I used R (or another interface), would that be the case?

Where are the warmup draws saved, assuming the interface is not the problem?

I stopped using Matlab some years ago, but IIRC the Matlab interface doesn’t return the warmup draws (I never needed them).

Yes. 50k per chain is probably 50 times more than you need. Samplers other than Stan’s dynamic HMC may need 50k iterations, but Stan’s dynamic HMC is much more efficient, and I recommend starting with 1k iterations per chain.
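As a concrete starting point, a shorter run with the same MatlabStan call shown earlier might look like the minimal sketch below; the struct field names are just the ones from the original call, and how `iter` relates to `warmup` may depend on the interface version, so check the printed report header after the run:

```matlab
% Minimal sketch: same call as in the original Matlab snippet, just with
% far fewer iterations. Verify in the report header how this interface
% version interprets 'iter' relative to 'warmup'.
niter   = 1000;   % iterations per chain (assumed to be post-warmup draws)
nwarmup = 1000;   % warmup iterations per chain
params  = struct('data',data,'chains',4,'warmup',nwarmup,'iter',niter);
fit = mod.sampling(params);
fit.block();
print(fit);
```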

Not saved when using Matlab interface.

Unfortunately MatlabStan is no longer actively developed, so it is quite possible there is a bug there, or that something broke during upgrades to later versions of core Stan :-(

Thanks. I’m gonna have to look into switching to something else in the near future. Is R the most commonly used and recommended interface?

I need clarification on one more point, please. I seem to be getting results from only one chain regardless of how many chains I initiate, would you agree? Is there a disadvantage or downside to starting only one chain then, aside from having to spend more time to get the same number of iterations? In my case that would make no difference of course, since I cannot see those extra iterations anyway.

Both R and Python are well supported, in code and on this forum.

Using the R interface (I guess the same applies to Python), you get draws from all the chains, which you can extract either keeping the chain information intact or pooled together.

Launching more chains allows you to explore the posterior distribution better, and thus to discover potentially problematic areas that a single chain may not reach. Besides that, it is needed to produce diagnostics such as R-hat. In any case, if you have a few cores and enough memory, there is not much disadvantage to running several chains (say 4) in parallel, so especially during the development phase that is the safest approach.
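To make the R-hat point concrete, here is a minimal sketch of the basic (non-split) R-hat calculation for a single parameter, assuming the post-warmup draws are arranged with one column per chain; the Stan interfaces report a more refined split and rank-normalized version, but the idea is the same:

```matlab
% Basic (non-split) R-hat for one parameter: draws is an n-by-m matrix
% with n post-warmup iterations from each of m chains.
draws = randn(1000, 4);             % example input: 4 well-mixed chains
n = size(draws, 1);
chain_means = mean(draws, 1);       % 1-by-m vector of per-chain means
W = mean(var(draws, 0, 1));         % within-chain variance, averaged over chains
B = n * var(chain_means, 0);        % between-chain variance
var_plus = (n - 1)/n * W + B/n;     % pooled estimate of the posterior variance
Rhat = sqrt(var_plus / W)           % close to 1 when the chains agree
```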


Is starting four consecutive single-chain searches in series a viable workaround while I try to migrate my work to R or Python? I can combine the results externally after the runs are complete. Aside from losing R_hat, which I get from my four-chain runs (I just cannot see the sims), and the obvious time expense, would that be the same as running four chains in parallel at the same time?

As long as each chain starts from a different point, it will be fine. But note that, from the output you showed above, it seems you are already running 4 chains; it is just that the interface does not separate them out.
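If you do go the serial route, a sketch along these lines should work; note that the `seed` field is an assumption about what your MatlabStan version forwards to CmdStan, so verify it before relying on it:

```matlab
% Hedged sketch: four single-chain runs in series, each with a different
% random seed so that each chain also gets a different random
% initialization. The 'seed' field name is an assumption about the
% MatlabStan version in use.
seeds = [101 202 303 404];
fits  = cell(1, numel(seeds));
for c = 1:numel(seeds)
    params  = struct('data',data,'chains',1,'warmup',nwarmup,'iter',niter,...
                     'seed',seeds(c));   % assumed field name
    fits{c} = mod.sampling(params);
    fits{c}.block();                     % wait for this chain to finish
end
% The draws from each fit can then be combined externally, keeping track
% of which chain each draw came from.
```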

Yes, it does run four chains, but I am struggling to find the results. In Matlab, the returned “fit” object has only one chain saved. For example, if I start 4 chains with 5000 iterations, I get back 5000 simulations in all my posteriors and that’s it.

I think I found a better workaround than running single-chain searches in series, though. At the end of each search I also get back 4 output files with .csv extensions. I looked into them and they look like the outputs from 4 chains. I can write my own code in Matlab to read the data from the .csv files (see the sketch below). If someone can confirm these .csv files are indeed the outputs from the different chains, that would be great. Also, I’m not sure whether these files are created by the C++ side or by Matlab after the run is over; I would feel more comfortable using them if they come from Stan directly.

Thanks for all the input, everyone. I feel like I understand things much better now, and I realize that some of the problems I’ve been seeing with the repeatability of Stan searches probably had to do with the fact that I’ve been looking at results from a single chain the whole time.
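For what it’s worth, here is a minimal sketch of how those per-chain CSV files could be read from Matlab, assuming they follow the standard CmdStan output format (lines starting with '#' are comments, followed by a header row of column names and one row per saved draw); the file names below are placeholders for whatever names appear in your output directory:

```matlab
% Read CmdStan-style CSV output, one file per chain. Placeholder names.
files  = {'output-1.csv','output-2.csv','output-3.csv','output-4.csv'};
chains = cell(1, numel(files));
for c = 1:numel(files)
    txt   = fileread(files{c});
    lines = strtrim(strsplit(txt, '\n'));
    keep  = lines(~startsWith(lines, '#') & ~cellfun(@isempty, lines));
    names = strsplit(keep{1}, ',');          % column names: lp__, ..., parameters
    rows  = cellfun(@(l) str2double(strsplit(l, ',')), keep(2:end), ...
                    'UniformOutput', false);
    chains{c} = struct('names', {names}, 'draws', vertcat(rows{:}));
end
% chains{c}.draws is an (iterations x columns) matrix for chain c; the
% per-chain matrices can be concatenated if pooled draws are wanted.
```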