Different Stan Versions Show Different Diagnostic Warnings

Hello Stan community, can someone help me understand the possible reasons for the different results described below please?

I have cmdstanpy installed on two different laptops. I copied the data and code for a model from one machine to the other and ran it. The diagnostic check on the second machine shows split R-hat warnings, but no diagnostic problems were detected on the first machine. The training data and Python/Stan code are exactly the same. The machines have different host operating systems, but I run Stan inside Docker and my Dockerfile is the same. The Dockerfile does allow the Python version to differ, and calling

model.exe_info()

indicates that the Stan versions are different. The first machine has Python 3.12.7 and Stan 2.35 while the second machine has Python 3.13.11 and Stan 2.37.
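For anyone wanting to make the same check, here is a minimal sketch of comparing the version fields from two `exe_info()` results. The key names (`stan_version_major` and so on) and the string values are my assumption about what the call returns, the patch numbers are placeholders, and `stan_version` is a hypothetical helper, so check them against your own output.

```python
def stan_version(info):
    """Return (major, minor, patch) from an exe_info()-style dict (assumed key names)."""
    parts = ("major", "minor", "patch")
    return tuple(int(info[f"stan_version_{p}"]) for p in parts)


# Hypothetical stand-ins for model.exe_info() on each machine.
# Only 2.35 and 2.37 come from the post; the patch value 0 is a placeholder.
machine_a = {"stan_version_major": "2", "stan_version_minor": "35", "stan_version_patch": "0"}
machine_b = {"stan_version_major": "2", "stan_version_minor": "37", "stan_version_patch": "0"}

version_a = stan_version(machine_a)
version_b = stan_version(machine_b)
if version_a != version_b:
    print(f"Stan versions differ: {version_a} vs {version_b}")
```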

I searched the recent release notes on your blog (https://blog.mc-stan.org/) and didn’t see any mention of a change that sounds related to what I’m seeing. I’m also looking through your suggestions for resolving convergence problems (How to Diagnose and Resolve Convergence Problems), in case this is a model specification issue. Eventually this model will be hosted on an AWS EC2 instance, so I also wanted to check with the community to see if this might be related to software versions.

I realize that I would need to build the exact same Docker image on both machines to do an accurate comparison. However, the model takes several hours to fit, so in the meantime I was curious whether anyone thinks the different diagnostic results might be related to the different Stan versions. Or are there too many unknowns in my description to allow a good guess as to the reason? Thanks in advance for any guidance you can share.

Sorry, I think I spoke too soon. After posting the above I found the documentation on Reproducibility, and it indicates that in my situation I should not expect exactly the same results.

Stan 2.36 was the first version to include split, rank-normalized R-hat. So even if your fits were exactly the same, the diagnostic being run is itself different on the newer version.
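To illustrate how the two diagnostics can disagree on identical draws, here is a minimal pure-Python sketch. It is not Stan's implementation: the rank-normalization constants and the use of the larger of the plain and folded values are my reading of the published rank-normalized method, and I have not checked exactly which variant CmdStan reports. The function names and the simulated chains are hypothetical.

```python
import random
from statistics import NormalDist, mean, median, variance


def split_chains(chains):
    """Split every chain into first and second halves (middle draw dropped if odd)."""
    out = []
    for chain in chains:
        half = len(chain) // 2
        out.append(chain[:half])
        out.append(chain[len(chain) - half:])
    return out


def classic_rhat(chains):
    """Potential scale reduction factor from within- and between-chain variance."""
    n = len(chains[0])
    within = mean([variance(c) for c in chains])
    between_over_n = variance([mean(c) for c in chains])
    var_plus = (n - 1) / n * within + between_over_n
    return (var_plus / within) ** 0.5


def rank_normalize(chains):
    """Map pooled draws to normal scores of their ranks (ties share the average rank)."""
    pooled = sorted(x for c in chains for x in c)
    total = len(pooled)
    rank_sum, count = {}, {}
    for position, value in enumerate(pooled, start=1):
        rank_sum[value] = rank_sum.get(value, 0) + position
        count[value] = count.get(value, 0) + 1
    inv_cdf = NormalDist().inv_cdf
    # Assumed offsets: (rank - 3/8) / (total + 1/4)
    score = {v: inv_cdf((rank_sum[v] / count[v] - 0.375) / (total + 0.25))
             for v in rank_sum}
    return [[score[x] for x in c] for c in chains]


def rank_rhat(chains):
    """Larger of the rank-normalized split R-hat on the draws and on the folded draws."""
    halves = split_chains(chains)
    center = median([x for c in halves for x in c])
    folded = [[abs(x - center) for x in c] for c in halves]
    bulk = classic_rhat(rank_normalize(halves))
    tail = classic_rhat(rank_normalize(folded))
    return max(bulk, tail)


rng = random.Random(1)
# Hypothetical draws: three chains with scale 1 and one with scale 5, all centred at 0.
chains = [[rng.gauss(0, s) for _ in range(1000)] for s in (1, 1, 1, 5)]
old_style = classic_rhat(split_chains(chains))
new_style = rank_rhat(chains)
print(f"classic split R-hat:         {old_style:.3f}")
print(f"rank-normalized split R-hat: {new_style:.3f}")
```

The chains agree in location but not in scale, which the mean-based classic statistic barely registers while the folded, rank-based one does. So the same set of draws can pass under one version's check and trigger a warning under the other's.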
