Same model, same data, same seed, different computers, different number of divergences

I am running the same model on the same dataset on two different computers: my Mac laptop (R 4.3.1) and the Unix cluster where I work (R 4.2.2), using cmdstanr on both. I use the same seed, but I get a different number of divergences and slightly different results. I noticed this while trying to understand the cause of the divergences, when I wanted to reproduce the result on my laptop so I could graph things. Does a different number of divergences for the same data and the same seed make sense, or is this a red flag?

Of interest may be the reproducibility section in the Stan reference manual. In particular:

Stan is designed to allow full reproducibility. However, this is only possible up to the external constraints imposed by floating point arithmetic.

Stan results will only be exactly reproducible if all of the following components are identical:

Stan version
Stan interface (RStan, PyStan, CmdStan) and version, plus version of interface language (R, Python, shell)
versions of included libraries (Boost and Eigen)
operating system version
computer hardware including CPU, motherboard and memory
C++ compiler, including version, compiler flags, and linked libraries
same configuration of call to Stan, including random seed, chain ID, initialization and data

That should, at least partially, explain the discrepancies between the outputs. As for how to mitigate this, or whether the difference in the number of divergences is meaningful, I can’t say.
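To make the floating point caveat concrete, here is a minimal sketch. It is not Stan code and does not reproduce anything from the model in question. It is plain Python with made-up variable names, and it only shows that the same three numbers added in a different grouping can give a slightly different result. A different compiler, set of compiler flags, or CPU can legitimately reorder operations like this.

```python
# Floating point addition is not associative: the grouping changes
# where rounding happens, so the final bits of the result can differ.
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c   # rounds after a + b, then again after adding c
right = a + (b + c)  # rounds after b + c, then again after adding a

print(left == right)      # False
print(abs(left - right))  # tiny, but not zero
```

The difference is on the order of 1e-16, so it is harmless in a single sum. My assumption is that in a long sampling run such differences can occasionally tip a borderline step one way or the other, which would be consistent with the divergence counts not matching exactly across machines.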


Thank you, this does explain the differences in the number of divergences. Thanks for pointing to the reproducibility section.
