Reproducibility across machines - hardware/C++ compiler dependent?

I run the same data and Stan code with CmdStan 2.19.1 on two clusters. Even with the same random seed (seed=3412), the sampling results are not exactly the same: the mean estimates and the number of divergences both differ.

Q: Is it necessary to have completely identical hardware and C++ compiler in order to exactly replicate the sampling results?

Results on the faster cluster:

#=====================================================================
# In folder: 20190625.130259_6083422 [75,200,10,1,**3412**,600,200,0.98,11,dense_e,4,J90c]"
#===================================================================== 
4 chains: each with iter=(200,200,200,200); warmup=(600,600,600,600); thin=(1,1,1,1); 3200 iterations saved.
Warmup took (11852, 12628, 11884, 13546) seconds, 13.9 hours total
Sampling took (2372, 3629, 3075, 3553) seconds, 3.51 hours total
                        Mean      MCSE    StdDev         5%        50%        95%     N_Eff   N_Eff/s     R_hat
lp__                1.87e+04  7.59e-01  1.33e+01   1.86e+04   1.87e+04   1.87e+04  3.08e+02  2.44e-02  1.01e+00
accept_stat__       9.74e-01  2.01e-03  5.47e-02   8.99e-01   9.90e-01   1.00e+00  7.42e+02  5.87e-02  9.99e-01
stepsize__          1.54e-02  1.38e-03  1.96e-03   1.32e-02   1.64e-02   1.81e-02  2.01e+00  1.59e-04  2.83e+14
treedepth__         9.63e+00  2.76e-01  4.87e-01   9.00e+00   1.00e+01   1.00e+01  3.11e+00  2.47e-04  1.68e+00
n_leapfrog__        8.87e+02  1.10e+02  2.47e+02   5.11e+02   1.02e+03   1.02e+03  5.07e+00  4.01e-04  1.31e+00
divergent__         2.50e-03      -nan  5.00e-02   0.00e+00   0.00e+00   0.00e+00      -nan      -nan  9.99e-01
energy__           -1.85e+04  1.10e+00  1.81e+01  -1.85e+04  -1.85e+04  -1.85e+04  2.72e+02  2.15e-02  1.02e+00
sd_y                8.01e-02  2.22e-05  6.05e-04   7.91e-02   8.00e-02   8.11e-02  7.40e+02  5.86e-02  9.99e-01
mu_u1               9.26e-02  2.98e-04  9.62e-03   7.68e-02   9.31e-02   1.09e-01  1.04e+03  8.24e-02  9.97e-01
mu_alpha            3.89e-02  7.01e-05  2.01e-03   3.57e-02   3.88e-02   4.23e-02  8.23e+02  6.52e-02  1.00e+00
beta                6.04e-01  2.72e-04  7.55e-03   5.91e-01   6.04e-01   6.17e-01  7.71e+02  6.11e-02  1.00e+00
theta               1.47e-01  1.16e-04  3.23e-03   1.42e-01   1.47e-01   1.52e-01  7.75e+02  6.13e-02  1.00e+00
sd_season           9.64e-02  1.41e-04  4.21e-03   8.93e-02   9.64e-02   1.03e-01  8.96e+02  7.09e-02  1.01e+00
mu_season[1]       -1.15e-01  3.02e-04  1.04e-02  -1.32e-01  -1.15e-01  -9.82e-02  1.18e+03  9.34e-02  9.96e-01
mu_season[2]       -6.00e-02  3.14e-04  1.05e-02  -7.72e-02  -5.97e-02  -4.28e-02  1.11e+03  8.83e-02  9.99e-01
mu_season[3]        1.39e-01  2.92e-04  1.03e-02   1.22e-01   1.39e-01   1.57e-01  1.24e+03  9.84e-02  1.00e+00
p[1]                6.63e-01  1.73e-03  4.72e-02   6.05e-01   6.51e-01   7.52e-01  7.42e+02  5.88e-02  1.00e+00
p[2]                6.21e-01  2.03e-04  5.94e-03   6.11e-01   6.21e-01   6.30e-01  8.55e+02  6.77e-02  9.99e-01
p[3]                6.28e-01  2.49e-03  6.78e-02   5.45e-01   6.10e-01   7.55e-01  7.41e+02  5.87e-02  1.00e+00
g[1]                8.41e-01  1.74e-03  4.87e-02   7.64e-01   8.41e-01   9.25e-01  7.84e+02  6.20e-02  9.98e-01
g[2]                3.50e-01  7.93e-04  2.27e-02   3.13e-01   3.50e-01   3.87e-01  8.22e+02  6.51e-02  1.00e+00
w[1]                7.71e-01  2.50e-03  6.83e-02   6.56e-01   7.73e-01   8.81e-01  7.46e+02  5.91e-02  9.99e-01
w[2]                1.42e-01  4.38e-04  1.25e-02   1.21e-01   1.42e-01   1.62e-01  8.10e+02  6.41e-02  9.98e-01
w[3]                6.36e-01  7.88e-04  2.24e-02   5.98e-01   6.36e-01   6.73e-01  8.11e+02  6.42e-02  1.00e+00
d[1]                6.83e-02  4.59e-04  1.15e-02   4.93e-02   6.84e-02   8.80e-02  6.23e+02  4.93e-02  1.01e+00
d[2]                6.93e-01  3.66e-04  1.03e-02   6.76e-01   6.93e-01   7.11e-01  7.85e+02  6.22e-02  9.97e-01
d[3]                2.39e-01  4.45e-04  1.15e-02   2.20e-01   2.39e-01   2.58e-01  6.70e+02  5.31e-02  9.99e-01
Warning: non-fatal error reading adapation data
Warning: non-fatal error reading adapation data
Warning: non-fatal error reading adapation data
Warning: non-fatal error reading adapation data
2 of 3200 (0.062%) transitions ended with a divergence.  These divergent transitions indicate that HMC is not fully able to explore the posterior distribution.  Try rerunning with adapt delta set to a larger value and see if the divergences vanish.  If increasing adapt delta towards 1 does not remove the divergences then you will likely need to reparameterize your model.

Results on the slower cluster:

#========================================================================= 
# In folder: 20190625.193309_34194.solon [75,200,10,1,**3412**,600,200,0.98,11,dense_e,4,J90c]"
#=========================================================================
4 chains: each with iter=(200,200,200,200); warmup=(600,600,600,600); thin=(1,1,1,1); 3200 iterations saved.
Warmup took (20245, 21852, 23224, 24925) seconds, 25.1 hours total
Sampling took (3761, 6639, 6824, 6219) seconds, 6.51 hours total
                        Mean      MCSE    StdDev         5%        50%        95%     N_Eff   N_Eff/s     R_hat
lp__                1.87e+04  7.03e-01  1.22e+01   1.86e+04   1.87e+04   1.87e+04  3.02e+02  1.29e-02  1.01e+00
accept_stat__       9.65e-01  1.37e-02  1.13e-01   8.96e-01   9.91e-01   1.00e+00  6.80e+01  2.90e-03  1.04e+00
stepsize__          1.53e-02      -nan  2.26e-03   1.34e-02   1.54e-02   1.89e-02      -nan      -nan  3.54e+14
treedepth__         9.65e+00  3.22e-01  6.28e-01   9.00e+00   1.00e+01   1.00e+01  3.80e+00  1.62e-04  1.47e+00
n_leapfrog__        8.93e+02  1.37e+02  2.49e+02   5.11e+02   1.02e+03   1.02e+03  3.31e+00  1.41e-04  1.60e+00
divergent__         5.00e-03      -nan  7.06e-02   0.00e+00   0.00e+00   0.00e+00      -nan      -nan  1.02e+00
energy__           -1.85e+04  1.02e+00  1.75e+01  -1.85e+04  -1.85e+04  -1.85e+04  2.94e+02  1.25e-02  1.00e+00
sd_y                8.00e-02  2.32e-05  6.04e-04   7.91e-02   8.00e-02   8.12e-02  6.75e+02  2.88e-02  1.00e+00
mu_u1               9.22e-02  3.21e-04  9.02e-03   7.65e-02   9.26e-02   1.06e-01  7.90e+02  3.37e-02  1.00e+00
mu_alpha            3.88e-02  7.23e-05  2.02e-03   3.53e-02   3.89e-02   4.20e-02  7.84e+02  3.34e-02  1.00e+00
beta                6.04e-01  2.66e-04  7.59e-03   5.92e-01   6.04e-01   6.17e-01  8.12e+02  3.46e-02  9.96e-01
theta               1.47e-01  1.24e-04  3.20e-03   1.41e-01   1.47e-01   1.52e-01  6.68e+02  2.85e-02  9.96e-01
sd_season           9.64e-02  1.59e-04  4.12e-03   8.96e-02   9.63e-02   1.03e-01  6.69e+02  2.86e-02  1.00e+00
mu_season[1]       -1.15e-01  3.26e-04  1.01e-02  -1.31e-01  -1.15e-01  -9.95e-02  9.56e+02  4.08e-02  9.97e-01
mu_season[2]       -6.03e-02  3.31e-04  9.65e-03  -7.60e-02  -6.03e-02  -4.50e-02  8.50e+02  3.62e-02  9.98e-01
mu_season[3]        1.39e-01  3.88e-04  1.01e-02   1.22e-01   1.39e-01   1.55e-01  6.83e+02  2.91e-02  1.00e+00
p[1]                6.63e-01  1.84e-03  4.84e-02   6.03e-01   6.54e-01   7.57e-01  6.96e+02  2.97e-02  1.00e+00
p[2]                6.21e-01  1.90e-04  5.53e-03   6.12e-01   6.21e-01   6.31e-01  8.49e+02  3.62e-02  9.98e-01
p[3]                6.27e-01  2.61e-03  6.93e-02   5.41e-01   6.15e-01   7.61e-01  7.06e+02  3.01e-02  1.00e+00
g[1]                8.41e-01  1.68e-03  4.63e-02   7.64e-01   8.41e-01   9.20e-01  7.56e+02  3.23e-02  1.00e+00
g[2]                3.49e-01  7.99e-04  2.19e-02   3.14e-01   3.49e-01   3.86e-01  7.54e+02  3.21e-02  9.99e-01
w[1]                7.69e-01  2.50e-03  6.60e-02   6.59e-01   7.72e-01   8.74e-01  6.99e+02  2.98e-02  1.00e+00
w[2]                1.42e-01  4.30e-04  1.18e-02   1.23e-01   1.42e-01   1.61e-01  7.56e+02  3.22e-02  9.99e-01
w[3]                6.35e-01  7.87e-04  2.16e-02   5.99e-01   6.35e-01   6.70e-01  7.55e+02  3.22e-02  1.00e+00
d[1]                6.82e-02  4.20e-04  1.16e-02   5.00e-02   6.85e-02   8.65e-02  7.60e+02  3.24e-02  1.00e+00
d[2]                6.92e-01  3.53e-04  9.73e-03   6.75e-01   6.93e-01   7.08e-01  7.61e+02  3.25e-02  1.00e+00
d[3]                2.40e-01  4.02e-04  1.18e-02   2.21e-01   2.39e-01   2.59e-01  8.63e+02  3.68e-02  9.97e-01
Warning: non-fatal error reading adapation data
Warning: non-fatal error reading adapation data
Warning: non-fatal error reading adapation data
Warning: non-fatal error reading adapation data
4 of 3200 (0.12%) transitions ended with a divergence.  These divergent transitions indicate that HMC is not fully able to explore the posterior distribution.  Try rerunning with adapt delta set to a larger value and see if the divergences vanish.  If increasing adapt delta towards 1 does not remove the divergences then you will likely need to reparameterize your model.

At a minimum you need a completely identical C++ compiler and identical compiler flags. If you build with -march=native, you also need identical hardware; otherwise I don’t think identical hardware is necessarily required.
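
To illustrate why the compiler and its flags matter: floating-point addition is not associative, so a build that reorders or vectorizes arithmetic differently can change the last bits of a result. Below is a minimal Python sketch of that effect. It is purely illustrative and is not Stan code; the variable names are made up for this example.

```python
# Floating-point addition is not associative: the grouping changes the rounding.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)

# The two groupings give results that differ in the last bit.
print(left_to_right == right_to_left)  # False
print(left_to_right - right_to_left)   # about 1.1e-16
```

In a long HMC run, a difference of this size in a single evaluation can eventually flip an accept/reject decision or a divergence check. After that the chains follow different paths, even though both runs still target the same posterior.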

I’ll second that. I’d start by checking compiler versions. (Assuming you have the same version of the source.)
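
One way to make that check concrete is to record a small "fingerprint" of the toolchain on each cluster and compare the two. The sketch below is only a suggestion: the function names are hypothetical, and it assumes the compiler is reachable on the PATH under the name you pass in (g++ here, which may not be what your CmdStan build actually uses).

```python
import platform
import shutil
import subprocess


def toolchain_fingerprint(compiler="g++"):
    """Collect the compiler version banner and basic machine info."""
    path = shutil.which(compiler)  # None if the compiler is not on PATH
    version = None
    if path is not None:
        result = subprocess.run(
            [path, "--version"], capture_output=True, text=True, check=False
        )
        lines = result.stdout.splitlines()
        version = lines[0] if lines else None
    return {
        "compiler": compiler,
        "compiler_path": path,
        "compiler_version": version,
        "machine": platform.machine(),
        "system": platform.system(),
    }


def differing_keys(a, b):
    """Return the sorted keys whose values differ between two fingerprints."""
    return sorted(k for k in set(a) | set(b) if a.get(k) != b.get(k))


if __name__ == "__main__":
    # Run this on each cluster, save the output, and compare the two records.
    print(toolchain_fingerprint())
```

This only covers the compiler banner and machine type. The compiler flags used for the build would still need to be compared separately.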

One thing that may be tricky: different versions of the Boost library may generate random numbers differently. So if your build process somehow links against a system-installed Boost, it might bypass the bundled source. You’d really have to check your compiler’s documentation to know whether it searches the local libraries before the bundled ones. I’ve seen that happen before and it threw me off.
