Different Outputs in RStan vs. PyStan



Using the same model, seed, iterations and warmup, is there a reason why RStan would yield different output than PyStan for the same input dataset?

Look forward to insights.



Different implementation of the PRNG?


Avraham, thanks for reply.

I used the get_seed() in PyStan and RStan, and both return the same value as the input I have.

Is there another check I can do?



This is a bug if it is happening. Draws and optimization results should be identical given the same random seed. Other things, such as the printed summary of draws, are not yet identical.


Thanks for feedback @ariddell

I wonder if anyone else had this same issue.


You need to set the seed in Stan, not in Python or in R.
At that point, everything should produce the same results.

If it doesn’t, it should all get fixed after the refactor
lands in 2.15 and we call exactly the same underlying code
for sampling (rather than having a lot of complicated duplicated
code across interfaces).

  • Bob


Also check to see if the chain_id is the same as it strides the RNG to ensure that multiples chains don’t lock. For example, in RStan chains=1 sets the default chain_id to be 1 where as the default chain_id in CmdStan is 0. I had ran into this issue recently and had to resolve it by setting id=1 in CmdStan.


Thanks for the response.

Update: ran both models with reduced warmup to sampling ratio and got closer outputs; initially, warmup/sampling ratio is 1:1, now it’s 1:4.

@Bob_Carpenter: I set the seed within the stan() call, with the seed argument; this should null the issue you noted.

@betanalpha: Similar to seed, I set the chains argument in stan() call to a 1 for both; hence, shouldn’t run into the issue you noted.


Are you running on same hardware with the same underlying C++
compiler and compiler options? They really should produce
exactly the same answer in that situation.

We’re not going to try to debut this now as everything’s getting
changed in Stan 2.15 and that should make everything consistent.
For now, there are probably variations in the algorithms that
aren’t updated the same way across RStan, PyStan and CmdStan, which
is one of the main reasons we pulled all the important code down
into C++.

I don’t know hat you mean by 1:1 in warmup/sampling ratio.
Is that just the number of iterations you chose?

  • Bob


Same machine for running both.

Regarding the warmup/sampling ratio, the initial run on both had total iterations at 20000, and warmup of 10000 (i.e. sampling 10000 as well). This gave different output when extracting quantile values (e.g. 10% quantile).

The second run had the same number of iterations, but warmup is set to 5000 (i.e. 15000 sampling); this gives a closer output for both.


As Michael’s suggesting, I think it’s probably a count from 0 vs.
count from 1 issue on the chain ID. Have you tried controlling for
that? Also, I think your reply is truncated.

In general, you just want to be able to replicated within MCMC
standard error. The whole process is probabilistic and there’s nothing
special about any given seed and run.

And as I said, we’re bringing all the implementations exactly in
line with the forthcoming Stan 2.15 release.

  • Bob


@Bob_Carpenter: I set the chains argument in stan() call to a 1 for both; hence, shouldn’t run into the issue you noted

With the lower ‘warmup:sampling’ ratio, we’re within the MCMC standard error, as measured by RMSE.

Look FW to 2.15.

Thanks for the all the feedback.



@leungi, with the same compiler, same settings, same hardware, same operating system, same versions of Stan, same random seed and chain id, the runs will be identical. Not just within MCMC SE, but identical.

I think PyStan, by default, has a compiler optimization option of -O2, which will make it not run the same way. But I’m not absolutely sure.

You might have to explicitly set seed to make them match. Just setting number of chains might not be good enough.


As noted in previous post, seed and chain were explicitly set to be the same, hence the confusion.

Thanks for feedback.


Up until recently (current develop, not sure if it’s release yet) the different interfaces did not all initialize things in exactly the same way so you might end up with one more or less PRNG call in one interface vs. another. When Daniel says same x/y/z that includes interface and (as Daniel mentions above) optimization level.