Random number generation separate for generated quantities?



A BUGS program we run produces numerically different inferences depending on which predictive distributions are generated. The seed and data are the same, but the number of points at which the predictive is generated differs, which leads to different random number “consumption”. This is a bit annoying, because with the same data and the same seed I would like to get the same inferences. As I understand it, Stan works the same way here and I should expect the same behavior, so having a generated quantities block or not makes a numerical difference to the inferences.

I was wondering if it would be possible to make the random number generator for the generated quantities block use an independent random number stream, which would solve the issue. The desired behavior is that the same seed and data lead to exactly the same inferences regardless of the predictives.
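To make the idea concrete, here is a minimal toy sketch in Python (not Stan or BUGS code). The `run` function, the loop of 5 iterations, and the `seed + 1` rule for the second stream are all made up for illustration; they are assumptions, not how any real sampler derives its streams.

```python
import random

def run(seed, n_pred, separate_streams):
    """Toy 'sampler': each iteration draws one parameter value,
    then n_pred predictive values (which are simply discarded here)."""
    sampler_rng = random.Random(seed)
    # Hypothetical rule: the predictive stream is seeded with seed + 1.
    # With separate_streams=False both names refer to the same generator.
    gq_rng = random.Random(seed + 1) if separate_streams else sampler_rng
    draws = []
    for _ in range(5):
        draws.append(sampler_rng.random())   # "inference" draw
        for _ in range(n_pred):
            gq_rng.random()                  # predictive draw consumes numbers
    return draws

# Shared stream: the parameter draws depend on how many predictives were generated.
shared_differs = run(1234, 0, False) != run(1234, 3, False)

# Independent streams: the parameter draws are identical regardless of n_pred.
separate_matches = run(1234, 0, True) == run(1234, 3, True)

print(shared_differs, separate_matches)
```

With a shared generator the predictive draws shift the position in the stream, so later parameter draws change; with a separate generator they cannot.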

Another solution could be to use the new gq facility (have one Stan file with only the data/parameters/model blocks and another one which includes the first and adds a gq block), but I am not sure about that, as I have no idea yet what it will look like.

Comments would be welcome, and I am happy to file a feature request if this behavior is feasible to implement and others agree that it is useful.



The samplers use both a random seed and the chain id when creating the RNG.
To get consistent results, you need to use the same seed and same number of chains.
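As a rough illustration of why both the seed and the chain id matter, here is a toy Python sketch. It is not Stan's actual RNG construction; `make_chain_rng` and the way it combines the two values into a string seed are hypothetical and only show the principle.

```python
import random

def make_chain_rng(seed, chain_id):
    # Hypothetical combination rule, for illustration only:
    # the generator is fully determined by the (seed, chain_id) pair.
    return random.Random(f"{seed}:{chain_id}")

def first_draws(seed, chain_id, n=3):
    rng = make_chain_rng(seed, chain_id)
    return [rng.random() for _ in range(n)]

# Same seed and same chain id reproduce the same stream.
same = first_draws(42, 1) == first_draws(42, 1)

# Same seed but a different chain id gives a different stream.
different = first_draws(42, 1) != first_draws(42, 2)

print(same, different)
```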

Using the new gq facility sounds like a good solution. Here’s the spec: https://github.com/stan-dev/stan/wiki/Standalone-Generated-Quantities:-Functional-Specification

I started working on a cmdstan implementation, but it stalled because of I/O issues (discussion here: Adding standalone generated quantities option to cmdstan (and rstan and pystan)).