Prior-predictive samples (sometimes) affected by operating system?

winterstat · May 8, 2021, 7:56pm

I hope this is the right place to ask this question.

I was generating prior-predictive samples to assess the appropriateness of different prior specifications and decided to include a specification that has what one might consider diffuse/uninformative priors (e.g., a very wide normal prior). When I compared the prior-predictive samples across operating systems (Linux, Windows, MacOS), I noticed that they were different!

This difference does not occur when I generate prior-predictive samples based on more informative priors. The results are also equivalent across OSs if I estimate a “regular” model with the data included to get posterior samples (even with the diffuse priors).

I made sure that the same versions of R and rstan were used on each OS. I also used the same seed (using set.seed() and the seed argument in the sampling() function). I wonder if there is some inherent difference in how seeds are treated by different operating systems? Or if these differences might be related to how models are compiled across different OSs? I’m hoping that someone who knows more about these things can help me understand what’s happening. Any input would be much appreciated :)

Thank you!

ahartikainen · May 9, 2021, 11:22am

Hi

Are the results different but still within MCMC error (MCSE)?

See the following page

winterstat · May 11, 2021, 5:13pm

Thank you for sharing that resource! I hadn’t thought to look at the general Stan reference guide, which seems like the obvious place to look in hindsight.

I just checked the MCMC error of the prior-predictive samples generated on a Mac and on a Linux machine. The estimates do not appear to be within MCMC error of each other. For example, for one generated variable, the average estimate on the Mac is 4.643676 (MCMC SE = 13.60324), but on Linux it’s -18.46626 (MCMC SE = 12.80303).

The large se values make sense to me because of the diffuse priors (as I mentioned in the first post, these discrepancies disappear with more informative priors). But the estimates are really quite different. After reading the info on the page you shared, I know that there are many potential differences in hardware/libraries/etc. that I may not be able to control (the Linux machine is a cluster managed by my institution).

jsocolar · May 11, 2021, 5:32pm

Those don’t look so different; the standard errors are quite big.
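
As a rough check, here is a minimal sketch that compares the two means quoted above against their combined standard error. It assumes the two runs are independent and that the MCSEs behave like ordinary standard errors, which is only an approximation. The function name `mcse_z_score` is made up for illustration and is not part of Stan or rstan.

```python
import math


def mcse_z_score(mean_a, mcse_a, mean_b, mcse_b):
    """Difference between two estimates in units of its combined standard error.

    Assumes the two runs are independent, so the variances of the
    two Monte Carlo estimates add.
    """
    diff = mean_a - mean_b
    se_diff = math.sqrt(mcse_a ** 2 + mcse_b ** 2)
    return diff / se_diff


# Values quoted in the thread: Mac first, Linux second.
z = mcse_z_score(4.643676, 13.60324, -18.46626, 12.80303)
print(f"difference in combined-SE units: {z:.2f}")
```

Under those assumptions the gap comes out at a little over one combined standard error. That is well inside what Monte Carlo noise alone could produce, so it is not strong evidence of a real difference between the machines.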

winterstat · May 11, 2021, 8:47pm

Hmm, true. Looking at the mcmc se values definitely helps explain some of the differences. I guess it’s just showing that the prior isn’t providing much direction (which is not strange) and that the resulting samples can come out very different depending on the specific computer/other underlying libraries etc. used. Thank you for helping me think through this, @jsocolar and @ahartikainen!
