A suggestion for 'pinning' versions of the default priors

bjw · August 30, 2017, 9:16am

Having watched the warning about default priors not remaining the same in future versions of rstanarm for the umpteenth time, it occurred to me that there was a kind of tension between the aims of rstanarm to provide a sane set of defaults for a broader set of users, and the need for reproducible research.

Inevitably many people are going to use the default priors because i) it’s easy and ii) they are sensible and seem to reflect your thinking on good guidance in the absence of an informative prior. However, if the defaults do change in future, it will be hard to reproduce an older analysis.

I wondered if it would be possible to store the values for priors in each new release, and then allow users to specify a date to pin the priors to? That way it would be possible to recreate an older analysis by simply typing something like stan_lmer(…, pin = "1 Jan 2017").

bgoodri · August 30, 2017, 3:03pm

Even if we had that cache, the results are not going to be bitwise reproducible due to changes in Stan. So, if that kind of reproducibility is important, you need to know which versions of Stan, rstan, and rstanarm were used at the time of the original analysis. Even then, the results are going to be slightly different on different computers due to differences in the versions of the compiler and the operating system. Basically, to be that reproducible you need a virtual machine or a Docker container that freezes everything.
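
As a rough sketch of recording those versions next to an analysis: the helper name record_versions below is made up for illustration and is not part of rstan or rstanarm. It only uses base R, and returns NA for any package that is not installed.

```r
# Hypothetical helper (not part of any package): look up the installed
# version of each package, returning NA if it is not installed.
record_versions <- function(pkgs = c("rstanarm", "rstan", "StanHeaders")) {
  vapply(pkgs, function(p) {
    if (nzchar(system.file(package = p))) {
      as.character(utils::packageVersion(p))
    } else {
      NA_character_
    }
  }, character(1))
}

# Package versions plus the R version and platform for this session.
versions <- record_versions()
print(versions)
print(R.version.string)
print(R.version$platform)
```

Pasting that output, or the output of sessionInfo(), into the analysis notes does not freeze anything, but it at least says what would have to be reinstalled to get close to the original run.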

bjw · August 31, 2017, 8:35am

I realise bitwise replication is not possible given the complexities of the underlying Stan machinery, and I think people accept MC error between runs on different computers etc… but if the priors in rstanarm do change substantially between versions it might be worth keeping an explicit changelog so that it’s possible to reconstruct at least the original Stan model from older rstanarm code.

Bob_Carpenter · September 3, 2017, 12:37pm

Bitwise replication is not only possible, but required for much of our clinical trial work. The manual has a chapter on what you need to do to guarantee it, which is basically lock down everything involving hardware and software. It’s not that Stan’s erratic, it’s that operating systems, CPUs, compilers, compiler optimization levels, etc. all impact floating point behavior.

bjw · September 4, 2017, 11:05pm

Apologies - I didn’t mean to imply Stan was erratic. By ‘not possible’ I really meant ‘not possible without lots of effort external to the Stan project’. But I can see that just pinning the priors is perhaps a pointless halfway house. I probably need to bite the bullet and properly document what rstanarm is doing for me, to save the pain later!
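
For what it's worth, a minimal sketch of what that documenting could look like, assuming an ordinary stan_glm fit: write the priors out explicitly instead of leaning on the defaults, then print what was actually used with prior_summary(). The model, data set, and prior values here are placeholders for illustration only, not recommendations.

```r
library(rstanarm)

# Spell out every prior rather than relying on the package defaults.
# All values below are illustrative placeholders.
fit <- stan_glm(
  mpg ~ wt,
  data = mtcars,
  family = gaussian(),
  prior = normal(location = 0, scale = 2.5, autoscale = FALSE),
  prior_intercept = normal(location = 20, scale = 10, autoscale = FALSE),
  prior_aux = exponential(rate = 1, autoscale = FALSE),
  seed = 20170830, chains = 2, iter = 1000, refresh = 0
)

# Report the priors the fitted model used, for the analysis record.
prior_summary(fit)
```

With the priors written into the call, a change to the package defaults in a later release should no longer alter the model being fitted, although the draws can still differ across Stan versions and platforms for the reasons given above.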
