I’m finishing up a draft manuscript that I’ve been working on for some time. I’ve got some fairly large stanfit objects saved from models I ran previously under rstan 2.17.3. In the meantime, I’ve updated to rstan 2.18.
I’ve got a sentence in my manuscript where I reported the average effective sample size (n_eff) for a set of parameters. I wanted to double-check those numbers, but I wasn’t able to replicate what I had reported earlier when using rstan 2.18 to summarize a saved stanfit object that was fit under rstan 2.17.3. Even more odd, I now get n_eff values that are greater than the number of post-warmup samples, and some that are even greater than the total number of samples.
I’ve still got rstan 2.17.3 on another machine, so I’m able to directly compare the results from the same model object summarized under the two different versions of rstan. I can confirm that for parameters where n_eff was previously reported as equal to the number of post-warmup samples (in this case 5000), it is now often reported as greater. For parameters that previously had n_eff less than the number of post-warmup samples, the reported n_eff also differs between 2.17.3 and 2.18, but not in a systematic way. For example, the first two parameters had n_eff of 3580 and 2646 as reported by 2.17.3, but are now 3516 (lower) and 2661 (higher) under rstan 2.18.
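For reference, this is roughly how I’m extracting the numbers on each machine (a minimal sketch; the filename `fit.rds` is a stand-in for my actual saved object):

```r
library(rstan)

# Load the same saved stanfit object on each machine
# ("fit.rds" is a hypothetical filename for the saved model)
fit <- readRDS("fit.rds")

# summary() on a stanfit returns a list whose $summary element
# is a matrix with an "n_eff" column, one row per parameter
neff <- summary(fit)$summary[, "n_eff"]

# Record which rstan version produced these numbers,
# then compare the first few parameters across versions
packageVersion("rstan")
head(round(neff))
```

Running this on the 2.17.3 machine and the 2.18 machine against the identical `.rds` file is what produces the discrepancies described above.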
I’d like to avoid rerunning the models, since that would delay me another week, but I want to be sure I can trust those previous runs. Why would n_eff change between versions? Which numbers should I report?