Different results with Stan 2.21.0

I recently updated rstan from 2.19.3 to 2.21.1, which means I’m now using Stan 2.21.0 instead of 2.19.1. I noticed a number of changes in my outputs:

  1. The differences in parameter means seem to be equivalent to changing the seed, i.e., it seems like a different sample was generated, not necessarily a better or worse one. Is this expected from the update? I see from the release notes that the TBB is now used. Could that be the cause? I notice a difference even with a single chain (though not always); see the sketch after this list for one way to quantify seed-to-seed variation.

  2. The effective sample sizes and Rhats are a lot worse for some parameters (e.g., ESS going from 47 to 9). This doesn’t seem to be simply due to a different sample being drawn (i.e., point 1 above), since then we would expect a roughly even mix of small increases and decreases. Have there been changes to the ESS calculation that could have caused this?
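A minimal sketch of the kind of comparison meant in point 1, where `"model.stan"` and `stan_data` are placeholders for your own model and data: rerun the model under a few seeds on one version to see how much the means and diagnostics move from the seed alone.

```r
# Minimal sketch: "model.stan" and `stan_data` are placeholders for your setup.
library(rstan)

mod  <- stan_model("model.stan")   # compile once, reuse across seeds
fits <- lapply(c(1, 2, 3), function(s)
  sampling(mod, data = stan_data, seed = s, chains = 4, refresh = 0))

# The spread of means, n_eff, and Rhat across seeds is the baseline against
# which to judge the differences seen after the version change.
for (f in fits)
  print(summary(f)$summary[, c("mean", "n_eff", "Rhat")])
```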

The sampler has been optimized between releases.

The ESS calculations also changed, as I recall.

Thanks. So the differences in sampling are to be expected due to optimization of the sampler? Is this the TBB change?

Do you happen to know what changes were made to the ESS calculations? Or which pull request or commit contained the changes?

Links to the docs for the new Rhat are here: New R-hat and ESS. It’s more conservative (it picks up some problems the old Rhat didn’t).

Thanks, so these changes to Rhat are in Stan 2.21.0?

2.21 is from October 2019 and that post is from March 2019, so my guess is yes, but I’m not 100% sure.

You could compute your Rhats with this package: https://github.com/stan-dev/posterior which definitely has the new calculations, and see if the results match.
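For example (a sketch, assuming `fit` is an existing stanfit object from the model in question), the rank-normalized diagnostics can be computed straight from the draws:

```r
# Sketch: `fit` is assumed to be an existing stanfit object.
library(posterior)

# rstan's as.array() returns iterations x chains x parameters, which is the
# layout as_draws_array() expects.
draws <- as_draws_array(as.array(fit))

# Rank-normalized Rhat plus bulk and tail ESS for every parameter.
summarise_draws(draws, "rhat", "ess_bulk", "ess_tail")
```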

No. The sampler algorithm changed. I don’t understand why you think the TBB (aka Threading Building Blocks) would change sampling in any way. The TBB is only there to allow seamless execution using threads, nothing more.

OK. After further investigation with different seeds it seems that while there is a difference in the samples, the ESS and Rhat are not consistently worse than before.


@wds15 Can you please point to the relevant entry in the release notes, section of the documentation, Discourse thread, or relevant GitHub commits that discuss the sampler change? Your update to rstan caused our production models to start spitting out different answers, so we went to the release notes https://github.com/stan-dev/stan/releases/tag/v2.21.0. I don’t see any mention of a sampler change; the TBB entry was just the best guess as to what caused the diffs.

The 2.19.x -> 2.21.x transition accumulated six months of changes in the libraries and more than a year’s worth of changes in the rstan interface, although I doubt TBB makes much difference unless your models were using map_rect. Also, the way in which the compilation works is very different now. The slightest difference in the binary will result in different draws than before, so that is not a surprise to me. And there are going to be even more noticeable changes in the 2.21.x -> 2.24.x transition.

If you think the draws have a different distribution than before, that is more of a concern. Unfortunately, since both RStan and PyStan were stuck on 2.19.x for a long time, now is the first opportunity for the 2.21 release to be widely stressed.
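One rough way to screen for a distributional change (a sketch, assuming draws for one parameter were saved from both versions as numeric vectors `draws_old` and `draws_new`):

```r
# Screening only: MCMC draws are autocorrelated, so the p-value is approximate;
# use this to flag gross shifts, not as a formal test.
ks.test(draws_old, draws_new)

# Comparing a few quantiles side by side is often just as informative.
rbind(old = quantile(draws_old, c(0.05, 0.50, 0.95)),
      new = quantile(draws_new, c(0.05, 0.50, 0.95)))
```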


I think this went into Stan, but did not make it into the release notes (where it should have been).

Here is the respective git commit (which had a small bug that was fixed later):

Here is the PR:

And here is the long thread leading to this change:

> our update to rstan caused our production models to start spitting out different answers,

Whoa! Like the models give quite different answers? More details would be interesting… and you should ping people from the Discourse thread I linked.


Isn’t this the fourth bullet point in the “new features” section of the release notes? It was not super advertised, that is true.

Whoops! I read over it… but you are right, that is the entry that refers to it. My bad.

Thanks for the additional info, all! We’re satisfied that the draws appear to have the same distribution as before and that the changes we’re seeing are the same differences we’d see when changing the RNG seed, which is in line with @bgoodri’s comment that any slight change to the binary will result in different draws. My team always needs to investigate and notify clients any time we lose exact reproducibility. I wouldn’t have thought a change titled “Add additional no-u-turn checks” would have this effect.

I just assume that everything will affect the values of the realized draws, but if we think a change will affect the distribution of the realized draws, then it will be prominently advertised. For something like an additional no-u-turn check, all it takes is for that check to return true on one iteration, and the whole rest of the chain will have different realizations. But really, all it takes is for some calculation to return a value that differs in the 15th decimal place from what it was before, and that can cause the chain to go somewhere different.
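To illustrate that last point (plain R, nothing Stan-specific): floating-point addition is not associative, so merely reordering a sum changes the last bits, and in an iterative sampler that is enough to send the chain down a different trajectory.

```r
x <- (0.1 + 0.2) + 0.3
y <- 0.1 + (0.2 + 0.3)
x == y                      # FALSE
print(x - y, digits = 17)   # difference on the order of 1e-16
```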

A version change will probably always imply that you lose exact reproducibility. Tracking down why, each time, would be quite an undertaking.
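One practical habit (a suggestion, not official guidance) is to record the toolchain alongside every production run, since the seed alone no longer pins down the draws across versions:

```r
# The same seed only reproduces draws bit-for-bit if the versions match too.
run_info <- list(
  rstan_version = as.character(packageVersion("rstan")),
  stan_version  = rstan::stan_version(),
  r_version     = R.version.string,
  seed          = 12345  # the seed actually passed to sampling()
)
str(run_info)
```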
