@mitzimorris I don’t think I have any more clarity. One of the pull requests that was merged changes behavior in Stan: something makes results not reproducible between a few days ago and now.
This is before it gets out to the interfaces. This is just within Stan.
And that changes everything. I’m ok with change, but we should really know when it happens and have justification for why. Right now, things like this are caught by the performance test, which maybe we should call a regression test or something like that.
@seantalts, any reason it’s been disabled on Jenkins? It would have signaled that Stan is changing behavior. That’s something we should know. It doesn’t happen often, but when it does, we should really know why we’re doing it.
correct, the stride is different.
The reason is that the RNG used in transformed data gets stride 0 of the seed.
The chains now get the stride corresponding to their ID; it used to correspond to (ID - 1).
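To make that concrete, here’s a rough before/after sketch; the `DISCARD_STRIDE` name and exact form are my reconstruction, not the actual diff:

```cpp
// Hypothetical reconstruction of the stride change, not the actual diff.
// Each stride is a disjoint segment of 2^50 draws of one PRNG sequence.

// Before: chain 1 landed on stride 0, colliding with the RNG draws
// used for transformed data.
rng.discard(DISCARD_STRIDE * (chain_id - 1));

// After: stride 0 is reserved for transformed data; chain i gets stride i.
rng.discard(DISCARD_STRIDE * chain_id);
```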
sorry, there should have been more unit tests for creating the RNG. Regarding the performance test: don’t worry about it. We don’t have real regression tests; it would have been really helpful if it were still active in Jenkins for exactly this reason, just as a sanity check until real regression tests are available.
Sorry, I didn’t realize that’s what you were telling me with the off-by-one comment. When I first read it, I thought there was one more or one fewer call to the RNG, not that the stride had changed.
do we need more documentation beyond the current doc for stan/services/util/create_rng.hpp?
* Creates a pseudo random number generator from a random seed
* and a chain id by initializing the PRNG with the seed and
* then advancing past pow(2, 50) times the chain ID draws to
* ensure different chains sample from different segments of the
* pseudo random number sequence.
*
* Chain IDs should be kept to values larger than zero to ensure
* that the draws used to initialize transformed data are not
* duplicated.
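For reference, here is a minimal sketch of what that doc describes, assuming the boost::ecuyer1988 engine; the body is my paraphrase of the header, not a verbatim copy:

```cpp
#include <boost/random/additive_combine.hpp>  // boost::ecuyer1988

#include <cstdint>

// Sketch of stan/services/util/create_rng.hpp as described above;
// details may differ from the actual source.
inline boost::ecuyer1988 create_rng(unsigned int seed, unsigned int chain_id) {
  // pow(2, 50) draws per stride: far more than any run consumes, so
  // strides act as non-overlapping segments of one PRNG sequence.
  static constexpr std::uintmax_t DISCARD_STRIDE
      = static_cast<std::uintmax_t>(1) << 50;
  boost::ecuyer1988 rng(seed);
  // Stride 0 (chain_id == 0) is used for transformed data, so chain
  // IDs should start at 1.
  rng.discard(DISCARD_STRIDE * chain_id);
  return rng;
}
```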
@syclik regarding the performance test that was part of the Stan PR build - it had been failing for a while and I think you suggested just disabling it, so I did. I left the daily performance test intact, figuring that would be enough to catch changes given our relatively low volume of pull requests. I will make it email me on failure, since I hadn’t noticed it start failing.
I see you have a PR to update the “golden master” for the performance test after Mitzi’s RNG update (thanks!), and that Stan pull requests are now running the performance tests (that was you, right?). Is there other stuff going on or anything you think I should look at? Just trying to catch up on this topic.
Yeah, I re-enabled it! The failure was a config issue, so I adjusted the job to clean out the workspace before builds.
I think it’s the wrong test to be running to catch these sorts of failures, but we don’t have anything else in place. We used to run the BUGS examples years ago and we ran into other problems there too.
Either way, we just need to know when we’re changing behavior.
I think the biggest thing needed there is a test that, if you run multiple chains with the same seed and different chain IDs, the transformed data generated with RNGs is identical across chains (see the sketch below).
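Something like this sketch (GoogleTest-style; I’m assuming create_rng’s namespace and signature from the header quoted above):

```cpp
#include <stan/services/util/create_rng.hpp>
#include <gtest/gtest.h>

// Sketch: the stride-0 RNG that transformed data uses must produce the
// same draws regardless of which chain is running, while distinct
// chain IDs must start in distinct strides.
TEST(CreateRng, transformedDataMatchesAcrossChains) {
  unsigned int seed = 12345;
  // Transformed data always draws from stride 0 of the seed.
  auto td_rng_a = stan::services::util::create_rng(seed, 0);
  auto td_rng_b = stan::services::util::create_rng(seed, 0);
  for (int i = 0; i < 100; ++i)
    EXPECT_EQ(td_rng_a(), td_rng_b());

  // Different chain IDs should land in different segments; a first-draw
  // collision is astronomically unlikely for a healthy generator.
  auto chain1 = stan::services::util::create_rng(seed, 1);
  auto chain2 = stan::services::util::create_rng(seed, 2);
  EXPECT_NE(chain1(), chain2());
}
```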
This change intentionally broke the seed behavior on chains, and it will break any exact tests that were based on the seed.
Having tests that are based on exact RNG seeds replicating across releases seems too strict to me. Or, as @seantalts suggested, we need a way to generate new expected behavior, in which case, all the test does is provide a flag that something’s changed. I think that may be what @syclik wants, but I’m not sure.
As far as Stan goes, there’s no reason to keep things seed-to-seed compatible across releases. We just need to preserve the statistical properties, which are far more subtle and require something like @betanalpha’s testing framework.
I’m looking at this from the point of view of getting us to have faster development with more contributors. We can’t get there if things break between pull request merges. Right now, we have unit tests, but no real integration / regression tests that would let us just move forward with a little more ease.