This was a comment from @Bob_Carpenter on the GitHub issue tracker
Major changes, like the refactor
Daniel took on, took years. The reason it took so long is that
he insisted on having these tests (though apparently, they weren’t
thorough enough, because we still broke the software).
The elapsed time was years, but it wasn’t due to insistence or these tests. It was caught up in design. The end product, which is in now, is still a compromise on the design side and hopefully we can straighten it out over time.
I just wanted to make it clear that the tests were not the bottle neck.
I noticed that too! Not the tests. I helped wrote those and while they were irritating it wasn’t more than a few hours.
Also I have no doubt that Michael’s alternative would’ve been relatively similar to implement even if it might have had more combinatorial issues in testing.
We could’ve implemented and compared both approaches before deciding which to use and it would still have saved time compared to the drawn out disagreement.
Michael had a prototype implementation. I put up a design with a prototype. It was deadlocked on design principles.
The current implementation is a compromise that looks a lot like Michael’s, but with some key changes that will allow us to move towards a better design sooner. This was one of the last things before we decided to split up who has ultimate decision making power (to be used when there’s a deadlock):
- Daniel: math library, Stan library as it pertains to interface API
- Bob: Stan language
- Michael: Stan library as it pertains to the algorithms.
Since this refactor landed, we’ve already moved forward with the logger! We’ll get to a better design that makes it easy for external developers soon enough.
Yeah I had forgotten that part. The lack of progress was even worse than I remembered!
My main point was that we did it to ourselves, not that the testing did it to us.
The testing helped me move quicker. Btw, thanks for writing the instrumented callbacks and all the other tests! They came in handy with the logger implementation. It’s so much easier to move things quickly when there are tests in place.
I think you’ve said this a half dozen times now but you’re still welcome! :) I like to put my time where my mouth is.
Sorry—didn’t mean to imply it was only the testing. Definitely debating the design took a lot of effort, and we never did get to consensus.
Do you think it would’ve been faster if we hadn’t insisted on maintaining identical behavior to what we have now?
Great question. It would have been faster, but not by much. I can guarantee that it would have been buggy with bugs that wouldn’t have been detected and would take along time to hunt down.
It would have felt like the bad place we were in after Stan 2. I had a golden test set up with one model that was good and one bad model, which is why I didn’t notice the change in the rejections. I think I spent half the coding time actually working through some really subtle things that we never thought to unit test. There are tests in there now, but I’ll just say having coding against a reference implementation made things easier, not harder.