Major changes, like the refactor
Daniel took on, took years. The reason it took so long is that
he insisted on having these tests (though apparently, they weren’t
thorough enough, because we still broke the software).
The elapsed time was years, but it wasn’t due to insistence or these tests. It was caught up in design. The end product, which is in now, is still a compromise on the design side, and hopefully we can straighten it out over time.
I just wanted to make it clear that the tests were not the bottleneck.
I noticed that too! Not the tests. I helped write those, and while they were irritating, it wasn’t more than a few hours of work.
Also, I have no doubt that Michael’s alternative would’ve taken a similar amount of effort to implement, even if it might have had more combinatorial issues in testing.
We could’ve implemented and compared both approaches before deciding which to use, and it would still have saved time compared to the drawn-out disagreement.
Michael had a prototype implementation. I put up a design with a prototype. It was deadlocked on design principles.
The current implementation is a compromise that looks a lot like Michael’s, but with some key changes that will allow us to move towards a better design sooner. This was one of the last things before we decided to split up who has ultimate decision-making power (to be used when there’s a deadlock):
Daniel: math library, Stan library as it pertains to interface API
Bob: Stan language
Michael: Stan library as it pertains to the algorithms
Since this refactor landed, we’ve already moved forward with the logger! We’ll get to a better design that makes it easy for external developers soon enough.
The testing helped me move quicker. Btw, thanks for writing the instrumented callbacks and all the other tests! They came in handy with the logger implementation. It’s so much easier to move things quickly when there are tests in place.
Great question. It would have been faster, but not by much. I can guarantee that it would have been buggy, with bugs that wouldn’t have been detected and would have taken a long time to hunt down.
It would have felt like the bad place we were in after Stan 2. I had a golden test set up with one good model and one bad model, which is why I didn’t notice the change in the rejections. I think I spent half the coding time actually working through some really subtle things that we never thought to unit test. There are tests in there now, but I’ll just say that coding against a reference implementation made things easier, not harder.
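For anyone unfamiliar with the golden-test idea, here is a minimal sketch in Python. It is not Stan’s actual test code: `run_sampler`, `refactored_sampler`, `matches_golden`, the model names, and the recorded fields are all hypothetical stand-ins. It only illustrates the general point that comparing against a reference implementation catches a change in exactly the outputs and models the golden test covers, and nothing else.

```python
import random


def run_sampler(model_name, seed, num_draws=5):
    """Hypothetical reference implementation: returns draws and a rejection count."""
    rng = random.Random(seed)
    draws, rejections, proposals = [], 0, 0
    while len(draws) < num_draws:
        x = rng.gauss(0.0, 1.0)
        proposals += 1
        # Assumption for illustration: the "bad" model rejects every third proposal.
        if model_name == "bad" and proposals % 3 == 0:
            rejections += 1
            continue
        draws.append(x)
    return {"draws": draws, "rejections": rejections}


def refactored_sampler(model_name, seed, num_draws=5):
    """Hypothetical refactor with a simulated regression: rejections are no longer counted."""
    result = run_sampler(model_name, seed, num_draws)
    return {"draws": result["draws"], "rejections": 0}


def matches_golden(result, golden, fields):
    """Compare only the listed fields of a run against the stored golden output."""
    return all(result[field] == golden[field] for field in fields)


if __name__ == "__main__":
    for model in ("good", "bad"):
        golden = run_sampler(model, seed=42)
        new = refactored_sampler(model, seed=42)
        print(model, "draws only:", matches_golden(new, golden, ["draws"]))
        print(model, "draws + rejections:",
              matches_golden(new, golden, ["draws", "rejections"]))
```

In this sketch the regression is invisible for the good model and for any comparison that looks only at the draws; it shows up only when the bad model’s rejection count is part of the golden output.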