A long time ago, we ran all the BUGS examples as tests for a few things:
- compilation (we don’t need this anymore)
- regressions between commits (checking end-to-end caught a lot of things that our unit tests missed)
- overall speed (we ran them on Jenkins and just tracked the total runtime; small slips in gradients that still passed the tests were often caught here)
We removed them because they produced too many false positives, and @betanalpha had a framework for testing correctness in a different way. Still, they provided some value, especially since we don’t really have end-to-end tests now.
Here’s an example test:
Maybe we should start running something like this again and tracking performance?
In this branch, I added the example-models repo as a submodule and do some slight trickery to find model/data file pairs. Each pair is then compiled and run; the output is recorded to tests/golds (which I was hoping to check in), and the timings are written to times.csv (not checked in, but always generated by Jenkins on one specific machine).
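For illustration, the pairing and timing steps could look roughly like the sketch below. It is not the code in the branch: it assumes a data file shares its model's stem and uses a `.data.R` suffix (e.g. `rats.stan` next to `rats.data.R`), and it leaves compiling and running to a caller-supplied `run` function, since the actual build commands aren't shown here. The names `find_model_data_pairs` and `time_models` are hypothetical.

```python
import csv
import time
from pathlib import Path
from typing import Callable, Iterator, Tuple


def find_model_data_pairs(root: Path) -> Iterator[Tuple[Path, Path]]:
    """Yield (model, data) pairs where foo.stan has a sibling foo.data.R.

    Assumption: data files share the model's stem and use the .data.R suffix.
    Models without a matching data file are skipped.
    """
    for model in sorted(root.rglob("*.stan")):
        data = model.with_name(model.stem + ".data.R")
        if data.exists():
            yield model, data


def time_models(root: Path, run: Callable[[Path, Path], None], out_csv: Path) -> None:
    """Call the (hypothetical) `run` on each pair and record wall-clock seconds.

    `run` stands in for whatever compiles the model and samples with the data.
    """
    with out_csv.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["model", "seconds"])
        for model, data in find_model_data_pairs(root):
            start = time.perf_counter()
            run(model, data)
            elapsed = time.perf_counter() - start
            writer.writerow([model.relative_to(root).as_posix(), f"{elapsed:.3f}"])
```

Keeping `run` pluggable means the same pairing logic can drive a real compile-and-sample step on Jenkins or a no-op in a quick local check.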
I’m not sure how useful the golds will be if we can’t find a way to get reproducibility for at least one reasonable compiler/OS pair, like clang + OS X.