Reading over this conversation I was a bit confused. If we think of the parts of the API broken up with something like
- User facing service layer
- Developer facing layer
- Contract on return object structure for (1) and (2)
- Contract on statistical properties of return object for (1) and (2)
Most of this conversation is around (4)? Would be nice to also have 1:3 defined
While we can’t make guarantees for any arbitrary models would it be useful to define tests with something like SBC? In my mind space the only way to deal with 4 is to have some unit testing scheme on known models and models we know we will do badly at.
Part of the issue that this all came from is that the algorithms are hard and important and we need some nice way to check that changes make sense. If we don’t do that then pretty much any change to algorithms is going to be stalled like this. Maybe this would even be a nice place for the bayesian posterior database, we can use it to define a formal testing scheme for the level of algorithmic soundness we feel comfortable giving a level of promise too.