@avehtari, @paul.buerkner, Eero Linna and I have put together the first draft of a posterior database. The idea is to collect posterior distributions (i.e. data and models) and potential gold standard posterior samples in a database for easy access. Right now it contains only a few examples, but we hope that once the structure and general idea are in place, it will be relatively quick to add new posteriors, models and data. The repository README should contain all necessary information.
All comments are welcome!
Hopefully, we will also soon be done with bayesbenchr, an R package that can be used to evaluate different diagnostics and inference methods on the posteriors in the posterior database.
Is there some way we could build a Python package that would fit these needs and enable use from arviz? I know Eero is working on a Python package, so any suggestions there are welcome. We would like the Python and R APIs to be quite similar, so you could look at the R API and see if there is anything you miss or would like to change.
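To make the discussion concrete, here is a rough sketch of what a Python API mirroring an R-style interface to the database might look like. This is purely illustrative: the class, method and posterior names are assumptions, not the actual interface, and a real package would read from the database on disk rather than an in-memory dict.

```python
# Hypothetical sketch of a Python interface to a posterior database.
# All names here are illustrative, not the real API.

class PosteriorDatabase:
    """Minimal in-memory stand-in for a database of posteriors."""

    def __init__(self):
        self._posteriors = {}

    def add(self, name, model_code, data, gold_standard=None):
        """Register a posterior: model code plus data, optionally gold standard draws."""
        self._posteriors[name] = {
            "model_code": model_code,
            "data": data,
            "gold_standard": gold_standard,
        }

    def posterior_names(self):
        """List the available posteriors, sorted by name."""
        return sorted(self._posteriors)

    def posterior(self, name):
        """Retrieve one posterior (model code, data, gold standard draws)."""
        return self._posteriors[name]


pdb = PosteriorDatabase()
pdb.add(
    "eight_schools",
    model_code="// Stan code would go here",
    data={"J": 8, "y": [28, 8, -3, 7, -1, 1, 18, 12]},
)
print(pdb.posterior_names())  # ['eight_schools']
```

A similar shape on the R side (same accessor names, same posterior naming scheme) would keep the two APIs easy to learn together.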
Your suggestions on priors, likelihoods, predictives etc. are something we have discussed, and as you mention, those would be valuable to have. The question is how to include them. Currently we can extract them from the Stan code, so the question is whether we should also include them as R and Python functions.
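As a sketch of the "include them as functions" option: the database could ship language-native log densities alongside the Stan code. The toy model below (y ~ normal(theta, 1) with theta ~ normal(0, 5)) and the function names are illustrative assumptions only.

```python
import math

# Illustrative only: hand-written log densities for a toy model
#   y_i ~ normal(theta, 1),  theta ~ normal(0, 5).
# In the database these would be generated from, or checked against, the Stan code.

def log_prior(theta):
    # normal(0, 5) log density
    return -0.5 * math.log(2 * math.pi * 25.0) - theta**2 / (2 * 25.0)

def log_likelihood(theta, y):
    sigma2 = 1.0
    return sum(
        -0.5 * math.log(2 * math.pi * sigma2) - (yi - theta) ** 2 / (2 * sigma2)
        for yi in y
    )

def log_posterior_unnorm(theta, y):
    # unnormalized log posterior = log prior + log likelihood
    return log_prior(theta) + log_likelihood(theta, y)
```

Having these as plain functions would let diagnostics or inference methods evaluate the target without a Stan toolchain, at the cost of keeping them in sync with the Stan code.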
How precise should the submitted posteriors be? What are the precision metrics? (minimum tail ESS?)
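One concrete candidate metric, sketched below under assumptions: the effective sample size of the tail-quantile indicator, in the spirit of rank-normalized tail ESS (Vehtari et al.). This is a simplified single-chain version for illustration, not a proposed reference implementation.

```python
import random

def ess(x):
    """Crude ESS estimate from the initial positive sequence of autocorrelations."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    if var == 0.0:
        return float(n)
    rho_sum = 0.0
    for lag in range(1, n):
        rho = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag)) / (n * var)
        if rho <= 0.0:  # truncate at first non-positive autocorrelation
            break
        rho_sum += rho
    return n / (1 + 2 * rho_sum)

def tail_ess(draws, q=0.05):
    """Smaller of the ESS values for the lower- and upper-tail indicators."""
    s = sorted(draws)
    lo, hi = s[int(q * len(s))], s[int((1 - q) * len(s))]
    return min(ess([float(d <= lo) for d in draws]),
               ess([float(d >= hi) for d in draws]))

random.seed(0)
iid_draws = [random.gauss(0.0, 1.0) for _ in range(1000)]

# A strongly autocorrelated AR(1) chain for comparison
ar_draws = [0.0]
for _ in range(999):
    ar_draws.append(0.95 * ar_draws[-1] + random.gauss(0.0, 1.0))
```

A minimum tail ESS threshold on the gold standard draws would then be a checkable acceptance criterion for submissions.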
Given the substantial computational requirement for some posterior distributions, is it sensible to also store (potentially generative) approximations to the posteriors based on the “gold standard samples”? These approximations could be used to cheaply generate an arbitrary number of samples for the purpose of making figures look good. I guess one can sample from the posterior predictive distribution (PPD) an arbitrary number of times given a fixed sample from the posterior, but what if generating from the PPD is computationally expensive? (Perhaps not the most likely of scenarios)
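The "store an approximation" idea could look something like the following sketch: fit a simple generative approximation to the gold standard draws, then sample from it arbitrarily cheaply. The univariate normal here is purely for illustration (a real implementation would want something richer, e.g. a KDE or normalizing flow), and the gold standard draws are simulated stand-ins.

```python
import random
import statistics

random.seed(1)
# Stand-in for archived gold standard samples (not real database content)
gold_draws = [random.gauss(2.0, 0.5) for _ in range(500)]

# Fit a trivially simple generative approximation: a univariate normal
mu = statistics.fmean(gold_draws)
sd = statistics.stdev(gold_draws)

def cheap_sample(n):
    """Draw n approximate posterior samples without rerunning MCMC."""
    return [random.gauss(mu, sd) for _ in range(n)]

extra = cheap_sample(10_000)
```

Storing only the fitted approximation (here, two floats) instead of, or alongside, the raw draws would keep the cheap-sampling use case available without the original compute cost.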
What kind of metadata are you interested in collecting? (Sampler type / adaptation diagnostics / tuning parameters?) It would be interesting to know the computational time / cost / environment used to produce the archived samples. This is definitely off topic (more related to ML than MCMC-based statistics), but it would be interesting to collect this information to estimate energy usage in a manner similar to https://arxiv.org/abs/1906.02243. I’ve seen a few models that have substantial compute requirements and subsequently churn through a lot of cluster time & credits (see Section 4.2 of https://projecteuclid.org/euclid.aoas/1560758424 for example).
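To make the metadata question concrete, one possible per-run record is sketched below. Every field name and value here is a suggestion or placeholder, not something the database currently stores.

```python
# Hypothetical metadata record for one archived sampling run.
# All field names and values are illustrative placeholders.
run_metadata = {
    "sampler": "stan_nuts",                  # sampler type
    "adaptation": {                          # tuning parameters after warmup
        "delta": 0.99,
        "max_treedepth": 12,
    },
    "chains": 4,
    "iterations": {"warmup": 1000, "sampling": 1000},
    "seed": 4711,
    "compute": {                             # for cost / energy accounting
        "wall_time_s": 5400.0,
        "environment": "example-cluster",    # placeholder, not real data
        "estimated_energy_kwh": None,        # could support estimates as in arXiv:1906.02243
    },
}
```

Keeping the compute fields optional would let contributors record cost and energy information where available without making it a submission requirement.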
These things are probably beyond the scope of your current interests, but I’d be interested to know your collective thoughts on them. I really do like the idea of reusing posterior distributions in future analyses :).