Beta-release Bayesian Posterior Database

mans_magnusson · December 8, 2019, 11:40am

Hi all!

Nice discussion and very good suggestions for further development.

@ahartikainen It would be great to set up a connection for getting posteriors, data, and models directly from GitHub through Python. Hopefully, this could also enable quick access through a RESTful API later on. Unfortunately, connections to GitHub are not yet implemented for Python, and I'm not sure when they will be. I'll open an issue for it, and it would be great to get some help from the Python people here. I don't think much more work is needed.
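As a rough idea of what such a connection could look like, here is a minimal Python sketch that reads a posterior's JSON description from GitHub's raw-content host. The repository name, branch, directory layout, and both helper names are assumptions for illustration only, not the posteriordb package's API.

```python
import json
import urllib.request

# Assumption: GitHub serves repository files at raw.githubusercontent.com/<repo>/<branch>/<path>.
RAW_BASE = "https://raw.githubusercontent.com"


def raw_url(repo: str, branch: str, path: str) -> str:
    """Build the raw-content URL for a file in a GitHub repository (hypothetical helper)."""
    return f"{RAW_BASE}/{repo}/{branch}/{path.lstrip('/')}"


def fetch_posterior_info(name: str,
                         repo: str = "stan-dev/posteriordb",
                         branch: str = "master") -> dict:
    """Download and parse one posterior's JSON description (hypothetical helper).

    The directory layout below is an assumed example and may not match the repository.
    """
    path = f"posterior_database/posteriors/{name}.json"
    with urllib.request.urlopen(raw_url(repo, branch, path)) as response:
        return json.loads(response.read().decode("utf-8"))
```

Data and model files could be fetched the same way. A RESTful API would replace `raw_url` with proper endpoints rather than file paths.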

@betanalpha @avehtari @paul.buerkner:
I agree with the points on expectations, and that parameter expectations and variances could be computed using 100 000 draws and stored separately in addition to the smaller sample. If we get a “real” database up, we could extend the number of stored samples to 100 000 later on.
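A minimal Python sketch of that split follows, assuming draws are held as a plain dict of per-parameter lists. Means and variances are computed from all draws, and only an evenly thinned subsample is kept as the smaller stored sample. The function name and the data layout are hypothetical.

```python
import statistics
from typing import Dict, List, Tuple


def summarize_draws(
    draws: Dict[str, List[float]], keep: int
) -> Tuple[Dict[str, Tuple[float, float]], Dict[str, List[float]]]:
    """Return per-parameter (mean, sample variance) from all draws, plus a thinned subsample.

    Hypothetical helper: `draws` maps parameter names to their full set of draws.
    """
    summaries: Dict[str, Tuple[float, float]] = {}
    thinned: Dict[str, List[float]] = {}
    for name, values in draws.items():
        # Expectations and variances use every draw (e.g. 100 000 of them).
        summaries[name] = (statistics.fmean(values), statistics.variance(values))
        # Keep at most `keep` evenly spaced draws as the smaller stored sample.
        step = max(1, len(values) // keep)
        thinned[name] = values[::step][:keep]
    return summaries, thinned
```

In the R ecosystem, the equivalent summaries would more naturally come from the posterior package, as discussed below.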

Regarding which expectations to use, I think, as @paul.buerkner noted, this should probably be implemented more generally in the posterior package. Currently, posteriordb returns the samples as a posterior draws object to facilitate quick analysis of the draws. Hence, estimators should preferably be implemented in the posterior package, so that they would be available more generally in the Stan R ecosystem.

I also think that SBC should be included as a slot for each model and framework (Stan in this case) to test the specific model at hand. I have put that on the to-do list, along with including more details regarding:

R/Python, the makevars file, and systemInfo() (and the equivalent for Python)
Is there anything more that should be included for reproducibility?
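On the Python side, a rough stand-in for the system information could be collected with the standard library, as in the minimal sketch below. Both helper names are hypothetical, and this covers only the interpreter, the OS, and installed package versions. Compiler settings such as those in a Makevars file would need to be recorded separately.

```python
import platform
import sys
from importlib import metadata
from typing import Dict, Iterable, Optional


def system_info() -> Dict[str, str]:
    """Collect basic interpreter and OS details (hypothetical helper)."""
    return {
        "python_version": platform.python_version(),
        "implementation": platform.python_implementation(),
        "os": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
        "executable": sys.executable,
    }


def package_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Look up installed versions of the given distributions; None if not installed."""
    versions: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions
```

Storing the resulting dicts as JSON next to the reference draws would keep them readable from both R and Python.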

Regarding diagnostics: currently, we only check divergences, Rhat, and ESS (bulk and tail). In addition, we should check that E-BFMI is higher than 0.3. Is there anything else that should be checked for the reference posterior draws?
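Below is a minimal Python sketch of such a check. E-BFMI is estimated per chain as the mean squared difference of successive energy values divided by the energy variance. Rhat and ESS are assumed to be computed elsewhere (e.g. by the posterior package) and passed in. The Rhat < 1.01 and ESS > 400 cut-offs are assumed defaults, not settled posteriordb criteria, and the function names are hypothetical.

```python
import statistics
from typing import Dict, List, Sequence


def e_bfmi(energy: Sequence[float]) -> float:
    """E-BFMI for one chain: mean squared successive energy difference / energy variance."""
    diffs = [b - a for a, b in zip(energy[:-1], energy[1:])]
    mean_sq_diff = sum(d * d for d in diffs) / len(diffs)
    return mean_sq_diff / statistics.pvariance(energy)


def check_reference_draws(
    n_divergent: int,
    rhat: Dict[str, float],
    ess_bulk: Dict[str, float],
    ess_tail: Dict[str, float],
    energy_per_chain: List[Sequence[float]],
    rhat_max: float = 1.01,   # assumed threshold
    ess_min: float = 400.0,   # assumed threshold
    bfmi_min: float = 0.3,
) -> List[str]:
    """Return a list of diagnostic problems; an empty list means all checks passed."""
    problems: List[str] = []
    if n_divergent > 0:
        problems.append(f"{n_divergent} divergent transitions")
    for name, value in rhat.items():
        if value > rhat_max:
            problems.append(f"Rhat({name}) = {value:.3f} > {rhat_max}")
    for label, ess in (("bulk", ess_bulk), ("tail", ess_tail)):
        for name, value in ess.items():
            if value < ess_min:
                problems.append(f"ESS_{label}({name}) = {value:.0f} < {ess_min:.0f}")
    for chain, energy in enumerate(energy_per_chain):
        value = e_bfmi(energy)
        if value < bfmi_min:
            problems.append(f"E-BFMI(chain {chain}) = {value:.3f} < {bfmi_min}")
    return problems
```

A chain whose energy drifts slowly, like a random walk, gets a low E-BFMI and is flagged.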

The next step is also to find a way to compare two distributions based on samples, i.e., some measure of the distance between distributions. I know @breckbaldwin would ideally want a single measure that could be used. Any thoughts or pointers there, @betanalpha? We have been loosely discussing maximum mean discrepancy (MMD) but have not yet started on this: http://www.jmlr.org/papers/volume13/gretton12a/gretton12a.pdf
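As a starting point, here is a minimal Python sketch of the unbiased squared-MMD estimator from the Gretton et al. (2012) paper, using a Gaussian kernel. The fixed bandwidth is an assumption. In practice it would need tuning (the median heuristic is a common choice). The function names are hypothetical.

```python
import math
from typing import Sequence

Point = Sequence[float]


def gaussian_kernel(x: Point, y: Point, bandwidth: float) -> float:
    """k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd2_unbiased(xs: Sequence[Point], ys: Sequence[Point], bandwidth: float = 1.0) -> float:
    """Unbiased estimate of squared MMD between samples xs and ys (each needs >= 2 points)."""
    m, n = len(xs), len(ys)
    # Within-sample averages exclude the i == j terms, which makes the estimator unbiased.
    k_xx = sum(gaussian_kernel(xs[i], xs[j], bandwidth)
               for i in range(m) for j in range(m) if i != j) / (m * (m - 1))
    k_yy = sum(gaussian_kernel(ys[i], ys[j], bandwidth)
               for i in range(n) for j in range(n) if i != j) / (n * (n - 1))
    # The cross-sample average uses all m * n pairs.
    k_xy = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys) / (m * n)
    return k_xx + k_yy - 2.0 * k_xy
```

The estimate can be slightly negative when the two samples come from the same distribution. The pairwise sums make this quadratic in the number of draws, so comparing large reference samples would call for subsampling or the linear-time estimator from the same paper.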

@andrewgelman Thank you for the link! I may get back to you with questions on this.

Related topics

Topic	Category	Replies	Views	Activity
Beta release of our 'posterior' R package	Announcements (posterior-package)	2	733	June 15, 2020
Posteriordb, beta version 0.2	General	3	487	September 23, 2020
Request for comments: Bayesian Posterior Database	Developers (pystan, rstan)	7	781	August 16, 2019
Posteriordb 0.5 out	Modeling	1	277	November 6, 2023
GSoC 2021 - Q/A thread	Google Summer of Code	30	2847	April 13, 2021