Examples of Stan being used in the context of Bayesian inference / regression with large open datasets that relate to individuals, such as census data

hwilde · January 28, 2020, 12:15pm

I thought I would ask the community here, as I am currently trying to build some example materials that use Stan and investigate how well it scales to larger datasets. Has anyone seen any particularly compelling pairings of a Stan model with a large dataset, preferably open-source and focused on people or sensitive information? Any examples would be much appreciated. I have read some similar discussions here, which were somewhat helpful but lacked complete pairings of a Stan model and its data.

bgoodri · January 28, 2020, 1:55pm

I give a homework assignment that uses wage data from the U.S. Census:

https://courseworks2.columbia.edu/files/4420285/download?download_frd=1
https://courseworks2.columbia.edu/files/4370169/download?download_frd=1

But there isn’t a whole lot to it, since the individual data are already anonymized.

peopletrees · January 28, 2020, 2:27pm

I’m not sure this is close enough to what you’re looking for, but I recently did an analysis of a publicly available dataset of >6000 individual shootings in Baltimore using rstanarm and posted the complete code along with the dataset here.

The particular subset of the data that I analyzed may not be large enough to interest you (n ≈ 6,000 shootings), but the underlying dataset, which is also provided, covers every victim-based crime in Baltimore over the past 7 years (n ≈ 400,000). So if you subset to a more common crime than shootings, you can probably run my code without many modifications and get a reasonable case study of Stan on data that's "big", at least by public health standards.
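For anyone trying the subsetting step described above, here is a minimal Python sketch using only the standard library: it counts records per crime type (to find categories more common than shootings) and keeps only the rows for one chosen type. The column names `CrimeDate` and `Description` are hypothetical, since the actual schema of the Baltimore file isn't given in this thread, and the tiny input below is made up purely for illustration.

```python
import csv
import io
from collections import Counter

# Hypothetical column name: check the real file's header before using this.
CRIME_COLUMN = "Description"


def crime_counts(rows, column=CRIME_COLUMN):
    """Count records per crime type, e.g. to find types more common than shootings."""
    return Counter(row.get(column, "").strip() for row in rows)


def subset_by_crime(rows, crime_type, column=CRIME_COLUMN):
    """Keep only records whose crime-type field matches crime_type (case-insensitive)."""
    target = crime_type.strip().lower()
    return [row for row in rows if row.get(column, "").strip().lower() == target]


if __name__ == "__main__":
    # Made-up toy input with hypothetical columns; not real Baltimore data.
    sample = io.StringIO(
        "CrimeDate,Description\n"
        "01/01/2019,SHOOTING\n"
        "01/02/2019,LARCENY\n"
        "01/03/2019,LARCENY\n"
        "01/04/2019,COMMON ASSAULT\n"
    )
    rows = list(csv.DictReader(sample))
    print(crime_counts(rows).most_common(1))  # [('LARCENY', 2)]
    larceny = subset_by_crime(rows, "larceny")
    print(len(larceny))  # 2
```

With the real file you would pass `open(path, newline="")` to `csv.DictReader` instead of the in-memory sample, then feed the filtered rows to the existing rstanarm code.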

peopletrees · January 28, 2020, 2:52pm

BTW: the same dataset was discussed previously in this thread, which you can check out for some actual Stan code.

hwilde · January 28, 2020, 9:39pm

Regardless of the anonymisation, this is still a pretty good example, so thank you for sharing!

hwilde · January 28, 2020, 9:40pm

This is great, I will go through your repo as there looks to be some interesting stuff inside, thank you!
