@andrewgelman, @lauren, @charlesm93 and I have been talking about performing a systematic analysis of (and writing a survey paper about) computational approaches to scaling Bayesian regression (linear, logistic, possibly hierarchical GLMs) to large data sets. Here "large" means something like: large enough that posterior inference under a reasonable statistical model for the data takes an annoyingly or impractically long time.
To ground the survey in real statistical practice, I am looking for (descriptions of) real data sets that people want to fit Bayesian GLMs to, but that are causing computational problems because of their size.
Question: could anyone share illustrative examples of the data sets and models they are working with?
Of course, the data themselves may be sensitive. What I am really after is a sense of the structure of the data (sample size, number of covariates) and of the corresponding model (any hierarchical structure used; in particular, the number of parameters and how they are organized hierarchically). That way we can run our benchmarks on datasets and models in the regimes people are actually interested in.
Also, if anyone is interested in taking part in our discussions or otherwise getting involved, please let me know!