Bayesian Benchmarking 1.0

Hi all, I’m working on the Bayesian Benchmarking GSoC project under the guidance of @mans_magnusson and @mike-lawrence. The goal is to build a set of canonical Stan models that will be added to posteriorDB to serve as reference points against which new approaches to Bayesian computation can be compared.

I have compiled a first set of models for Bayesian benchmarking in the list below and would like to get your feedback. If you know of any model with a citable reference and, if possible, a real dataset that should be included in this list, please let me know.

Link: Model Benchmarking - Google Sheets


Replying partly to bump visibility, but also to note that ideally we’re looking for models with existing representation in a citeable literature so that it’s easier both for KT as well as later users of posteriorDB to find resources explaining the model and the data it is intended to address.


Hi @kn2465,
Are you knowledgeable about how to generate protobuf outputs for Stan models? My group is also looking at benchmarking different algorithms with canonical models, and protobuf seems to be a good way to identify bottlenecks in the computation time.

Hi there - Please excuse my ignorance; I had to look it up. Protobuf seems to be a tool for output serialization, so how is it related to computational time? Is it because it allows for streaming output, so you can run some diagnostics while the model is fitting?

Hi @kn2465 and @mike-lawrence ,
Is this an open invitation to kick flavors of model that I work on your way to try to incentivize development of algorithms that are efficient for my uses (provided that the models are published, etc)? Or are you looking for users who have some genuine intuition that their model will be an important complement to the existing set?

Two more general observations:

  • I think it would be useful to include a set of (G)LMMs that explicitly span regimes of small-to-big data and few-to-many parameters. I’m not sure whether that’s already in there. In the big-data regime, the necessary parameterizations and optimizations get a bit interesting (e.g. centered parameterizations for big data and few parameters; slicing over the parameter vector rather than the data for reduce_sum with many parameters).
  • I think it would be in Stan’s interest to include more than just one really large model (i.e. tens or hundreds of thousands of parameters). These are models where HMC really shines, and there’s value in showing the world that Stan consistently blows Nimble (or whatever) out of the water in this regime, rather than leaving people to wonder about the specifics of one individual model. And this would be nice to show in something as ubiquitous as a GLMM (rather than just an LDA), because it would maximize the population of users who care about the difference.
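To make the centered/non-centered distinction concrete, here is a minimal NumPy sketch (names and values are illustrative, not from the thread) showing that the two parameterizations of a hierarchical normal describe the same distribution; they differ only in the geometry a sampler has to navigate, which is why the better choice depends on how strongly the data inform each group effect:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, tau = 1.0, 0.5  # population mean and between-group scale
J = 8               # number of groups

# Centered parameterization: draw group effects directly.
theta_centered = rng.normal(mu, tau, size=J)

# Non-centered parameterization: draw standardized effects z ~ N(0, 1),
# then shift and scale. Same marginal distribution for theta, but the
# sampler works on z, whose geometry is independent of tau.
z = rng.normal(0.0, 1.0, size=J)
theta_noncentered = mu + tau * z
```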

The ideal is something that is already published somewhere for documentation purposes (as well to ease us getting up to speed on its intent/structures), with a close-secondary filtering priority placed on things not obviously already in our list.

I think we have these covered. I have simulation code for models akin to that shown in the SUG Section 1.13 on hierarchical models that permit generating data with arbitrary numbers of predictors, mid-level hierarchy unit counts, and observation counts (the latter influencing whether centered or non-centered should be expected to sample better).
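As a rough illustration of what such simulation code might look like (the function and its parameters are hypothetical, not the author's actual code), here is a NumPy sketch that generates data from a Gaussian hierarchical regression with configurable numbers of predictors, groups, and observations per group:

```python
import numpy as np

def simulate_hierarchical(n_groups, n_obs_per_group, n_predictors, seed=0):
    """Simulate a Gaussian hierarchical regression: per-group
    coefficients drawn around population-level means."""
    rng = np.random.default_rng(seed)
    mu = rng.normal(0.0, 1.0, size=n_predictors)            # population means
    tau = np.abs(rng.normal(0.0, 0.5, size=n_predictors))   # between-group scales
    # Group-level coefficients, one row per group.
    beta = mu + tau * rng.normal(size=(n_groups, n_predictors))
    X = rng.normal(size=(n_groups, n_obs_per_group, n_predictors))
    # y[g, n] = X[g, n, :] @ beta[g, :] + noise
    y = np.einsum("gnp,gp->gn", X, beta) + rng.normal(
        0.0, 1.0, size=(n_groups, n_obs_per_group)
    )
    return X, y, beta

X, y, beta = simulate_hierarchical(n_groups=20, n_obs_per_group=50, n_predictors=3)
```

Scaling `n_groups`, `n_obs_per_group`, and `n_predictors` independently is what lets a benchmark sweep the small-to-big data and few-to-many parameter regimes mentioned above.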


Started a repo here. So far it has just the already-optimized hierarchical models.