SBC StanConnect 2021

Hello,
@martinmodrak and I would like to organize a StanConnect session on simulation-based calibration (SBC).

SBC is a relatively young but versatile diagnostic tool that can calibrate both mathematical models (prior and observational) and computational algorithms. Given this breadth, we wish to make the session as interactive as possible, so please reply to this post regarding any of the following:

  1. speaker/poster
  2. discussion topic suggestion
  3. what you wish to learn from the tutorial
  4. would like to attend

For 1, it would be great if you could suggest an abstract.
For 2, example topics could be SBC computation speedup techniques, SBC for hierarchical models, etc.
For 3, @Dashadower and I are planning a tutorial with this package (tentative), and would be happy to tailor it to the needs of the audience. Please refer to the README for a brief introduction to SBC with its use cases and references.

The tentative date is somewhere between the end of June and August (edited).

Thanks,
Hyunji

  1. I would love to learn more about options for SBC for computationally expensive models/datasets. In the context of big data and a complex model, are there strategies for designing a population of smaller datasets and/or simpler models to (sorta kinda) validate the full computation?

4 please :)


Very interesting proposal!

  1. I vote with Jacob, but I would also explicitly like to relate this to point 2 in the README (i.e., how does this help us apply approximation algorithms to complex models?).
  2. I’m mostly interested in how SBC can be used to speed up model development and computation.
  3. Tentative (depends on timing relative to moving house etc.)

I could talk about SBC in phylogenetics, which is joint work with Remco Bouckaert and which I’ve also discussed with @betanalpha. It is very preliminary stuff. The downside is that there’s no Stan involved, although I’d be keen to learn about plotting/analysis routines that could be adapted.


I’d be keen to attend & learn more about SBC.


I would also be keen to attend and learn more!


I still feel like a novice when it comes to SBC, so I would be interested in any tutorials on both the theory and the implementation. On the other hand, it would also be useful to showcase specific applications of SBC (I’m happy to make a poster or short presentation on my own usage, for example).


Final Schedule
Date: 8/31 9am to Noon (EST)

Talks
Graphical test for uniformity and its applications in SBC workflow (Teemu Säilynoja)
Prior Specification in the context of Simulation-Based Calibration (Paul Bürkner)
Workflow techniques for the robust use of Bayes factors (Daniel Schad)
Simulation-based calibration for Bayesian phylogenetics: dealing with huge models and an awkward parameter space (Luiz Max Carvalho)

Schedule (All times EST)
8:30am - 9:00am: Event opens, informal chat/networking
9:00am - 9:05am: Introduction to the Event by SGB representative
9:05am - 9:20am: Opening by Hyunji Moon and Andrew Gelman
9:20am - 10:00am: Talks by Teemu Säilynoja and Paul Buerkner [15-min. talk + 5-min. Q&A each]
10:00am - 10:10am: Break, tutorial setup
10:10am - 11:10am: Tutorial (Martin Modrak, Shinyoung Kim)
11:10am - 11:50am: Talks by Daniel Schad and Luiz Max Carvalho [15-min. talk + 5-min. Q&A each]
11:50am - 12:00pm: Closing

Abstracts for the talks:
Opening: Simulation-based data exploration

Prior Specification in the context of Simulation-Based Calibration

Performing simulation-based calibration (SBC) requires repeated sampling from the parameters’ priors and subsequently from the likelihood in its role as the data-generating distribution. Ideally, we have chosen our priors intelligently so that the resulting simulated data are within a reasonable range and similar in scale to our real-world data. However, in models where parameters are non-linearly related to the response, choosing priors that imply realistic-looking data is actually quite hard. Or, to view it from another perspective, many models with weakly informative priors imply data that are orders of magnitude away from anything we would consider realistic. Moreover, this may also harm convergence and sampling efficiency in the subsequent model estimation. In my talk, I will illustrate these challenges, highlight some potential solutions, and point to directions for future research.
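To make the two sampling steps concrete, here is a minimal Python sketch under stated assumptions. `sbc_ranks_normal` runs the basic SBC loop (prior draw, simulated data, posterior draws, rank) for a hypothetical conjugate normal model, in which exact posterior draws stand in for a sampler such as Stan. `prior_predictive_scale` is a separate, equally hypothetical check: with N(0, 5²) priors on the intercept and slope of a log-link Poisson model, it estimates how often the implied mean at x = 1 exceeds 10,000. Neither function comes from the SBC package or from the talk.

```python
import math
import random


def sbc_ranks_normal(n_sims=200, n_obs=10, n_post=99, prior_sd=1.0, noise_sd=1.0, seed=1):
    """SBC loop for the hypothetical model mu ~ N(0, prior_sd^2), y_i ~ N(mu, noise_sd^2).

    The posterior is conjugate, so exact posterior draws replace an MCMC fit.
    Returns one rank per simulation; each rank lies in {0, ..., n_post}.
    """
    rng = random.Random(seed)
    ranks = []
    for _ in range(n_sims):
        # 1. Draw the "true" parameter from the prior.
        mu_true = rng.gauss(0.0, prior_sd)
        # 2. Simulate a dataset from the likelihood, given that draw.
        y = [rng.gauss(mu_true, noise_sd) for _ in range(n_obs)]
        # 3. Fit the model: exact normal posterior for mu (known noise_sd, prior mean 0).
        post_prec = 1.0 / prior_sd**2 + n_obs / noise_sd**2
        post_mean = (sum(y) / noise_sd**2) / post_prec
        post_sd = math.sqrt(1.0 / post_prec)
        draws = [rng.gauss(post_mean, post_sd) for _ in range(n_post)]
        # 4. Rank of the prior draw among the posterior draws; uniform if all is well.
        ranks.append(sum(d < mu_true for d in draws))
    return ranks


def prior_predictive_scale(n_sims=1000, prior_sd=5.0, threshold=1e4, seed=2):
    """Fraction of prior draws whose implied Poisson mean at x = 1 exceeds `threshold`.

    Hypothetical model: alpha, beta ~ N(0, prior_sd^2), y ~ Poisson(exp(alpha + beta * x)).
    """
    rng = random.Random(seed)
    too_big = 0
    for _ in range(n_sims):
        alpha = rng.gauss(0.0, prior_sd)
        beta = rng.gauss(0.0, prior_sd)
        # Compare on the log scale to avoid overflow in exp().
        if alpha + beta > math.log(threshold):
            too_big += 1
    return too_big / n_sims


if __name__ == "__main__":
    ranks = sbc_ranks_normal()
    print("mean rank:", sum(ranks) / len(ranks))  # near 49.5 for uniform ranks
    print("share of means > 1e4, N(0, 5^2) priors:", prior_predictive_scale())
    print("share of means > 1e4, N(0, 1) priors:", prior_predictive_scale(prior_sd=1.0))
```

Because alpha + beta has a standard deviation of about 7 under the wider priors, roughly one prior draw in ten implies a mean above 10,000, whereas with N(0, 1) priors essentially none do.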

Graphical test for uniformity and its applications in SBC workflow

Assessing the uniformity of the rank statistics of the prior draws is a central part of SBC; histograms and empirical CDFs are the tools used in the original SBC paper. Unfortunately, histograms do not take into account the dependency between bin heights, and users have to choose the number of bins. The original paper also suggests comparing the empirical CDF of the rank statistics with that of random draws from a uniform distribution.
In our paper, we provide simultaneous confidence bands for the sample ECDF, which result in an intuitive graphical test for uniformity. The graphical nature of this test also provides feedback on the nature of possible deviations from uniformity. We also present an optimization-based method and a simulation-based method for adjusting the pointwise confidence bands to obtain simultaneous coverage with a desired type 1 error rate. In my talk, I briefly introduce our graphical test and demonstrate how it can be applied, together with the sbc function of rstan, to recognize common deviations from uniformity. I also briefly introduce the other main contribution of our paper: by extending the simultaneous confidence bands to multiple-sample comparison, it allows evaluating whether two or more samples originate from the same underlying distribution. This is especially useful as an alternative to the widely used trace plots and rank plots for assessing the convergence of MCMC chains.
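The idea of adjusting pointwise bands to obtain simultaneous coverage can be sketched in plain Python. This is a simplified illustration in the spirit of the simulation-based method described above, not the implementation in bayesplot or the SBC package. Under uniformity, the ECDF count at each grid point z is Binomial(N, z); the sketch simulates uniform samples, records the smallest two-sided pointwise tail probability over the grid, and takes its alpha-quantile as the adjusted pointwise level. The function names, the 19-point grid, and the scaling of ranks to (r + 1) / (L + 1) are our own assumptions.

```python
import math
import random


def binom_cdf_table(n, p):
    """Return [P(X <= c) for c in 0..n] for X ~ Binomial(n, p)."""
    table, total = [], 0.0
    for c in range(n + 1):
        total += math.comb(n, c) * p**c * (1 - p) ** (n - c)
        table.append(total)
    return table


def two_sided_tail(cdf, c):
    """2 * min(P(X <= c), P(X >= c)) for the binomial described by `cdf`."""
    upper = 1.0 - (cdf[c - 1] if c > 0 else 0.0)
    return 2.0 * min(cdf[c], upper)


def min_pointwise_tail(sample, grid, tables):
    """Smallest pointwise tail probability of the sample ECDF over the grid."""
    return min(two_sided_tail(cdf, sum(u <= z for u in sample))
               for z, cdf in zip(grid, tables))


def adjusted_gamma(n, grid, alpha=0.05, n_sims=1000, seed=0):
    """Pointwise level whose bands jointly reject roughly alpha of uniform samples."""
    rng = random.Random(seed)
    tables = [binom_cdf_table(n, z) for z in grid]
    mins = sorted(min_pointwise_tail([rng.random() for _ in range(n)], grid, tables)
                  for _ in range(n_sims))
    return mins[int(alpha * n_sims)], tables


def simultaneous_band(n, tables, gamma):
    """(lower, upper) ECDF values at each grid point; a sample passes if it stays inside."""
    band = []
    for cdf in tables:
        inside = [c for c in range(n + 1) if two_sided_tail(cdf, c) >= gamma]
        band.append((inside[0] / n, inside[-1] / n))
    return band


if __name__ == "__main__":
    L, N = 99, 100                          # posterior draws per fit, number of SBC fits
    grid = [i / 20 for i in range(1, 20)]   # 20 divides L + 1, so the binomial law is exact
    gamma, tables = adjusted_gamma(N, grid)
    rng = random.Random(42)
    uniform_ranks = [rng.randrange(L + 1) for _ in range(N)]
    skewed_ranks = [int(rng.random() ** 2 * (L + 1)) for _ in range(N)]  # piles up near 0
    for name, ranks in [("uniform", uniform_ranks), ("skewed", skewed_ranks)]:
        scaled = [(r + 1) / (L + 1) for r in ranks]
        verdict = "inside" if min_pointwise_tail(scaled, grid, tables) >= gamma else "outside"
        print(f"{name} ranks: ECDF {verdict} the simultaneous band")
```

Plotting the `simultaneous_band` limits against the grid, together with the sample ECDF, gives the graphical version of the same test: the sample is rejected exactly when its ECDF leaves the band at some grid point.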

Workflow techniques for the robust use of Bayes factors

It is unknown whether approximate Bayes factor estimates (e.g., using bridge sampling) are unbiased for complex analyses. We use simulation-based calibration as a tool to test the accuracy of Bayes factor estimates. Moreover, we study how Bayes factors misbehave under different conditions and suggest a workflow for the use of Bayes factors.
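One way to use SBC for Bayes factors rests on a simple identity: if models are drawn from their prior and data from the chosen model, the average posterior model probability equals the prior model probability. The Python sketch below checks this for a hypothetical pair of normal models with an exact Bayes factor, plus a deliberately biased variant (`log_bias`, our own illustrative knob) standing in for an inaccurate approximate estimator such as a poorly tuned bridge sampler. The models, priors, and function names are assumptions for illustration, not the setup used in the talk.

```python
import math
import random


def normal_logpdf(x, sd):
    """Log density of N(0, sd^2) evaluated at x."""
    return -0.5 * math.log(2.0 * math.pi * sd**2) - 0.5 * (x / sd) ** 2


def posterior_prob_m1(y_bar, n, tau, log_bias=0.0):
    """P(M1 | data) with equal prior model odds and unit noise variance.

    M0: y_i ~ N(0, 1).  M1: mu ~ N(0, tau^2), y_i ~ N(mu, 1).
    The sample mean is sufficient here, so the Bayes factor is a ratio of the
    marginal densities of y_bar. `log_bias` mimics a hypothetical estimator
    that systematically inflates log BF_10.
    """
    log_bf10 = (normal_logpdf(y_bar, math.sqrt(tau**2 + 1.0 / n))
                - normal_logpdf(y_bar, math.sqrt(1.0 / n))) + log_bias
    return 1.0 / (1.0 + math.exp(-log_bf10))


def sbc_bayes_factor(n_sims=4000, n=20, tau=1.0, log_bias=0.0, seed=7):
    """Average posterior probability of M1 over data simulated from the prior predictive."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sims):
        m1_true = rng.random() < 0.5                   # draw the model from its prior
        mu = rng.gauss(0.0, tau) if m1_true else 0.0  # then its parameter
        y_bar = rng.gauss(mu, 1.0 / math.sqrt(n))     # mean of n N(mu, 1) observations
        total += posterior_prob_m1(y_bar, n, tau, log_bias)
    return total / n_sims


if __name__ == "__main__":
    print("exact BF:   ", sbc_bayes_factor())                       # close to 0.5
    print("inflated BF:", sbc_bayes_factor(log_bias=math.log(10)))  # clearly above 0.5
```

With the exact Bayes factor the average stays near the prior probability of 0.5, while inflating every Bayes factor tenfold pushes it well above 0.5, which is the kind of bias this check is meant to expose.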

Simulation-based calibration for Bayesian phylogenetics: dealing with huge models and an awkward parameter space

Phylodynamics applies phylogenetic methods to study the evolutionary and epidemiological dynamics of pathogens and to uncover the spatiotemporal patterns in the spread of viruses and bacteria. However, phylogenetic models are highly intractable, which necessitates approximate sampling methods. In this setting, SBC could be employed to test and calibrate the approximation algorithms. Phylogenetics poses special difficulties for SBC for two main reasons: (i) the parameter space includes both discrete and continuous components, and (ii) there is no canonical well-ordered representation of trees, so computing ranks requires a proper projection onto a metric space. In this talk, the main statistical issues in phylogenetic analysis will be discussed with a focus on SBC. Automated analysis from a Java application, and its integration with other packages for further analyses such as plotting, will be shown. Joint work with Remco Bouckaert (Auckland).
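One generic way to get a rank statistic for a non-ordered parameter such as a tree topology is to project each tree onto the real line, for example by its distance to a fixed reference tree, and then rank the projected true value among the projected posterior draws. The Python sketch below is an assumption-laden illustration, not the method from the talk: trees are represented as sets of splits, the projection is an unnormalized Robinson-Foulds distance (the size of the symmetric difference of split sets), and ties, which are common for such discrete projections, are broken uniformly at random so that ranks remain uniform when the posterior is correct. The four-taxon toy run uses the prior as the exact posterior (no data) purely to exercise the code.

```python
import random

# Unrooted 4-taxon topologies, each given by its single non-trivial split
# (written as the side containing taxon "A").
TOPOLOGIES = [frozenset({frozenset({"A", "B"})}),   # AB|CD
              frozenset({frozenset({"A", "C"})}),   # AC|BD
              frozenset({frozenset({"A", "D"})})]   # AD|BC


def rf_distance(tree_a, tree_b):
    """Unnormalized Robinson-Foulds distance: number of splits found in exactly one tree."""
    return len(tree_a ^ tree_b)


def projected_rank(true_tree, posterior_trees, reference, rng):
    """Rank of the true tree among posterior draws, projecting by distance to `reference`."""
    t = rf_distance(true_tree, reference)
    d = [rf_distance(tree, reference) for tree in posterior_trees]
    less = sum(x < t for x in d)
    ties = sum(x == t for x in d)
    # Randomized tie-breaking keeps ranks uniform for a discrete projection.
    return less + rng.randint(0, ties)


def toy_sbc(n_sims=3000, n_post=9, seed=11):
    """Toy SBC run in which the exact posterior equals the uniform prior over topologies."""
    rng = random.Random(seed)
    reference = TOPOLOGIES[0]
    ranks = []
    for _ in range(n_sims):
        true_tree = rng.choice(TOPOLOGIES)
        posterior = [rng.choice(TOPOLOGIES) for _ in range(n_post)]
        ranks.append(projected_rank(true_tree, posterior, reference, rng))
    return ranks


if __name__ == "__main__":
    ranks = toy_sbc()
    print([ranks.count(r) for r in range(10)])  # roughly 300 per rank
```

A single projection only checks calibration along that one summary, so in practice one would likely combine several projections (for example, distances to several reference trees, or continuous quantities such as tree height).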

Thanks, co-organizers and speakers!
@andrewgelman @martinmodrak @Dashadower @paul.buerkner @maxbiostat


The following is a list of supporting material for SBC. For those curious about model checking, some background on SBC before the conference would never hurt :) Detailed background and an FAQ are documented in the SBC package README. Please let me know by replying if any literature is missing!

Theoretical support

Application support

Vignette

  • ECDF with code (a new implementation by Teemu Säilynoja will soon be available in the bayesplot and SBC packages)

This is the link for registering for StanConnect on August 31.

Our team has put great effort into developing a package and tutorials that introduce the ABCs of SBC, which can easily be extended to model checking in the Bayesian workflow. The following documentation is ongoing, and all feedback is welcome.

  • Bayesian Calibration Series 1 blog
  • SBC FAQ wiki (early draft)

In particular, I need help from the Stan community on the SBC FAQ. Please feel free to ask any questions about SBC and to contribute to the FAQ list.

The motivation behind the FAQ is SBC’s continual evolution. Delightful though the development is, changing diagnostics lead to confusion, and I have received several questions about the newest SBC version and the reasoning behind its updates. Multiple factors, such as autocorrelation, interpretation, power and simulation scale, and the calibration target, should be considered for the best use of SBC. There is no one-size-fits-all answer, but as someone who appreciates the value of the prior recommendations in the Stan wiki, I thought that timely, updated recommendations based on the existing literature, communication with researchers at the frontier of SBC, and my own first-hand experiments and research could be helpful.


I couldn’t attend, but would have liked to. Are there, or will there be, recordings of the talks?

I guess that info should be somewhere, but I couldn’t find it on the events page, nor did I see any general post in the forum under “events” that explains this for StanConnect.

(It doesn’t seem possible to enter the eventbrite event anymore.)


Hi @Raoul-Kima, I will upload it this week and notify you in this thread.


Meanwhile, for those who requested a recording of the tutorial, SBC’s basic structure and usage can be seen here: SBC Interface Introduction • SBC


Great, thanks!

The video of the conference has been uploaded: http://y2u.be/SbgAMkN18dA
