# Bayesian workflow: GPS to Navigation

I would like to bounce off some ideas on Bayesian workflow. GPS provides the raw data about a location; navigation systems use this data to guide users to their destinations. Diagnostics in Bayesian workflow allow the construction of a GPS in model space. How can the typifying-paths idea (below) be improved to build a good navigation system that gives each modeler localized decision support? I’m curious about relevant literature (e.g. Bayesian workflow case studies with real data and a specific application) that can help me flesh out the navigation idea, with the aim of providing documentation that complements the SBC package vignettes, e.g. “Small model implementation workflow • SBC”. Looking forward to learning from any inconsistencies between our internal world models!

As a modeler explores model space with the Bayesian workflow map, which of the joint probability $p(y, \theta)$ (P), the approximator $p_A(\theta \mid y)$ (A), and the data $\tilde{y}$ (D) should he/she update, in which sequence, and based on what signal? Definitions and examples of P, A, and D are in the paper by @paul.buerkner, @scholz, and Stefan Radev (which I highly recommend!). For simplicity I divided this journey into decision nodes and paths.

The basic idea is that a modeler’s journey is a sequential decision problem, where the signal at each step is a high-dimensional set of diagnostics and the actions are updating P, A, or D. This naturally leads to the question: can we typify the paths that reach a state of good enough diagnostics? For instance, building on the Birthday problem, a time series model for the number of births per day, let’s say a modeler starts with a simple slow trend model (P), an optimization approximation algorithm (A), and births per day in the USA from 1969-1988 (D). After some modeling, he/she retrieved[1] good enough community-recommended diagnostics (prior predictive check, simulation-based calibration, posterior predictive check) using a time series model with long-term, seasonal, weekly, day-of-year, and special floating day variation (P’), a Markov chain approximation algorithm (A’), and births per year in the USA from 1969-2018 (D’). However, there exist at least 20 paths (excluding giving up at each decision) that this modeler could have gone through.

## 3 types of decision

Q1. prior predictive check diagnostics?

• bad: update P
• good enough: go to Q2

Q2. simulation-based calibration diagnostics?

• bad: update P or A
• good enough: go to Q3

Q3. posterior predictive check diagnostics?

• bad: update P or A or D
• good enough: finish
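
The three decision nodes above can be read as a small state machine. Here is a minimal sketch in Python, assuming each check has already been summarized as a boolean. The names `next_actions`, `ALLOWED_UPDATES`, and `CHECK_ORDER` are hypothetical, and the rule that a bad prior predictive check is handled by updating P is an assumption taken from path 1b_P in the path list.

```python
# Allowed updates when a check fails, following the decision list above.
# Assumption: a bad prior predictive check is handled by updating P only,
# as in path 1b_P.
ALLOWED_UPDATES = {
    "ppc1": ("P",),
    "sbc": ("P", "A"),
    "ppc2": ("P", "A", "D"),
}
CHECK_ORDER = ("ppc1", "sbc", "ppc2")


def next_actions(diagnostics):
    """Return (first failing check, allowed updates), or (None, ()) when done.

    `diagnostics` maps each check name to True (good enough) or False (bad).
    Checks are visited in order, so a later check is only consulted once
    all earlier ones pass.
    """
    for check in CHECK_ORDER:
        if not diagnostics[check]:
            return check, ALLOWED_UPDATES[check]
    return None, ()


print(next_actions({"ppc1": True, "sbc": False, "ppc2": False}))
# ('sbc', ('P', 'A'))
```

The sketch only says which components may be updated at a node. It does not say which update to pick or how to carry it out, which is where the actual decision problem sits.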

good enough prior predictive check (ppc1) diagnostics := the range of observed data simulated from the prior is not extreme, i.e. it cannot be rejected.

good enough simulation-based calibration (sbc) diagnostics := the ECDF-based plots stay within the confidence band at level alpha for every test quantity (t).

good enough posterior predictive check (ppc2) diagnostics := the range of data simulated from the posterior is acceptable for every quantity of interest (observed data, utility).
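
To make the sbc definition concrete, here is a rough Python sketch on a toy conjugate model (not the birthday model), assuming theta ~ N(0, 1) and y_i ~ N(theta, 1), so that the exact posterior is known. The helper names and the `sd_scale` knob, which mimics an overconfident approximator A, are hypothetical. The sketch reports only the maximum gap between the rank ECDF and the uniform CDF; it does not reproduce the simultaneous confidence band used in ECDF-based SBC plots.

```python
import random


def sbc_ranks(n_sims, n_draws, n_obs, sd_scale, rng):
    """SBC ranks for theta ~ N(0, 1), y_i ~ N(theta, 1), i = 1..n_obs.

    The exact posterior is N(sum(y) / (n_obs + 1), 1 / (n_obs + 1)).
    sd_scale != 1 mimics a miscalibrated approximator A by rescaling the
    posterior standard deviation (an illustrative assumption).
    """
    ranks = []
    for _ in range(n_sims):
        theta = rng.gauss(0.0, 1.0)                        # draw from the prior
        y = [rng.gauss(theta, 1.0) for _ in range(n_obs)]  # simulate data
        post_mean = sum(y) / (n_obs + 1)
        post_sd = sd_scale * (1.0 / (n_obs + 1)) ** 0.5
        draws = [rng.gauss(post_mean, post_sd) for _ in range(n_draws)]
        ranks.append(sum(d < theta for d in draws))        # rank in 0..n_draws
    return ranks


def max_ecdf_deviation(ranks, n_draws):
    """Largest gap between the rank ECDF and the uniform CDF on 0..n_draws."""
    n = len(ranks)
    counts = [0] * (n_draws + 1)
    for r in ranks:
        counts[r] += 1
    worst, cum = 0.0, 0
    for k in range(n_draws + 1):
        cum += counts[k]
        worst = max(worst, abs(cum / n - (k + 1) / (n_draws + 1)))
    return worst


rng = random.Random(1)
exact = sbc_ranks(n_sims=500, n_draws=99, n_obs=10, sd_scale=1.0, rng=rng)
narrow = sbc_ranks(n_sims=500, n_draws=99, n_obs=10, sd_scale=0.3, rng=rng)
print(max_ecdf_deviation(exact, 99), max_ecdf_deviation(narrow, 99))
```

With the exact posterior the ranks should be close to uniform, while the too-narrow approximation piles ranks up at both ends and produces a much larger deviation. Where to draw the line between the two is the "good enough" threshold discussed in this post.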

Subjectivity in “good enough” diagnostics makes this a decision problem. The bar would be high in the pharmaceutical and defense industries, where human lives are at stake. Even within the same industry, a company with more resources (computation, time, human capital) would set a higher bar, and even within the same company, the bar can be lowered as a project deadline approaches.

## ≥20 types of path

1. bad ppc1
• 1b_P (1): update P to retrieve good enough ppc1 diagnostics

2. good enough ppc1, bad sbc
• 1g2b_P, 1g2b_A (2): update P or A to retrieve good enough sbc

• 1g2b_PA, 1g2b_AP (2): sbc is still bad after updating one of P or A; update in the order P-A or A-P to retrieve good enough sbc

3. good enough ppc1 and sbc, bad ppc2
• 1g2g3b_P, 1g2g3b_A, 1g2g3b_D (3): update P or A or D to retrieve good enough ppc2

• 1g2g3b_PA, 1g2g3b_AP, 1g2g3b_PD, 1g2g3b_DP, 1g2g3b_AD, 1g2g3b_DA (6): bad ppc2 after updating one, retrieve good enough ppc2 after updating two

• 1g2g3b_PAD, 1g2g3b_PDA, 1g2g3b_ADP, 1g2g3b_APD, 1g2g3b_DAP, 1g2g3b_DPA (6): bad ppc2 even after updating two, retrieve good enough ppc2 after updating all three
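
The count of 20 can be checked by enumeration. Here is a small Python sketch, assuming that each of P, A, D is updated at most once per path, that order matters, and that giving up is excluded. The names `STAGES` and `enumerate_paths` are hypothetical.

```python
from itertools import permutations

# Failing stage -> (path prefix, components that may be updated there).
STAGES = [
    ("1b", "P"),        # bad ppc1
    ("1g2b", "PA"),     # good enough ppc1, bad sbc
    ("1g2g3b", "PAD"),  # good enough ppc1 and sbc, bad ppc2
]


def enumerate_paths():
    """List every path label: an ordered sequence of distinct updates."""
    paths = []
    for prefix, components in STAGES:
        for length in range(1, len(components) + 1):
            for order in permutations(components, length):
                paths.append(f"{prefix}_{''.join(order)}")
    return paths


paths = enumerate_paths()
print(len(paths))  # 20 = 1 + (2 + 2) + (3 + 6 + 6)
```

Allowing the same component to be updated more than once (for example P, then A, then P again) would make the number of paths unbounded, so 20 is a lower bound tied to these assumptions.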

tagging some workflow enthusiasts! @andrewgelman @avehtari @Bob_Carpenter @betanalpha @martinmodrak @paul.buerkner @spinkney @mike-lawrence

Thank you.

[1] “Retrieve” in the sense that good enough diagnostics are the default, and learning happens in the process of resolving inconsistencies between the internal world model (P), the tools to implement models (A), and the sensed external world (D).


This is similar to the problem Paul and I were looking into, which ultimately led to the model taxonomy paper you linked, as we felt we didn’t even have the proper vocabulary to talk about possible steps in a somewhat formalized way. I’ll collect some thoughts after the weekend :)


Hi Maximilian (do you go by Max by any chance?),
Thanks for your work that enabled defining states on which we can communicate!

So as I wrote before, this is going in a similar direction to what we (Paul and I) explored for my thesis project, but written down in a formal way. Something that we didn’t explicitly consider back then was the hierarchy you use (only moving on after each type of check passes), which is probably a useful addition for a workflow.
From a user’s perspective, however, this workflow leaves open the same kind of question that the big workflow flowchart did: “What do I do in each individual node?” If we could answer that with some kind of algorithm, it would make statisticians obsolete. But the space of possible actions for each path node seems somewhat infinite, or at least not searchable in a practical way, which leaves us with a pile of decision problems we can’t answer.

So while there is value in laying out a high-level workflow as a guide, I think the users’ problems just start here. From here, sub-workflows would have to be identified, and then we either offer procedures to carry out each sub-workflow successfully or vaguely point at experienced statisticians to please help us with it.
My own project is approaching this problem, and after a few years I am not yet convinced that there can be a solution outside of very limited problems.

I think useful next steps would be to take a sub-workflow/path node and try to solve it for the simplest possible case that is still relevant for practice and, ideally, has the potential to generalize.

> do you go by Max by any chance?

I tend to in less professional contexts or with friends :)
