Upcoming Short Course on Bayesian Causal Inference with Stan

Hi all,

I wanted to publicize a short course I’ll be teaching on topics related to Bayesian causal inference. See below for course title, abstract, outline, and other information.

The course will be held in-person at the American Causal Inference Conference (ACIC) 2026 meeting in Salt Lake City, Utah, USA from 1-5pm local time on Monday, May 11.

Conference registration is required; more information here: 2026 Meeting – SOCIETY FOR CAUSAL INFERENCE
More about me here: https://stablemarkets.netlify.app/


Course Title:
Stress-Testing Assumptions: Bayesian Methods for Sensitivity Analysis in Causal Inference

Course Abstract:
Observational studies are often conducted to estimate causal effects of biomedical treatments. Such analyses invariably rely on statistical assumptions and causal identification assumptions. The former are required in settings with imperfectly observed data (e.g. measurement error, missing data) to connect the complete-data distribution to the observed-data distribution. The latter are required to connect the observed-data distribution to the distribution of potential outcomes. In general, both sets of assumptions are untestable.

When these assumptions do not hold, Bayesian sensitivity analyses allow us to formally encode subjective beliefs about the violation structure via prior distributions. Causal inferences are made using an updated posterior that reflects uncertainty about these violations. Moreover, nonparametric approaches allow the data to drive posterior beliefs about identifiable aspects of the model, while letting priors drive posterior beliefs for the non-identifiable aspects.

This course teaches the methodological and computational concepts behind Bayesian sensitivity analyses. We cover several examples in point-treatment settings, including treatment misclassification, unmeasured confounding, and missing not-at-random outcomes. We walk through implementation using synthetic data, with computing done in Stan – a widely used, publicly available platform for fitting Bayesian models. Parametric and nonparametric models are covered.


Course Outline:
The course consists of three parts:

Part 1 – Bayesian causal estimation in an ideal point-treatment setting where we have completely observed data and the usual causal assumptions hold. Key concepts covered include:
• Basics of Bayesian inference: priors, likelihoods, and posteriors.
• Bayesian implementation of the g-formula.
• Basics of the Stan programming language.
• Implementation example of the g-formula in Stan.

Part 2 – We build on Part 1 to allow for assumption violations. Implementation in Stan is discussed throughout. Specifically, we discuss the following examples of sensitivity analyses for:
• Unmeasured confounding.
• Exposure/treatment misclassification.
• Incomplete outcome information – specifically values that are missing not-at-random in both treatment arms.

Part 3 – While previous parts use parametric Bayesian models, Part 3 will teach participants the basics of sensitivity analysis with nonparametric Bayesian models. Key concepts covered include:
• Infinite and truncated Dirichlet process mixtures.
• Data augmentation concepts.
• A Stan example of conditional average treatment effect (CATE) estimation using truncated Dirichlet processes with missing not-at-random outcomes.


Course Learning Objectives:
Participants can expect to leave the course with the following:

  1. Understanding of Bayesian inference for causal effects.
  2. Understanding of the general Bayesian framework for sensitivity analysis.
  3. Understanding of concrete implementation in Stan.
  4. References to important foundational papers and textbooks in this area.

The ideal participant has: 1) familiarity with causal inference in point-treatment settings using outcome and treatment-modeling approaches within the frequentist paradigm; 2) understanding of probability at the level of an introductory graduate course; 3) facility with the R programming language.

Prior exposure to the following is helpful but not necessary: 1) facility programming in strongly and statically typed languages with an object-oriented paradigm (e.g. C++, Java); 2) familiarity with Bayesian inference at the level of an introductory graduate or advanced undergraduate course.

5 Likes

Sounds great!

1 Like

Thanks for posting.

I don’t know much about causal inference, so I had to look up “g-formula” (the “g” turns out to stand for “generalized”, so even if you’d spelled it out I would have had to look it up!). It sounds like a method for simulating potential outcomes.

How did you find fitting DPs in Stan? Presumably identifiability isn’t a problem as you want to do posterior predictive inference w/o the latents.

1 Like

The g-formula is a shortcut computation, and in one of the g-formula review papers they say that when the model gets complex enough you just have to simulate outcomes using the full model, which is what Bayesians would also do with simple models.

2 Likes

In general I’ve had an easy time fitting DP mixtures in Stan via the constructive stick-breaking representation - with the minor modification that the stick-breaking process must be truncated at some finite K, rather than the full DP, which would correspond to K=\infty.

You can specify the Beta(1, \alpha) prior on the stick-breaking proportions and then compute the stick-breaking weights in the transformed parameters block. The mixture likelihood is then specified using the usual log_sum_exp function for numerical stability.
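For concreteness, here is a minimal sketch of that pattern for a truncated DP mixture of normals (a generic illustration with placeholder priors and variable names, not the course’s CATE model):

```stan
// Minimal sketch: truncated stick-breaking representation of a DP mixture of normals.
data {
  int<lower=1> N;                      // number of observations
  int<lower=2> K;                      // truncation level (finite K instead of K = infinity)
  vector[N] y;                         // outcomes
}
parameters {
  real<lower=0> alpha;                 // DP concentration parameter
  vector<lower=0, upper=1>[K - 1] v;   // stick-breaking proportions, v_k ~ Beta(1, alpha)
  vector[K] mu;                        // component means (label switching is harmless here)
  vector<lower=0>[K] sigma;            // component scales
}
transformed parameters {
  simplex[K] w;                        // mixture weights built from the broken sticks
  {
    real remaining = 1;
    for (k in 1:(K - 1)) {
      w[k] = v[k] * remaining;
      remaining = remaining * (1 - v[k]);
    }
    w[K] = remaining;                  // last component absorbs what is left of the stick
  }
}
model {
  alpha ~ gamma(1, 1);
  v ~ beta(1, alpha);                  // the Beta(1, alpha) stick-breaking prior
  mu ~ normal(0, 5);
  sigma ~ normal(0, 2);
  for (n in 1:N) {
    vector[K] lp;
    for (k in 1:K)
      lp[k] = log(w[k]) + normal_lpdf(y[n] | mu[k], sigma[k]);
    target += log_sum_exp(lp);         // marginalize over the latent component label
  }
}
```

With the latent component labels marginalized out this way, quantities like the posterior predictive density (or E[Y|X] in the regression version) can be computed in generated quantities.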

Here’s an image of the regression from a Bayesian methods course I teach at Brown (also using Stan).

I’m assuming you’re referring to label switching - right, it isn’t an issue because we just want the regression function; we are not interested in clustering or learning cluster-specific parameters. The mixture is just used as a device to get a flexible functional form for E[Y|X].


Regarding g-formula:

Causal effects are defined in terms of potential outcomes, not observed outcomes. The g-formula is an identification result that establishes a mapping from functionals of the potential outcomes to functionals of the observed-data distribution. It is not really a “statistical” result and is more akin to identification results that link the complete-data distribution to the observed-data distribution in more general missing data problems.
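For concreteness, the point-treatment version of that mapping can be written as below (my notation, with X the measured confounders; it requires the usual consistency, no-unmeasured-confounding, and positivity assumptions):

```latex
% The g-formula: a functional of the potential outcome Y^a expressed purely
% in terms of the observed-data distribution of (Y, A, X).
\mathbb{E}\left[ Y^{a} \right]
  = \int \mathbb{E}\left[ Y \mid A = a,\, X = x \right] \, dF_X(x)
```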

Statistics is done post-hoc to learn the parameters governing the observed-data distribution. If you’re a Bayesian, you do this via a prior and likelihood.

Evaluation of the g-formula does not involve imputation of the missing potential outcomes - this is a common misconception. Unnecessarily doing imputations can yield credible intervals that are too wide. I have a paper here with an associated GitHub repository containing Stan implementations: https://onlinelibrary.wiley.com/doi/10.1002/sim.8761 (arXiv preprint: [2004.07375] A Practical Introduction to Bayesian Estimation of Causal Effects: Parametric and Nonparametric Approaches)
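As a toy illustration of that point (my own sketch, not the paper’s code): fit an outcome regression and then standardize it over the observed confounders in generated quantities; no individual-level potential outcomes are ever imputed.

```stan
// Toy Bayesian g-formula: binary treatment a, one confounder x, continuous outcome y.
// The causal contrast is computed by standardizing the fitted outcome regression
// over the empirical distribution of x -- no potential outcomes are imputed.
data {
  int<lower=1> N;
  vector[N] x;                         // measured confounder
  vector<lower=0, upper=1>[N] a;       // binary treatment indicator
  vector[N] y;                         // observed outcome
}
parameters {
  real b0;
  real b_a;
  real b_x;
  real b_ax;                           // treatment-confounder interaction
  real<lower=0> sigma;
}
model {
  b0 ~ normal(0, 5);
  b_a ~ normal(0, 5);
  b_x ~ normal(0, 5);
  b_ax ~ normal(0, 5);
  sigma ~ normal(0, 2);
  y ~ normal(b0 + b_a * a + b_x * x + b_ax * (a .* x), sigma);  // outcome model E[Y | A, X]
}
generated quantities {
  // g-formula: average E[Y | A = a, X = x_i] over the observed x_i, for a = 1 and a = 0
  real mean_y1 = mean(b0 + b_a + (b_x + b_ax) * x);
  real mean_y0 = mean(b0 + b_x * x);
  real ate = mean_y1 - mean_y0;        // posterior draw of the average treatment effect
}
```

Each posterior draw yields a draw of ate directly, so the posterior for the causal contrast comes out of the same fit.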

2 Likes

Thanks for sharing.

I believe that’s what we recommend in the User’s Guide. This approximation should be very close if you can bound the number of clusters ahead of time.

And thanks for the tips about the g-formula.

I think so. There’s also some theoretical support from Theorem 2 of Ishwaran and James (2001), which provides a bound on the L_1 distance between the truncated and infinite process.

2 Likes