Unmeasured confounding


I have a general modelling question about unmeasured confounding. Because I work almost exclusively with observational data, I have to assume that there may be strong confounders which I have not measured and included in my model. There are methods for investigating that assumption quantitatively. For example, the sensemakr package (doi.org/10.1111/rssb.12348) neatly shows how model estimates change under different assumptions about the strength of unmeasured confounding. The only issue is that it works only with linear models. That brings us back to the problem of messy observational data, which often benefits from modelling the data-generating mechanism as something more complex than a linear Gaussian regression.

Ideally I would not sacrifice the modelling flexibility of brms that I know and love in order to understand how strong an unmeasured confounder has to be to meaningfully change the interpretation of my estimates. Does anyone have an accessible strategy for incorporating that type of sensitivity analysis in brms? Bonus points if I can postpone learning how to write raw Stan code for a bit longer.


It’s in raw Stan, but @stablemarkets and Jason Roy have a nice article in which they discuss a possible sensitivity analysis: [2004.07375] A Practical Introduction to Bayesian Estimation of Causal Effects: Parametric and Nonparametric Approaches. See Section 5, with code in Appendix C.


Thanks for the tag @adlauretig. BTW, the more up-to-date version can be found here: A practical introduction to Bayesian estimation of causal effects: Parametric and nonparametric approaches - Oganisian - 2021 - Statistics in Medicine - Wiley Online Library. The arXiv version has some minor errors and doesn’t incorporate reviewer comments.

Violations of the no-unmeasured-confounding assumption can be expressed in terms of non-identifiable parameters. They are non-identifiable in the sense that the likelihood doesn’t inform the parameters at all: we can never rule out or “detect” unmeasured confounding with observed data.

So the general approach is to 1) formalize the structure of your violation in terms of such non-identifiable parameters, 2) place a prior on those parameters that reflects your belief about the direction and magnitude of that violation, and 3) find the posterior of the causal estimand. This posterior has uncertainty about the violation baked in via the prior on the non-identifiable parameters.
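To make the three steps concrete, here is a minimal sketch in plain Python (not Stan, and not the code from the paper). It assumes a deliberately simple setup that is my own illustration: the observed treated-vs-control mean difference equals the causal effect plus a bias term `delta`, the identified part gets a large-sample normal approximation to its posterior, and `delta` gets a normal prior. The function name `sensitivity_posterior`, the parameter names, and the simulated data are all hypothetical.

```python
import random
import statistics


def sensitivity_posterior(y_treated, y_control, delta_mean, delta_sd,
                          n_draws=4000, seed=1):
    """Draws of the causal effect psi = (observed mean difference) - delta.

    Step 1: the violation is formalized as a bias term delta, so that
            E[Y | A=1] - E[Y | A=0] = psi + delta  (assumed structure).
    Step 2: delta ~ Normal(delta_mean, delta_sd) encodes the prior belief
            about the direction and magnitude of the confounding bias.
    Step 3: combine draws of the identified part with draws of delta.
    """
    rng = random.Random(seed)
    diff_hat = statistics.fmean(y_treated) - statistics.fmean(y_control)
    se = (statistics.variance(y_treated) / len(y_treated)
          + statistics.variance(y_control) / len(y_control)) ** 0.5
    draws = []
    for _ in range(n_draws):
        # Identified part: normal approximation to the posterior of the
        # observed mean difference (an assumption made to keep this short).
        observed_diff = rng.gauss(diff_hat, se)
        # Non-identified part: the data never update delta, so its
        # "posterior" is just its prior.
        delta = rng.gauss(delta_mean, delta_sd)
        draws.append(observed_diff - delta)
    return draws


# Simulated outcomes, purely for illustration.
data_rng = random.Random(42)
y_treated = [data_rng.gauss(2.0, 1.0) for _ in range(500)]
y_control = [data_rng.gauss(1.0, 1.0) for _ in range(500)]

# Point-mass prior at zero: the usual "no unmeasured confounding" analysis.
no_confounding = sensitivity_posterior(y_treated, y_control, 0.0, 0.0)
# Prior belief that confounding inflates the observed difference by about 0.5.
with_confounding = sensitivity_posterior(y_treated, y_control, 0.5, 0.25)

print(statistics.fmean(no_confounding), statistics.stdev(no_confounding))
print(statistics.fmean(with_confounding), statistics.stdev(with_confounding))
```

Under the second prior the posterior for the effect shifts by roughly the prior mean of `delta` and becomes wider, which is the "uncertainty baked in" part. In a real analysis the identified part would come from the fitted outcome model's posterior draws rather than a normal approximation.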

This approach is mathematically identical to the Bayesian approach to sensitivity analysis for missing data. There, the equivalent assumption is missing-at-random, but otherwise the math is the same. See Daniels & Hogan, “Missing data in longitudinal studies: Strategies for Bayesian modeling and sensitivity analysis” (I believe chapter 8), and “Handbook of Missing Data Methodology”, chapter 5.


Thank you both for the guidance. I guess it’s as good a time as any to get into raw Stan, now that I have a specific problem I want to solve!

It’s a lovely paper to work through, and the accompanying code helps a lot in understanding what is going on in practice. The comparison to missing-data sensitivity analysis also makes a lot of sense, since there is a similar non-identifiable parameter that might be biasing the estimate. I am marking the thread as solved and refer anyone with future interest in the topic to the paper. I look forward to seeing how software development might make the task even simpler, maybe by having users just supply plausible priors and get the draws from the model as output, thus reducing the need to go into raw Stan.

Warm thanks again!


As the journal version is not open access ($49), it would be useful to update the arXiv version too, since the arXiv version is now the most conveniently available one. Based on the arXiv version, it looks like a very useful paper.

I agree! It’s been on my to-do list for a while but I never seem to get around to it :).