Shouldn't I condition for all variables that cause Y in a DAG?

Dan-Zapata · March 8, 2023, 4:13am

Hello everyone,

I hope this is the right place to ask this. I am fairly new to understanding my models through the lens of causal directed acyclic graphs (DAGs), but I have been trying to wrap my head around them before choosing which variables to condition on when running my models, and to help me identify potentially relevant variables before collecting my data.

My confusion is perhaps too naive, but here it goes. I have read that when choosing which variables to condition on, those that are ancestors of both X (the variable whose effect I am targeting) and Y (the response variable) should be conditioned on in order to get an unbiased estimate. Thus, if X → Y and X ← Z → Y, one should condition on Z to get an unbiased estimate of the effect of X on Y. My question concerns the case in which Y has more than one parent. In a simple scenario in which the only arrows present are those from W, X, and Z into Y, but where I am specifically interested in the effect of X on Y, shouldn’t I also condition on W and Z? Wouldn’t I get a biased estimate if I do not include W and Z in my model, given that they, like X, affect Y? That is, shouldn’t one add all the variables that cause Y? And if not, why?

I would appreciate any insight anyone could provide me.

Yours,

D.

torkar · March 8, 2023, 9:14am

I think it would be good if you draw some diagrams :)

AWoodward · March 8, 2023, 12:09pm

I’m always keen to learn more about causal reasoning, so here goes:

If your system is X → Y and X ← Z → Y, then Z is a confounder of X → Y. Any model for X → Y must therefore also estimate Z → Y, because otherwise the estimate of X → Y will be biased: even if the direct causal effect X → Y were 0, the backdoor path X ← Z → Y would remain open, and X and Y would still be associated.

If your system also includes W → Y, then including W → Y in your model is not necessary to prevent bias in X → Y. W has no arrow into X, so it opens no backdoor path between X and Y: if X → Y were 0, you would expect no association between X and Y whether or not W is in the model. However, including W → Y may still be worthwhile, because estimating it improves the precision of X → Y. If W → Y is non-zero, then excluding it adds noise to the estimate of X → Y, but that estimate will not be biased.

[Image: dagitty-model1]
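Both claims can be checked with a small simulation. This is only a sketch under assumed values: the coefficients (0.5 for X → Y, 1 for Z → X and Z → Y, 2 for W → Y), the sample size, and the helper names `ols`, `simulate`, and `run` are all made up for illustration. It uses plain least squares in standard-library Python rather than a Bayesian fit.

```python
import random
import statistics


def ols(y, *cols):
    """Least-squares coefficients of y on an intercept plus the given columns."""
    rows = [[1.0] + [c[i] for c in cols] for i in range(len(y))]
    k = len(rows[0])
    # Augmented normal equations: [X'X | X'y].
    A = [[sum(r[a] * r[b] for r in rows) for b in range(k)]
         + [sum(r[a] * yi for r, yi in zip(rows, y))]
         for a in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [v - f * pv for v, pv in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def simulate(rng, n=500, beta=0.5):
    """Draw one data set from the assumed DAG: Z -> X, Z -> Y, W -> Y, X -> Y."""
    z = [rng.gauss(0, 1) for _ in range(n)]
    w = [rng.gauss(0, 1) for _ in range(n)]
    x = [zi + rng.gauss(0, 1) for zi in z]
    y = [beta * xi + zi + 2.0 * wi + rng.gauss(0, 1)
         for xi, zi, wi in zip(x, z, w)]
    return x, y, z, w


def run(reps=200, seed=1):
    """Mean and spread of the estimated X coefficient under three adjustment sets."""
    rng = random.Random(seed)
    est = {"y ~ x": [], "y ~ x + z": [], "y ~ x + z + w": []}
    for _ in range(reps):
        x, y, z, w = simulate(rng)
        est["y ~ x"].append(ols(y, x)[1])
        est["y ~ x + z"].append(ols(y, x, z)[1])
        est["y ~ x + z + w"].append(ols(y, x, z, w)[1])
    return {m: (statistics.mean(v), statistics.stdev(v)) for m, v in est.items()}


results = run()
for model, (mean, sd) in results.items():
    print(f"{model:15s} mean = {mean:.3f}  sd = {sd:.3f}")
```

With these assumed values, `y ~ x` should centre near 1.0 rather than the true 0.5, both adjusted models should centre near 0.5, and adding `w` should shrink the spread of the estimates without moving their centre.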

jflournoy · March 9, 2023, 6:40pm

I’ve found this paper very helpful when thinking through options for what variables to include:

Cinelli, C., Forney, A., & Pearl, J. (2022). A Crash Course in Good and Bad Controls. Sociological Methods & Research, 0(0). https://doi.org/10.1177/00491241221099552 (pdf)

LucC · March 10, 2023, 5:51am

+1, although one important “but” is missing: the bias caused by “controlling” for a collider, i.e., a third variable that is influenced by both X and Y. A collider does not need to be conditioned on. If you condition on it anyway, the estimated effect of X on Y will be biased.
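A minimal sketch of that collider problem, using made-up values: X and Y are simulated as independent (so the true effect of X on Y is 0), and a hypothetical collider C is generated as X + Y plus noise. The helper names `cov`, `slope`, and `slope_adjusted` are illustrative only, and the adjusted slope uses the closed-form two-predictor least-squares formula.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)


def slope(y, x):
    """Coefficient of x in the simple regression y ~ x."""
    return cov(x, y) / cov(x, x)


def slope_adjusted(y, x, c):
    """Coefficient of x in the regression y ~ x + c (closed form)."""
    num = cov(x, y) * cov(c, c) - cov(x, c) * cov(c, y)
    den = cov(x, x) * cov(c, c) - cov(x, c) ** 2
    return num / den


rng = random.Random(7)
n = 20000
x = [rng.gauss(0, 1) for _ in range(n)]
y = [rng.gauss(0, 1) for _ in range(n)]  # no arrow X -> Y: true effect is 0
c = [xi + yi + rng.gauss(0, 1) for xi, yi in zip(x, y)]  # collider: X -> C <- Y

naive = slope(y, x)
adjusted = slope_adjusted(y, x, c)
print(f"y ~ x     : {naive:.3f}")
print(f"y ~ x + c : {adjusted:.3f}")
```

Under these assumptions the unadjusted slope should sit near 0, while conditioning on C should pull the X coefficient to roughly -0.5, an association created purely by the adjustment.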

Dan-Zapata · March 11, 2023, 9:57pm

OK, I see. So in this DAG, controlling for Z gives me an unbiased estimate, while controlling for variables that are a cause of Y but not of X improves the precision of my estimate of the effect of X on Y. I imagine that leaving them out would be reflected in the model as a greater standard deviation for that estimate.

Thank you very much for your response!

saudiwin · March 13, 2023, 8:45am

I highly recommend using dagitty (and there are other packages available) to tell you which variables to adjust for given a causal DAG: http://www.dagitty.net/
