Hi
I’m still fairly new to R and Stan. I am trying to build a varying intercept model to test the conditional independencies implied by a DAG.
The DAG in question:
dag <- dagitty("dag {
X -> A -> B
X -> Y -> C -> A -> B
X -> C -> A -> B
X -> B
}")
Then I use:
impliedConditionalIndependencies(dag)
B _||_ C | X, A
B _||_ Y | X, C
B _||_ Y | X, A
A _||_ Y | X, C
Now that I have the conditional independencies implied by the DAG, I need to test them in a varying (random) intercept model. As mentioned, I am new to this and struggling to get started. Can anyone point me in the right direction?
Well, C is a collider, so if you condition on it you will open up a path between X and Y, and I assume you don’t want that? A and B will not affect Y, as long as you don’t condition on C.
Yes, that is true. I should probably rephrase my question.
I have found the conditional independencies for the DAG, as mentioned in the first post.
I now have to build a multilevel model with varying intercepts that tests these 4 conditional independencies.
I am unsure how to start building the model.
Build several models and look at how the estimate for X changes depending on which predictors you bring in. Next, given your DAG, reason about what this means.
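As a rough illustration of this advice, here is a minimal sketch in base R. It assumes a linear, Gaussian version of the DAG with made-up path coefficients of 0.5, takes Y as the outcome, and uses plain `lm()` instead of a Stan model to keep it short. The data are simulated, so none of the numbers come from the original poster's data.

```r
# Simulate data in the causal order of the DAG (all coefficients are assumptions).
set.seed(3)
n <- 5000
X <- rnorm(n)
Y <- 0.5 * X + rnorm(n)
C <- 0.5 * X + 0.5 * Y + rnorm(n)
A <- 0.5 * X + 0.5 * C + rnorm(n)
B <- 0.5 * X + 0.5 * A + rnorm(n)
d <- data.frame(X, Y, C, A, B)

# Fit the same outcome with different sets of additional predictors.
formulas <- list(Y ~ X, Y ~ X + C, Y ~ X + A, Y ~ X + A + B)
x_coef <- sapply(formulas, function(f) coef(lm(f, data = d))[["X"]])
names(x_coef) <- sapply(formulas, function(f) paste(deparse(f), collapse = ""))

# Compare the coefficient of X across models.
print(round(x_coef, 2))
```

In this simulation the coefficient of X in `Y ~ X` should land near the simulated 0.5, while adding C (a descendant of Y) pulls it clearly downward. That shift is the kind of change the DAG lets you reason about.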
I would estimate separate models for the different implied conditional independencies [1]. If you want to test the implied conditional independence $B \perp\!\!\!\perp C \mid X, A$, you can estimate the model
B ~ C + X + A and check if the coefficient for C is (close to) zero.
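A minimal sketch of such a check with a varying intercept might look as follows. It uses `nlme::lme()` (nlme ships with standard R installations) as a quick stand-in for a Stan fit. The grouping variable `group`, the group-level intercepts `u`, and all path coefficients are assumptions made up for the simulation. With brms, the corresponding formula would presumably be `B ~ C + X + A + (1 | group)`.

```r
library(nlme)

# Simulate grouped data from a linear version of the DAG (coefficients are assumptions).
set.seed(1)
n_groups <- 20
n_per_group <- 50
n <- n_groups * n_per_group
group <- factor(rep(seq_len(n_groups), each = n_per_group))
u <- rnorm(n_groups, sd = 0.5)  # hypothetical group-level intercepts for B

X <- rnorm(n)
Y <- 0.5 * X + rnorm(n)
C <- 0.5 * X + 0.5 * Y + rnorm(n)
A <- 0.5 * X + 0.5 * C + rnorm(n)
B <- 0.5 * X + 0.5 * A + u[as.integer(group)] + rnorm(n)
d <- data.frame(X, Y, C, A, B, group)

# Varying intercept model for the implied independence B _||_ C | X, A.
fit <- lme(B ~ C + X + A, random = ~ 1 | group, data = d)

# Approximate 95% intervals for the fixed effects; the row for C is the one of interest.
ci <- intervals(fit, which = "fixed")$fixed
print(round(ci, 3))
```

Because B was simulated without a direct effect of C, the interval for C should sit close to zero here, while the coefficients for X and A should not. The other three implied independencies can be checked the same way by swapping the outcome and predictors.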
There are, however, complicating factors:
- The relationship between C and B does not need to be linear (e.g. a spline model could help to check whether there is a non-linear association between B and C).
- One has to determine the region of practical equivalence with zero (ROPE), i.e. decide how close to zero the conditional dependence has to be to be considered zero. As N increases, credible intervals become smaller and it becomes more likely that they do not overlap with zero just due to chance. Relatedly, as N decreases, credible intervals get larger and the chance that they do overlap with zero gets larger, even if B and C are conditionally dependent.
- As an alternative to a ROPE approach, one could use a model comparison approach (based on loo-CV), i.e. check whether the model B ~ X + A predicts the data as well as the model B ~ C + X + A. If the latter model has a better loo-CV value (which comes with standard errors), then the data do not support the conditional independence assumption.
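The ROPE and model comparison ideas can be sketched in base R as below. This is only an approximation of what one would do with a Stan fit: it uses a frequentist 95% confidence interval from `lm()` in place of a credible interval, and exact leave-one-out squared prediction errors for least squares (computed from the hat values) in place of the elpd-based comparison from the loo package. The ROPE limits of ±0.2 and all simulation coefficients are arbitrary assumptions, and the grouping structure is left out for brevity.

```r
# Simulate data from a linear version of the DAG (coefficients are assumptions).
set.seed(2)
n <- 1000
X <- rnorm(n)
Y <- 0.5 * X + rnorm(n)
C <- 0.5 * X + 0.5 * Y + rnorm(n)
A <- 0.5 * X + 0.5 * C + rnorm(n)
B <- 0.5 * X + 0.5 * A + rnorm(n)
d <- data.frame(X, Y, C, A, B)

fit_without <- lm(B ~ X + A, data = d)
fit_with <- lm(B ~ C + X + A, data = d)

# ROPE-style check: is the whole interval for C inside the (hypothetical) ROPE?
rope <- c(-0.2, 0.2)
ci_C <- confint(fit_with, "C", level = 0.95)
inside_rope <- ci_C[1, 1] > rope[1] && ci_C[1, 2] < rope[2]

# Exact leave-one-out squared errors for a linear model: (e_i / (1 - h_ii))^2.
loo_sq_err <- function(fit) (residuals(fit) / (1 - hatvalues(fit)))^2

# Positive differences favour the model that includes C.
diff_i <- loo_sq_err(fit_without) - loo_sq_err(fit_with)
mean_diff <- mean(diff_i)
se_diff <- sd(diff_i) / sqrt(n)

print(ci_C)
print(inside_rope)
print(c(mean_diff = mean_diff, se_diff = se_diff))
```

If the mean difference is small relative to its standard error, adding C does not improve out-of-sample prediction, which is consistent with the implied conditional independence. With a brms or rstanarm fit one would instead compare the models with `loo::loo_compare()` and evaluate the ROPE on the posterior draws.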
[1] See e.g. the section “Which DAG is the right one? Checking implied conditional independencies” at http://htmlpreview.github.io/?https://github.com/gbiele/MultipleBiases/blob/master/sim_conf_select_bias.html