Bayesian DAG: testing conditional independencies (varying intercept model) in RStan

I’m still fairly new to using R and Stan. I am trying to build a varying intercept model to test the conditional independencies of a DAG.

The DAG in question:
dag <- dagitty("dag {
X -> A -> B
X -> Y -> C -> A -> B
X -> C -> A -> B
X -> B
}")
Then I use:


B _||_ C | X, A
B _||_ Y | X, C
B _||_ Y | X, A
A _||_ Y | X, C

I now have the conditional independencies of the DAG and need to test them in a varying (random) intercept model. As mentioned, I am new to this and struggling to get started. Can anyone point me in the right direction?
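For reference, the four statements listed above match what dagitty reports for this DAG. The post does not show the exact call that was used, so the use of `impliedConditionalIndependencies()` below is an assumption; this is only a minimal sketch, with Y written in upper case throughout.

```r
library(dagitty)

# DAG from the first post (Y in upper case throughout)
dag <- dagitty("dag {
  X -> A -> B
  X -> Y -> C -> A -> B
  X -> C -> A -> B
  X -> B
}")

# List the conditional independencies implied by the DAG
# (assumed to be the call used in the post)
cis <- impliedConditionalIndependencies(dag)
print(cis)
```

Each statement of the form "B _||_ C | X, A" reads as: B is independent of C once X and A are conditioned on.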

Are you interested in the causal effect of X on Y?


Well, C is a collider, so if you condition on it you will open up a path between X and Y, and I assume you don’t want that? A and B will not affect Y, as long as you don’t condition on C.


I might have understood it wrong, but isn’t B the collider in this case? The paths are:

X → A → B
X → Y → C → A → B
X → C → A → B
X → B

Every path in the DAG ends in B, but none leaves B.

Yes, B is also a collider, but it has no effect on Y, right?

Yes, that is true. I should probably rephrase my question.
I have found the conditional independencies for the DAG, as mentioned in the first post.
I now have to build a multilevel model with varying intercepts that tests these 4 conditional independencies.
I am unsure how to start building the model.

Build several models and look at how the estimate of X changes depending on how many predictors you bring in. Next, given your DAG, reason about what this means.


That really helped! Thank you!

Hi again, I have not been able to build the model as I’m unsure how to start.

I’ve tried this as a start, but I don’t believe it’s correct.

DAG <- lmer(formula = x ~ 1 + y + (1 | x), 
           data = df, 
           REML = FALSE)

It can depend on what type of data y and x are. And isn’t y the outcome, i.e., y ~ 1 + x?

I would estimate separate models for different implied conditional independencies [1]. If you want to test the implied conditional independence $B \perp\!\!\!\perp C \mid X, A$, you can estimate the model

B ~ C + X + A and check if the coefficient for C is (close to) zero.
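A minimal sketch of what this could look like with a varying intercept, assuming brms as the interface to Stan. The data are simulated from the DAG purely for illustration, and the grouping variable `group`, the effect sizes, and the sample sizes are hypothetical choices that do not come from the thread.

```r
library(brms)

set.seed(1)
n_groups <- 20
n <- 400
g <- sample(seq_len(n_groups), n, replace = TRUE)  # hypothetical grouping
u <- rnorm(n_groups, 0, 0.5)                       # group-level intercept shifts

# Simulate data that respect the DAG (all effect sizes are made up)
X <- rnorm(n)
Y <- 0.5 * X + rnorm(n)
C <- 0.5 * X + 0.5 * Y + rnorm(n)
A <- 0.5 * X + 0.5 * C + rnorm(n)
B <- 0.5 * X + 0.5 * A + u[g] + rnorm(n)
df <- data.frame(X, Y, C, A, B, group = factor(g))

# Varying-intercept model for the implied independence B _||_ C | X, A
fit <- brm(B ~ C + X + A + (1 | group),
           data = df, family = gaussian(),
           chains = 2, iter = 2000, seed = 1, refresh = 0)

# Posterior summary of the coefficient for C; it should sit near zero
# if B and C are conditionally independent given X and A
fixef(fit)["C", ]
```

The other three implied independencies would be checked the same way, each with its own model (for example `A ~ Y + X + C + (1 | group)` for A _||_ Y | X, C).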

There are, however, complicating factors:

  • the relationship between C and B does not need to be linear (e.g. a spline model could help to check whether there is a non-linear association between B and C)
  • one has to determine the region of practical equivalence with zero (ROPE), i.e. decide how close to zero the conditional dependence has to be to be considered zero. As N increases, credible intervals become narrower and it becomes more likely that they do not overlap with zero just due to chance. Relatedly, as N decreases, credible intervals get wider and the chance that they overlap with zero gets larger, even if B and C are conditionally dependent.
  • as an alternative to a ROPE approach, one could use a model comparison approach (based on loo-CV), i.e. check whether the model B ~ X + A predicts the data as well as the model B ~ C + X + A. If the latter model has a better loo-CV value (which comes with standard errors), then the data do not support the conditional independence assumption.
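The ROPE and loo-CV ideas from the list above could be sketched roughly as follows, again assuming brms. The simulated data, the grouping variable `group`, and the ROPE half-width of 0.1 are hypothetical choices for illustration only; a sensible ROPE depends on the scale of your variables.

```r
library(brms)

set.seed(2)
n_groups <- 15
n <- 300
g <- sample(seq_len(n_groups), n, replace = TRUE)  # hypothetical grouping
u <- rnorm(n_groups, 0, 0.5)

# Simulated data consistent with the DAG (effect sizes are made up)
X <- rnorm(n)
Y <- 0.5 * X + rnorm(n)
C <- 0.5 * X + 0.5 * Y + rnorm(n)
A <- 0.5 * X + 0.5 * C + rnorm(n)
B <- 0.5 * X + 0.5 * A + u[g] + rnorm(n)
df <- data.frame(X, Y, C, A, B, group = factor(g))

fit_without <- brm(B ~ X + A + (1 | group), data = df,
                   chains = 2, seed = 2, refresh = 0)
fit_with <- brm(B ~ C + X + A + (1 | group), data = df,
                chains = 2, seed = 2, refresh = 0)

# ROPE approach: share of posterior draws for the C coefficient
# that fall inside the (assumed) ROPE of +/- 0.1
rope <- 0.1
draws_C <- as_draws_df(fit_with)$b_C
p_in_rope <- mean(abs(draws_C) < rope)
print(p_in_rope)

# Model comparison approach: does adding C improve out-of-sample prediction?
loo_without <- loo(fit_without)
loo_with <- loo(fit_with)
cmp <- loo_compare(loo_without, loo_with)
print(cmp)
```

A large share of draws inside the ROPE, or an `elpd_diff` that is small relative to its `se_diff`, would both be in line with the implied conditional independence. For the non-linearity point, the term `C` could be swapped for `s(C)` in the brms formula.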

Edit: See also here for additional information.

  1. see e.g. here the section Which DAG is the right one? Checking implied conditional independencies ↩︎