DAG, latent variables and d-separations

Hello!

I have a question I cannot answer about approaching DAGs with Stan. I have the following graph, with 5 latent variables, one exogenous variable and 2 responses.

I would like to explore whether the d-separations implied by the DAG are supported by my data. I have two constraints: I want the latent variables to be estimated jointly with the regressions, and I’d like them to be the same for each regression.

Would it make sense to fit the different regressions together in Stan? It seems a little strange to me, but I could not imagine another way. Consider for example the d-separations

Fert _||_ GDD | Moss2
Fert _||_ Vasc1
GDD _||_ Vasc1

I would then write code similar to this:

transformed parameters{
  // Final loading matrices (fill_loadings is defined in the functions block)
  matrix[SF, DF] L_Fert = fill_loadings(DF, SF, L_lower_Fert, L_diag_Fert);
  // Linear predictors of the fertility indicators
  matrix[N, SF] log_mu_Fert = FS_Fert * L_Fert';
}
model{
  // Priors
  // Factor scores
  for(d in 1:DF) target += normal_lpdf(FS_Fert[, d] | 0, sigma_FS_Fert[d]);
  // Fertility loadings
  target += std_normal_lpdf(L_lower_Fert); // lower-diagonal loadings
  target += std_normal_lpdf(L_diag_Fert);  // diagonal loadings
  // Likelihood of the latent-variable measurement model
  // Fertility
  for(s in 1:SF) target += normal_lpdf(Fert[, s] | log_mu_Fert[, s], sigma_Fert[s]);
  // Likelihood of the d-separation regressions
  // Fert1 _||_ GDD | Moss2, P, T
  target += normal_lpdf(FS_Fert[, 1] | a_Fert_GDD + b_Fert_GDD * sc_GDD
                        + b_Fert_GDD_Moss2 * FS_Moss[, 2], sigma_Fert_GDD);
  // Fert1 _||_ Vasc1 | P, T
  target += normal_lpdf(FS_Fert[, 1] | a_Fert_Vasc1
                        + b_Fert_Vasc1 * FS_Vasc[, 1], sigma_Fert_Vasc1);
  // GDD _||_ Vasc1 | P, SWC, T
  target += lognormal_lpdf(GDD | a_GDD_Vasc1 + b_GDD_Vasc1 * FS_Vasc[, 1]
                           + b_GDD_Vasc1_SWC * sc_SWC, sigma_GDD_Vasc1);
}

I have the intuition that something may be wrong with just stacking likelihood contributions for the same variables and the same parameters. Could the extra dimensions make the posterior surface terribly flat? I guess basic probability about joint distributions could help me understand the implications of this approach, but I do not think I can work it out by myself.
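To spell out the worry: each target += statement multiplies another factor into the joint density, so the full conditional of FS_Fert[,1] in the code above ends up as the product of its prior, its measurement terms, and both regression likelihoods, something like

p(\text{FS}_{\text{Fert},1} \mid \text{rest}) \propto \mathcal{N}(\text{FS}_{\text{Fert},1} \mid 0, \sigma_{\text{FS}}) \times \mathcal{N}(\text{FS}_{\text{Fert},1} \mid a_{\text{GDD}} + \dots, \sigma_{\text{GDD}}) \times \mathcal{N}(\text{FS}_{\text{Fert},1} \mid a_{\text{Vasc}} + \dots, \sigma_{\text{Vasc}}) \times \dots

Each extra factor acts like an additional likelihood for the same factor scores, which is what makes me suspicious.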

Does anyone have a clue about the validity of this approach? Or any other idea for evaluating d-separation with latent variables?

Thank you very much, have a great day!
Lucas

EDIT: Added information to the code.


I figured it out!

I was conflating two steps: fitting the DAG and making queries. I tried to obtain unbiased coefficients directly from the regressions, which completely messed up the estimation step.

I understand now that we have to estimate the DAG as I defined it, and then:

  • Check residual correlations to confirm the d-separations (a sketch follows this list)
  • Make clever queries to estimate the unbiased causal effects.
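
For the residual-correlation check, a rough sketch of what I have in mind; mu_Fert1 is a hypothetical name for the linear predictor of FS_Fert[,1] given its parent Moss2, assumed to be defined in transformed parameters:

generated quantities {
  // Residuals of FS_Fert[,1] after adjusting for its parent Moss2;
  // mu_Fert1 is assumed to be defined in transformed parameters
  vector[N] res_Fert1 = FS_Fert[, 1] - mu_Fert1;
  // Per-draw correlation between the residuals and GDD: it should be
  // centred near zero if Fert1 _||_ GDD | Moss2 holds
  real corr_res_GDD = sum((res_Fert1 - mean(res_Fert1)) .* (sc_GDD - mean(sc_GDD)))
                      / ((N - 1) * sd(res_Fert1) * sd(sc_GDD));
}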

Lucas

I think it is fair to test each implied conditional independence with its own regression model.

However, as far as I understand, one can only check implied conditional independencies between observed variables. The conditional independencies in your example seem to involve latent variables, which means that, given the observable data, the DAG can’t be checked.

Is there something I am misunderstanding?

Of course, one can specify a model with a covariance matrix that forces the implied conditional independencies. But it did not sound like that is what you were after.
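
For concreteness, a minimal sketch of what I mean, with three hypothetical observed variables and the DAG x1 <- x2 -> x3. Factorizing the Gaussian joint along the DAG forces x1 _||_ x3 | x2, i.e. the implied covariance matrix has a zero partial correlation between x1 and x3:

data {
  int<lower=1> N;
  vector[N] x1;
  vector[N] x2;
  vector[N] x3;
}
parameters {
  real a1; real b1; real<lower=0> s1;
  real mu2; real<lower=0> s2;
  real a3; real b3; real<lower=0> s3;
}
model {
  // Joint factorized along the DAG x1 <- x2 -> x3: given x2,
  // neither x1 nor x3 depends on the other, so x1 _||_ x3 | x2
  // holds by construction
  x2 ~ normal(mu2, s2);
  x1 ~ normal(a1 + b1 * x2, s1);
  x3 ~ normal(a3 + b3 * x2, s3);
  // Weakly informative priors (an arbitrary choice for this sketch)
  {a1, b1, mu2, a3, b3} ~ normal(0, 1);
  {s1, s2, s3} ~ lognormal(0, 1);
}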

Hi!

Thank you for your answer!

It is true that I have read something about conditional independencies and latent variables… I will look into it more deeply, thank you!

Because Stan samples from the joint distribution, I was wondering whether the result would really be equivalent to running a series of independent regressions, for example in separate Stan programs?
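
Writing out what I mean: if two regressions shared no parameters and no latent variables, the joint posterior would factorize, so one combined Stan program and a series of separate ones would give the same answer:

p(\theta_1, \theta_2 \mid y_1, y_2) \propto p(y_1 \mid \theta_1)\,p(\theta_1)\; p(y_2 \mid \theta_2)\,p(\theta_2) \quad\Rightarrow\quad p(\theta_1, \theta_2 \mid y_1, y_2) = p(\theta_1 \mid y_1)\,p(\theta_2 \mid y_2)

But here the factor scores appear in several regressions, so this factorization breaks down and the combined fit is not equivalent to separate ones.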

When I try to run code similar to the one above, sampling is really slow (about 2 hours, vs. a few minutes for the latent variables alone), and the chains get stuck without sufficient exploration, leading to “straight lines” in the traceplots.

Once the matrices explained by the Vasc and Moss latent variables are excluded, what remains are log-normally distributed observed variables and normally distributed latent variables. I was wondering whether, after all, I should instead try to fill in the residual covariance matrix. I have only a limited number of data points though, so I thought about using sparsity-inducing priors on the correlation coefficients, something like

\rho_{ij} = \tanh(r_{ij}), \qquad r_{ij} \sim \text{horseshoe}
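
Concretely, I imagine something like this rough sketch (the names and the simple half-Cauchy horseshoe are placeholders; K would be the number of free correlations):

parameters {
  vector[K] z;                // raw coefficients for the K free correlations
  vector<lower=0>[K] lambda;  // local horseshoe scales
  real<lower=0> tau;          // global horseshoe scale
}
transformed parameters {
  vector[K] r = z .* lambda * tau;  // horseshoe-shrunk coefficients
  vector[K] rho = tanh(r);          // mapped into (-1, 1)
}
model {
  z ~ std_normal();
  lambda ~ cauchy(0, 1);  // half-Cauchy via the <lower=0> constraint
  tau ~ cauchy(0, 1);
}

One caveat I already see: tanh keeps each rho in (-1, 1), but nothing guarantees that a matrix assembled from these pairwise correlations is positive definite.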

Would it make any sense?
Thank you!
Lucas