Conditional independence claims with categorical and continuous variables in Bayesian multivariate models with brms

jebyrnes · December 17, 2024, 4:11pm

I’m trying to work up how to do a test of conditional independence with categorical variables in a multivariate model with brms. If we have the following set of relationships from palmerpenguins

body_mass_g ~ species
flipper_length_mm ~ body_mass_g + species
bill_length_mm ~ body_mass_g

There are four conditional independence relationships

ggdag::dagify(
    body_mass_g ~ species ,
    flipper_length_mm ~ body_mass_g, 
    bill_length_mm ~ body_mass_g 
) |>
    dagitty::impliedConditionalIndependencies()

bl__ _||_ fl__ | bd__
bl__ _||_ spcs | bd__
fl__ _||_ spcs | bd__

Establishing conditional independence for bill_length_mm || flipper_length_mm given body mass is straightforward enough - fit a model with a link between them, then look at the posterior of the coefficient and - I’ve been looking at the overlap of the ROPE to get a conditional independence test.

But - the species ones are harder, as it’s categorical. In thinking about the problem and looking at the literature on conditional independence, one bit that comes back is that, let’s say we were looking at a conditional independence relationship between Y and X. This implies with likelihood that

L(Y | X,Z) = L(Y|Z)

So, in a Bayesian framework, this would imply that the log posterior of a model with versus without a link should be the same, no? In which case one could evaluate, say, the fraction of the posterior (or log posterior) of the more saturated model overlaps with the 95% compatibility interval of the posterior of the sparse model. Which… is actually nice as it’s a general framework for ANY variable type. And we can do this nicely with bayestestR::overlap()

Example:

###
library(brms)
library(palmerpenguins)
library(ggplot2)
library(patchwork)
library(ggplot2)
library(dplyr)

# model with missing links
penguin_bf <- 
  bf(body_mass_g ~ species) +
  bf(flipper_length_mm ~ body_mass_g) +
  bf(bill_length_mm ~ body_mass_g) +
  set_rescor(FALSE)

penguin_sparse_brm <- brm(penguin_bf,
                          data = penguins)

# test of conditional independence for species
# on flippers
penguin_flipper_species_bf <- 
  bf(body_mass_g ~ species) +
  bf(flipper_length_mm ~ body_mass_g + species) +
  bf(bill_length_mm ~ body_mass_g) +
  set_rescor(FALSE)

penguin_flipper_species_brm <- 
  brm(penguin_flipper_species_bf,
                          data = penguins)

# check the overlap of log posteriors
bayestestR::overlap(  log_posterior(penguin_flipper_species_brm)$Value, 
                      log_posterior(penguin_sparse_brm)$Value)

Would love to have a way to do this without bayestestR to reduce specialty packages. Hrm…

We can also plot to see this

posts <- bind_rows(
  log_posterior(penguin_flipper_species_brm) |>
    mutate(mod = "flipper_species"),
  log_posterior(penguin_sparse_brm) |>
    mutate(mod = "original")) 


ggplot(posts, aes(x = Value, 
              fill = mod)) +
  geom_density(alpha = 0.5)

Note, we could of course also look at posterior checks to see if there is some oddness - which is helpful - but the test is the thing.

penguin_bf$responses |>
  purrr::map(~pp_check(penguin_sparse_brm, 
                       resp = .x) + 
               labs(title = .x)) |>
  purrr::accumulate(`+`)

So, three questions

Valid? Yes? No? Better approach here?
1a) Is the posterior the right target here? We could also look at change in residual sigma… although that might not generalize to other distributions, which is why I like looking at the posterior.
This would have to be done sequentially, as including one path could alter another conditional independence relationship and would influence the posterior. Yes? Although, could a saturated v. unsaturated model provide a global test of conditional independence claims and model structure here? OR
I realize this is touching on the territory of Bayes Factors, but, I’m trying to avoid that for now in order to stay in the realm of evaluating probabilities here. And not bringing in the baggage of Bayes Factors, which I quite like, but know can be problematic. LOO is also good here, but, it’s about prediction, and not model structure.

Anyway - would love to know y’alls thoughts.

saudiwin · December 18, 2024, 12:40am

You should look at the package CausalQueries – it also does DAGs with Stan (CRAN: Package CausalQueries). The also do conditional independence tests, I believe, so you could see how they parameterize it.

Topic		Replies	Views
"conditional dependence" in Bayesian analysis Modeling	3	372	August 12, 2023
Question re brms: conditional_effects General brms	4	1184	July 4, 2022
Multivariate latent variable model (factor analysis) in brms brms brms	3	784	November 28, 2023
Help specifying logistic regressing with nested predictors in a three-way interaction brms specification , cognitive-science	2	788	June 1, 2020
Residual correlation of multivariate model for binary response variable General multivariate-normal , brms	0	43	September 27, 2024

Conditional independence claims with categorical and continuous variables in Bayesian multivariate models with brms

Related topics