Estimating the difference in effect of a predictor on different (but similar) outcomes in a multivariate regression

I want to estimate the difference in association between a single predictor and two different outcomes of the same form. As an example, take this dataset: one numeric predictor and two binary no/yes outcomes.

exDF.RData (9.6 KB)

Here is the model I propose running, a multivariate Bernoulli. Apologies to all non-R, non-tidyverse users.

library(brms)
library(posterior)
library(tidyverse)

# load data
load(file = "exDF.RData") # object name exDF

# create multivariate formula, regressing the two outcomes on the single predictor
bform_mvr <- bf(mvbind(binary1, binary2) ~ pred)

# run model in brms
mvrMod <- brm(formula = bform_mvr,
              family = bernoulli,
              data = exDF,
              seed = 1234,
              refresh = 0,
              chains = 4,
              warmup = 1e3,
              iter = 3e3)

# extract parameter estimates
draws <- merge_chains(as_draws_df(mvrMod))

# inspect chains
head(draws)

# A draws_df: 6 iterations, 1 chains, and 8 variables
#   b_binary1_Intercept b_binary2_Intercept b_binary1_pred b_binary2_pred Intercept_binary1 Intercept_binary2 lprior  lp__
# 1                0.45                0.54          -0.19          -0.27             -0.35             -0.59   -3.9 -1865
# 2                0.45                0.54          -0.19          -0.27             -0.35             -0.59   -3.9 -1865
# 3                0.49                0.88          -0.20          -0.34             -0.37             -0.56   -3.9 -1866
# 4                0.93                0.71          -0.29          -0.32             -0.27             -0.62   -3.9 -1866
# 5                0.96                0.51          -0.31          -0.25             -0.33             -0.54   -3.9 -1865
# 6                0.53                0.53          -0.22          -0.29             -0.40             -0.67   -3.9 -1864

So we have an intercept and a b estimate for the predictor for each of the two outcomes, binary1 and binary2. What I want to do now is estimate whether the effect of pred differs across the two outcomes. Technically I can do this because I have Markov chains to play with: at each draw from the posterior I can subtract the binary2 coefficient for pred from the binary1 coefficient, yielding a difference, then take the quantiles of those differences to get the median and 95% credible interval, like so

draws %>%
  mutate(diffPred = b_binary1_pred - b_binary2_pred) %>%
  select(diffPred, b_binary1_pred, b_binary2_pred) -> diffDF

# summarise the median and 95% credible interval of the difference
diffDF %>%
  reframe(est = quantile(diffPred,
                         probs = c(0.5, 0.025, 0.975)),
          stat = c("median", "lowCI", "hiCI")) %>%
  pivot_wider(names_from = stat,
              values_from = est)

# output
# A tibble: 1 Ă— 3
#      median  lowCI  hiCI
#       <dbl>  <dbl> <dbl>
#   1 0.00614 -0.120 0.136

# graph the histogram (with skyblue fill as homage to Kruschke)
diffDF %>%
  ggplot(mapping = aes(x = diffPred)) +
  geom_histogram(fill = "skyblue")

Looks like no real difference.
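
As a complementary summary (not essential to the question), the same draws also give the posterior probability that the pred slope for binary1 exceeds the one for binary2; a value near 0.5 would fit with there being no clear difference.

# posterior probability that the effect of pred is larger for binary1 than binary2
diffDF %>%
  summarise(pDiffPositive = mean(diffPred > 0))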

But what I want to ask the forum is: is it statistically kosher to estimate the difference in associations of a single predictor across two outcomes of a similar form?

I’ve done interactions many times, but those concern the joint relationship of two (or more) predictors with a single outcome. I am very new to multivariate models.

I’m not sure what you mean by “kosher statistically”, but I’m pretty sure the answer is “yes”.

That is to say, if you have two outcomes with the same units, and a single predictor, it’s a perfectly well posed question to ask whether the slope of one regression is bigger than the slope of the other.

The procedure you suggest is valid as long as its assumptions are (close enough to) met. In addition to the regular assumptions that apply to each regression on its own, the extra assumption you're making here is that the outcomes are conditionally independent, i.e. independent of one another conditional on the value of the predictor. This independence assumption could in many cases be relaxed via correlated observation-level random effects or correlated residuals, but in the Bernoulli case this is likely to yield a degenerate model.
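
For concreteness, a rough sketch of the correlated observation-level random effects in brms syntax (this assumes you add a per-row ID column, here called obsID; as noted above, with Bernoulli outcomes this will often be weakly identified or degenerate):

# add an observation-level ID so each row gets its own group-level effect
exDF$obsID <- seq_len(nrow(exDF))

# the shared |q| identifier correlates the observation-level intercepts
# across the two outcomes, inducing residual-like correlation
bform_corr <- bf(mvbind(binary1, binary2) ~ pred + (1 | q | obsID))

corrMod <- brm(formula = bform_corr,
               family = bernoulli,
               data = exDF,
               seed = 1234,
               chains = 4,
               warmup = 1e3,
               iter = 3e3)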


Thank you @jsocolar. You actually knew exactly what I meant by statistically kosher. Much appreciated. And yes, working out how to allow correlations between the outcome residuals is my next port of call. That isn't possible in brms multivariate models at the moment, so it will involve cracking open the black box and rewiring a few things.
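
If it helps anyone following along, one possible starting point (just a sketch; the object names are placeholders) is to let brms generate the Stan code and data and then edit those directly:

# have brms emit the Stan program and data list for the multivariate model,
# as a base for hand-coding a residual correlation
rawStanCode <- make_stancode(bform_mvr, data = exDF, family = bernoulli)
rawStanData <- make_standata(bform_mvr, data = exDF, family = bernoulli)

cat(rawStanCode)  # inspect, modify, then fit with rstan or cmdstanr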


One easy solution here is to combine your two binary outcomes into one categorical outcome, then use the categorical distribution. You can then recover marginal parameter estimates for each outcome from the draws of the categorical parameters.

You will very quickly see what @jsocolar is saying here about the correlations in parameter values.
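
A rough sketch of what that might look like (the combined variable name and coding are placeholders):

# collapse the two binary outcomes into a single four-level outcome
exDF <- exDF %>%
  mutate(combined = factor(paste(binary1, binary2, sep = "_")))

# fit with the categorical family
catMod <- brm(bf(combined ~ pred),
              family = categorical,
              data = exDF,
              seed = 1234,
              chains = 4,
              warmup = 1e3,
              iter = 3e3)

# marginal estimates for each original outcome can then be built up from
# the draws of the category probabilities (e.g. via posterior_epred())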

Thank you @simonbrauer. These are non-mutually-exclusive outcomes, so creating a single categorical variable is less appropriate, I think. It's doable with two binary outcomes: you'd get a four-level categorical, [0,0], [0,1], [1,0], [1,1]. The output would be pretty confusing but just about manageable. Once you get beyond two binary non-mutually-exclusive variables, collapsing them into a single outcome becomes impractical. I like the multivariate approach; provided you can model the correlations, it is much simpler to explain to readers.