Quantifying differences between datasets using marginal means

I am seeking for some brief “peer-review” with the following method:

I would like to quantify differences in subjects’ behavior between two different datasets. I want to estimate what is the overall (intercept) difference (on the scale of the outcome, response) and the difference of the effect of a variable T1 and T2 (both binary, [0; 1]) that changed within subject.

I therefore estimated a model with brms on all data and included dataset [“A”; “B”] as a variable, and subject [1, 2, 3, … N_dataset] as a random effect. (I include T2 only through an interaction with T1 since it physically only made sense when T1 = 1.)

model <- brm(
    formula = response ~ 0 + dataset + dataset:T1 + dataset:T1:T2 + (1|dataset:subject),
    family = lognormal(),
    data = data
)

As said, I want to estimate the difference on the response scale between the two datasets, for the intercept, T1 and T2. For this, I am using the emmeans functionality from the tidybayes package. I want to include the uncertainty arising from different subjects, therefore, I use the re_formula and allow_new_levels options.

For the (overall) difference in intercept, I use emmeans:

model %>%
  emmeans(specs = "dataset",
          epred = TRUE,
          re_formula = NULL, 
          allow_new_levels = TRUE) %>%
  contrast(method = "pairwise")

To capture the difference in the effect of T1 on response, I use emtrends:

model %>% 
  emtrends(specs = "dataset", 
           var = "T1",
           epred = TRUE,
           re_formula = NULL, 
           allow_new_levels = TRUE) %>% 
  contrast(method = "pairwise")

For T2, I use the option at = list(T1 = 1) to restrict the analyses to that specific case (explained above):

model %>% 
  emtrends(specs = "dataset", 
           var = "T2",
           at = list(T1 = 1),
           epred = TRUE,
           re_formula = NULL, 
           allow_new_levels = TRUE) %>% 
  contrast(method = "pairwise")

Does this sound like a sensible method? Any thoughts are highly appreciated.

  • Operating System: Windows 10
  • brms Version: 2.20.4

Anyone who might have thoughts? :-)

@rvlenth perhaps?

There are more sophisticated approaches for this. Look up the literature on the Bayesian hierarchical model / historical data priors ([normalized] power prior, commensurate prior, meta-analytic predictive priors, latent exchangeability priors). I have an R package called hdbayes that uses cmdstanr and implements all of these.

1 Like

I can’t comment on how well this fits with Bayesian methodology, versus other possibly more appropriate alternatives. But I would say that the code examples shown would produce reasonable answers.

That said, I have become increasingly allergic to the idea of piping one result into another until we get a final answer. If we run any of these codes as written, we obtain comparisons of means or slopes, but we never see estimates of the means or the slopes themselves that are being compared. Wouldn’t we want to know those? So in all of these, I would keep the steps:

EMM <- emmeans(model, "dataset", etc)
confint(EMM)
contrast(EMM, "pairwise")

By the way, these estimates are made at the mean T1 and the mean T2, and it may be better to use at ro specify particular values of these variables.

1 Like

Thanks very much for the thoughts.

Thanks for pointing that out. I guess that means that a 0/1-coded variable would be averaged, too, which may not make sense here since T1 and T2 are categorical by nature.