Predictions from univariate models differ from those of a single multivariate model

I am conducting a multivariate ordinal regression examining the effect of number of days of amphetamine-type-stimulant use in the 28 days prior to starting treatment (variable ats_baseline ) on three outcomes: psychological health, physical health, and quality of life, all measured on a 0-10 integer scale (variables psych, phys, qol ) during the first years of treatment (variable yearsFromStart , a continuous variable between 0 and 1). Data are grouped by encounter (variable encID ) which defines a block of treatment. For our purposes this can be considered equivalent to a participant ID number. The minimum number of measurements per encID is two. All encounters have the three outcomes measured at start of treatment (i.e. yearsFromStart = 0) and at least one other time. When these non-start-of-treatment-measurements occur is irregular. Here are the data.

In response to this post @amynang gave me a helpful tip how to extract the predictions for each of the outcomes in isolation, via the response = foo argument in the add_epred_draws function from the tidybayes package.

That seemed to work, but when I checked the predicted values against the corresponding univariate analysis (i.e. with only one outcome) the results look quite different. In addition all three sets of predictions across all three variables look, well, identical. I am wondering if I have done something wrong somewhere in my workflow, which I reproduce below. There’s a lot of code here but it should run with a copy paste. Warning these models take a while to run, but once they run the rest of the code runs quickly.

Note: Because this is an ordinal analysis we have to change the scale of the outcomes variables from 0-10 to 1-11, and then for the predictions we subtract that 1 to get the predictions on the original scale.

Here’s the dataset described above,
atsUseInOTP_alloutcomes_noMissing_inf.RData (17.3 KB)

# load environment
library(tidyverse)
library(magrittr)
library(lubridate)
library(haven)
library(modelr)
library(loo)
library(brms)
library(rstan)
library(posterior)
options(posterior.num_args=list(digits=2))
library(pillar)
library(ggplot2)
library(bayesplot)
library(tidybayes)
library(ggdist)
set1 <- RColorBrewer::brewer.pal(3, "Set1")

# load data
load(file = "data/atsUseInOTP_alloutcomes_noMissing_inf.RData") # dInf
names(dInf)

# check ranges of outcomes
dInf %>%
  ungroup %>%
    reframe(across(.cols = psych:qol,
                   .fns = ~range(.x, na.rm = T))) 

# for ordinal analysis we need to transform outcome vars to positive integers
dInf %<>%
  mutate(across(.cols = psych:qol,
                .fns = ~as.integer(.x+1)))

# # check ranges of outcomes
dInf %>%
  ungroup %>%
  reframe(across(.cols = psych:qol,
                 .fns = ~range(.x, na.rm = T))) 

#### run univariate models

# psychological health - univariate
fit_ordinal_psych <- brm(formula = psych ~ ats_baseline*yearsFromStart + (1 | encID),
                        family = cumulative("probit"),
                        data = dInf,
                        refresh = 0,
                        chains = 4,
                        warmup = 1e3,
                        iter = 3e3,
                        refresh = 0)

# physical health - univariate
fit_ordinal_phys <- brm(formula = phys ~ ats_baseline*yearsFromStart + (1 | encID),
                         family = cumulative("probit"),
                         data = dInf,
                         refresh = 0,
                         chains = 4,
                         warmup = 1e3,
                         iter = 3e3,
                         refresh = 0)

# quality of life - univariate
fit_ordinal_qol <- brm(formula = qol ~ ats_baseline*yearsFromStart + (1 | encID),
                        family = cumulative("probit"),
                        data = dInf,
                        refresh = 0,
                        chains = 4,
                        warmup = 1e3,
                        iter = 3e3,
                        refresh = 0)

# all three outcomes - -multivariate
bform_ord_mvr_ppq <- bf(mvbind(psych, phys, qol) ~ ats_baseline*yearsFromStart + (1|p|encID))

fit_ordinal_mvr_ppq_ao_nm <- brm(formula = bform_ord_mvr_ppq,
                                 family = cumulative("probit"),
                                 data = dInf,
                                 seed = 1234,
                                 refresh = 0,
                                 chains = 4,
                                 warmup = 1e3,
                                 iter = 3e3)

Now we graph predicted trajectories via draws from the expected value of the posterior distribution (also known as the conditional expectation). note we are subtracting 1 from the predictions to get them back on the original scale

# first define a function for the graphs

predOrdDrawsFunct <- function(mod, outLabel) {

  # get predicted trajectories of each of the five candidate levels of ATS use at start of treatment
  # at each of forty timepoints in the first year
  tibble(encID=688, set=28) %>%
    data_grid(yearsFromStart = seq(0,1,length.out=40),
              ats_baseline=c(0,7,14,21,28),
              encID,
              set) %>%
      add_epred_draws(object = mod,
                      allow_new_levels = TRUE) %>%
        group_by(yearsFromStart, ats_baseline, encID, .draw) %>%
          summarise(weightMeanPerDraw = weighted.mean(x = as.numeric(.category), 
                                                      w = .epred)) %>%
            ungroup %>%
              mutate(weightMeanPerDraw = weightMeanPerDraw - 1) -> predOrdDraws
  
  # get median point estimates and CIs for each of the  
  predOrdDraws %>%
    group_by(ats_baseline, yearsFromStart) %>%
      reframe(iqr = quantile(x = weightMeanPerDraw,
                             probs = c(0.025, 0.5, 0.975)),
              quantiles = names(iqr)) %>%
        pivot_wider(names_from = quantiles,
                    values_from = iqr) %>%
          rename(median = "50%",
                 lowCI = "2.5%",
                 hiCI = "97.5%") %>%
            relocate(median,
                     .after = yearsFromStart) %>%
              mutate(ats_baseline = factor(ats_baseline)) -> summaryStats
  
  # make graph with confidence envelopes
  summaryStats %>%
    ggplot(mapping = aes(x = yearsFromStart,
                         y = median, 
                         colour = ats_baseline,
                         fill = ats_baseline)) +
          geom_line() +
          geom_ribbon(mapping = aes(ymin = lowCI,
                                  ymax = hiCI),
                      alpha = 0.1,
                      colour = NA) + 
          coord_cartesian(y = c(0,10)) +
          labs(x = "Years from start of treatment",
               y = paste0("ATOP ", outLabel),
               title = paste0(outLabel, " univariate")) +
          theme_minimal() +
          scale_colour_manual(labels = c("0 days",
                                         "7 days",
                                         "14 days",
                                         "21 days",
                                         "28 days"),
                              values = c("#440154FF",
                                         "#3B528BFF",
                                         "#21908CFF",
                                         "#5DC863FF",
                                         "#F4E112FF"),
                              name = "Baseline ats use: days/28") +
          scale_fill_manual(labels = c("0 days",
                                       "7 days",
                                       "14 days",
                                       "21 days",
                                       "28 days"),
                            values = c("#440154FF",
                                       "#3B528BFF",
                                       "#21908CFF",
                                       "#5DC863FF",
                                       "#F4E112FF"),
                            name = "Baseline ats use: days/28") +
          coord_cartesian(ylim = c(0,10)) +
          scale_y_continuous(breaks = seq(0,10,2),
                             expand = expansion(mult = 0)) +
          scale_x_continuous(breaks = seq(0,1,0.25),
                             expand = expansion(mult = 0.03)) +
          theme(title = element_text(size = 12, 
                                     family = "sans",
                                     face = "bold"), 
                legend.text = element_text(size = 12, 
                                           family = "sans"),
                legend.title = element_text(size = 12, 
                                            family = "sans",
                                            face = "bold"),
                axis.text = element_text(size = 12, 
                                         family = "sans"),
                axis.title = element_text(size = 12, 
                                          family = "sans",
                                          face = "bold"),
                axis.line = element_line(colour = "gray20"),
                panel.grid.minor = element_blank()) -> ppl_final
  
  return(ppl_final)
} # end function

# now create graphs
predOrdDraws_psych <- predOrdDrawsFunct(mod = fit_ordinal_psych,
                                        outLabel = "psychological health") -> psychG

predOrdDraws_phys <- predOrdDrawsFunct(mod = fit_ordinal_phys,
                                        outLabel = "physical health") -> physG

predOrdDraws_qol <- predOrdDrawsFunct(mod = fit_ordinal_qol,
                                      outLabel = "quality of life") -> qolG

# Now for the multivariate

# define a function

predOrdDrawsFunct_mvr <- function(mod, outVar, outLabel) {

# posterior predictions. once again subtracting 1 from each prediction to put it back on scale of data
tibble(encID=688, set=28) %>%
  data_grid(yearsFromStart = seq(0,1,length.out=40),
            ats_baseline=c(0,7,14,21,28),
            encID,
            set) %>%
    add_epred_draws(object = mod,
                    allow_new_levels = TRUE,
                    response = outVar) %>% # this isolates the draws from each outcome
      group_by(yearsFromStart, ats_baseline, encID, .draw) %>%
        summarise(weightMeanPerDraw = weighted.mean(x = as.numeric(.category), 
                                                    w = .epred)) %>%
          ungroup %>%
            mutate(weightMeanPerDraw = weightMeanPerDraw - 1) -> epredOrdDraws_mvr

# summary stats
epredOrdDraws_mvr %>%
  group_by(ats_baseline, yearsFromStart) %>%
    reframe(iqr = quantile(x = weightMeanPerDraw,
                           probs = c(0.025, 0.5, 0.975)),
            quantiles = names(iqr)) %>%
      pivot_wider(names_from = quantiles,
                  values_from = iqr) %>%
        rename(median = "50%",
               lowCI = "2.5%",
               hiCI = "97.5%") %>%
          relocate(median,
                   .after = yearsFromStart) %>%
            mutate(ats_baseline = factor(ats_baseline)) -> summaryStats_mvr

# graph
summaryStats_mvr %>%
  ggplot(mapping = aes(x = yearsFromStart,
                       y = median, 
                       colour = ats_baseline,
                       fill = ats_baseline)) +
        geom_line() +
        geom_ribbon(mapping = aes(ymin = lowCI,
                                  ymax = hiCI),
                    alpha = 0.1,
                    colour = NA) + 
        coord_cartesian(y = c(0,10)) +
        labs(x = "Years from start of treatment",
             y = outLabel,
             title = paste0(outLabel, " multivariate")) +
        theme_minimal() +
        scale_colour_manual(labels = c("0 days",
                                       "7 days",
                                       "14 days",
                                       "21 days",
                                       "28 days"),
                            values = c("#440154FF",
                                       "#3B528BFF",
                                       "#21908CFF",
                                       "#5DC863FF",
                                       "#F4E112FF"),
                            name = "Baseline ats use: days/28") +
        scale_fill_manual(labels = c("0 days",
                                     "7 days",
                                     "14 days",
                                     "21 days",
                                     "28 days"),
                          values = c("#440154FF",
                                     "#3B528BFF",
                                     "#21908CFF",
                                     "#5DC863FF",
                                     "#F4E112FF"),
                          name = "Baseline ats use: days/28") +
        scale_y_continuous(breaks = seq(0,10,2),
                           expand = expansion(mult = 0)) +
        scale_x_continuous(breaks = seq(0,1,0.25),
                           expand = expansion(mult = 0.03)) +
        theme(title = element_text(size = 12, 
                                   family = "sans",
                                   face = "bold"), 
              legend.text = element_text(size = 12, 
                                         family = "sans"),
              legend.title = element_text(size = 12, 
                                          family = "sans",
                                          face = "bold"),
              axis.text = element_text(size = 12, 
                                       family = "sans"),
              axis.title = element_text(size = 12, 
                                        family = "sans",
                                        face = "bold"),
              axis.line = element_line(colour = "gray20"),
              panel.grid.minor = element_blank()) -> ppl_final_mvr

return(ppl_final_mvr)
} # end function

# now create graphs for each of the three outcomes extracted from the multivariate model
predOrdDrawsFunct_mvr(mod = fit_ordinal_mvr_ppq_ao_nm,
                      outVar = "psych",
                      outLabel = "psychological health") -> psychG_mvr

predOrdDrawsFunct_mvr(mod = fit_ordinal_mvr_ppq_ao_nm,
                      outVar = "phys",
                      outLabel = "physical health") -> physG_mvr

predOrdDrawsFunct_mvr(mod = fit_ordinal_mvr_ppq_ao_nm,
                      outVar = "qol",
                      outLabel = "quality of life") -> qolG_mvr

Now let’s put the graphs in panels using ggpubr to illustrate my issue


ggarrange(psychG, psychG_mvr,
          ncol = 2,
          common.legend = TRUE) -> panel_diag_psych

panel_diag_psych

ggarrange(physG, physG_mvr,
          ncol = 2,
          common.legend = TRUE) -> panel_diag_phys

panel_diag_phys

ggarrange(qolG, qolG_mvr,
          ncol = 2,
          common.legend = TRUE) -> panel_diag_qol

panel_diag_qol

You can see that the predictions from the univariate models and the multivariate models are quite different. My understanding was that these should be identical given the model allows no correlations between outcome residuals. So that’s one issue

The other is that the predictions are basically identical across all three outcomes, which can be seen when we place the graphs from the predictions from each of the outcomes from the multivariate model side by side.

ggarrange(psychG_mvr, physG_mvr, qolG_mvr, 
          ncol = 3,
          common.legend = TRUE) -> panel_diag_mvr

panel_diag_mvr

These all look very suspicious to me, like I’ve made a mistake in the coding (perhaps in the response=foo argument somehow?). If anyone has any insight I would very much appreciate.

I think the problem is here, you have response instead of resp in your second plot function, which is silently ignored.

FWIW, I think the expected behaviour should be similar to one of the following:

pp_check(fit_ordinal_mvr_ppq_ao_nm, response = "phys")

Error: Argument ‘resp’ must be a single variable name when applying this method to a multivariate model.

pp_check(fit_ordinal_mvr_ppq_ao_nm, resp = "phys", gizmo = 2)

Using 10 posterior draws for ppc type ‘dens_overlay’ by default.
Warning message:
The following arguments were unrecognized and ignored: gizmo

In the first case pp_check requires a resp argument for a multivariate model, so throws an error. In the second it gives an informative warning. I am not sure which of the interacting functions working under add_epred_draws()'s hood is not flagging that.

1 Like