I am conducting a multivariate ordinal regression examining the effect of number of days of amphetamine-type-stimulant use in the 28 days prior to starting treatment (variable ats_baseline ) on three outcomes: psychological health, physical health, and quality of life, all measured on a 0-10 integer scale (variables psych, phys, qol ) during the first years of treatment (variable yearsFromStart , a continuous variable between 0 and 1). Data are grouped by encounter (variable encID ) which defines a block of treatment. For our purposes this can be considered equivalent to a participant ID number. The minimum number of measurements per encID is two. All encounters have the three outcomes measured at start of treatment (i.e. yearsFromStart = 0) and at least one other time. When these non-start-of-treatment-measurements occur is irregular. Here are the data.
In response to this post @amynang gave me a helpful tip how to extract the predictions for each of the outcomes in isolation, via the response = foo argument in the add_epred_draws function from the tidybayes package.
That seemed to work, but when I checked the predicted values against the corresponding univariate analysis (i.e. with only one outcome) the results look quite different. In addition all three sets of predictions across all three variables look, well, identical. I am wondering if I have done something wrong somewhere in my workflow, which I reproduce below. There’s a lot of code here but it should run with a copy paste. Warning these models take a while to run, but once they run the rest of the code runs quickly.
Note: Because this is an ordinal analysis we have to change the scale of the outcomes variables from 0-10 to 1-11, and then for the predictions we subtract that 1 to get the predictions on the original scale.
Here’s the dataset described above,
atsUseInOTP_alloutcomes_noMissing_inf.RData (17.3 KB)
# load environment
library(tidyverse)
library(magrittr)
library(lubridate)
library(haven)
library(modelr)
library(loo)
library(brms)
library(rstan)
library(posterior)
options(posterior.num_args=list(digits=2))
library(pillar)
library(ggplot2)
library(bayesplot)
library(tidybayes)
library(ggdist)
set1 <- RColorBrewer::brewer.pal(3, "Set1")
# load data
load(file = "data/atsUseInOTP_alloutcomes_noMissing_inf.RData") # dInf
names(dInf)
# check ranges of outcomes
dInf %>%
ungroup %>%
reframe(across(.cols = psych:qol,
.fns = ~range(.x, na.rm = T)))
# for ordinal analysis we need to transform outcome vars to positive integers
dInf %<>%
mutate(across(.cols = psych:qol,
.fns = ~as.integer(.x+1)))
# # check ranges of outcomes
dInf %>%
ungroup %>%
reframe(across(.cols = psych:qol,
.fns = ~range(.x, na.rm = T)))
#### run univariate models
# psychological health - univariate
fit_ordinal_psych <- brm(formula = psych ~ ats_baseline*yearsFromStart + (1 | encID),
family = cumulative("probit"),
data = dInf,
refresh = 0,
chains = 4,
warmup = 1e3,
iter = 3e3,
refresh = 0)
# physical health - univariate
fit_ordinal_phys <- brm(formula = phys ~ ats_baseline*yearsFromStart + (1 | encID),
family = cumulative("probit"),
data = dInf,
refresh = 0,
chains = 4,
warmup = 1e3,
iter = 3e3,
refresh = 0)
# quality of life - univariate
fit_ordinal_qol <- brm(formula = qol ~ ats_baseline*yearsFromStart + (1 | encID),
family = cumulative("probit"),
data = dInf,
refresh = 0,
chains = 4,
warmup = 1e3,
iter = 3e3,
refresh = 0)
# all three outcomes - -multivariate
bform_ord_mvr_ppq <- bf(mvbind(psych, phys, qol) ~ ats_baseline*yearsFromStart + (1|p|encID))
fit_ordinal_mvr_ppq_ao_nm <- brm(formula = bform_ord_mvr_ppq,
family = cumulative("probit"),
data = dInf,
seed = 1234,
refresh = 0,
chains = 4,
warmup = 1e3,
iter = 3e3)
Now we graph predicted trajectories via draws from the expected value of the posterior distribution (also known as the conditional expectation). note we are subtracting 1 from the predictions to get them back on the original scale
# first define a function for the graphs
predOrdDrawsFunct <- function(mod, outLabel) {
# get predicted trajectories of each of the five candidate levels of ATS use at start of treatment
# at each of forty timepoints in the first year
tibble(encID=688, set=28) %>%
data_grid(yearsFromStart = seq(0,1,length.out=40),
ats_baseline=c(0,7,14,21,28),
encID,
set) %>%
add_epred_draws(object = mod,
allow_new_levels = TRUE) %>%
group_by(yearsFromStart, ats_baseline, encID, .draw) %>%
summarise(weightMeanPerDraw = weighted.mean(x = as.numeric(.category),
w = .epred)) %>%
ungroup %>%
mutate(weightMeanPerDraw = weightMeanPerDraw - 1) -> predOrdDraws
# get median point estimates and CIs for each of the
predOrdDraws %>%
group_by(ats_baseline, yearsFromStart) %>%
reframe(iqr = quantile(x = weightMeanPerDraw,
probs = c(0.025, 0.5, 0.975)),
quantiles = names(iqr)) %>%
pivot_wider(names_from = quantiles,
values_from = iqr) %>%
rename(median = "50%",
lowCI = "2.5%",
hiCI = "97.5%") %>%
relocate(median,
.after = yearsFromStart) %>%
mutate(ats_baseline = factor(ats_baseline)) -> summaryStats
# make graph with confidence envelopes
summaryStats %>%
ggplot(mapping = aes(x = yearsFromStart,
y = median,
colour = ats_baseline,
fill = ats_baseline)) +
geom_line() +
geom_ribbon(mapping = aes(ymin = lowCI,
ymax = hiCI),
alpha = 0.1,
colour = NA) +
coord_cartesian(y = c(0,10)) +
labs(x = "Years from start of treatment",
y = paste0("ATOP ", outLabel),
title = paste0(outLabel, " univariate")) +
theme_minimal() +
scale_colour_manual(labels = c("0 days",
"7 days",
"14 days",
"21 days",
"28 days"),
values = c("#440154FF",
"#3B528BFF",
"#21908CFF",
"#5DC863FF",
"#F4E112FF"),
name = "Baseline ats use: days/28") +
scale_fill_manual(labels = c("0 days",
"7 days",
"14 days",
"21 days",
"28 days"),
values = c("#440154FF",
"#3B528BFF",
"#21908CFF",
"#5DC863FF",
"#F4E112FF"),
name = "Baseline ats use: days/28") +
coord_cartesian(ylim = c(0,10)) +
scale_y_continuous(breaks = seq(0,10,2),
expand = expansion(mult = 0)) +
scale_x_continuous(breaks = seq(0,1,0.25),
expand = expansion(mult = 0.03)) +
theme(title = element_text(size = 12,
family = "sans",
face = "bold"),
legend.text = element_text(size = 12,
family = "sans"),
legend.title = element_text(size = 12,
family = "sans",
face = "bold"),
axis.text = element_text(size = 12,
family = "sans"),
axis.title = element_text(size = 12,
family = "sans",
face = "bold"),
axis.line = element_line(colour = "gray20"),
panel.grid.minor = element_blank()) -> ppl_final
return(ppl_final)
} # end function
# now create graphs
predOrdDraws_psych <- predOrdDrawsFunct(mod = fit_ordinal_psych,
outLabel = "psychological health") -> psychG
predOrdDraws_phys <- predOrdDrawsFunct(mod = fit_ordinal_phys,
outLabel = "physical health") -> physG
predOrdDraws_qol <- predOrdDrawsFunct(mod = fit_ordinal_qol,
outLabel = "quality of life") -> qolG
# Now for the multivariate
# define a function
predOrdDrawsFunct_mvr <- function(mod, outVar, outLabel) {
# posterior predictions. once again subtracting 1 from each prediction to put it back on scale of data
tibble(encID=688, set=28) %>%
data_grid(yearsFromStart = seq(0,1,length.out=40),
ats_baseline=c(0,7,14,21,28),
encID,
set) %>%
add_epred_draws(object = mod,
allow_new_levels = TRUE,
response = outVar) %>% # this isolates the draws from each outcome
group_by(yearsFromStart, ats_baseline, encID, .draw) %>%
summarise(weightMeanPerDraw = weighted.mean(x = as.numeric(.category),
w = .epred)) %>%
ungroup %>%
mutate(weightMeanPerDraw = weightMeanPerDraw - 1) -> epredOrdDraws_mvr
# summary stats
epredOrdDraws_mvr %>%
group_by(ats_baseline, yearsFromStart) %>%
reframe(iqr = quantile(x = weightMeanPerDraw,
probs = c(0.025, 0.5, 0.975)),
quantiles = names(iqr)) %>%
pivot_wider(names_from = quantiles,
values_from = iqr) %>%
rename(median = "50%",
lowCI = "2.5%",
hiCI = "97.5%") %>%
relocate(median,
.after = yearsFromStart) %>%
mutate(ats_baseline = factor(ats_baseline)) -> summaryStats_mvr
# graph
summaryStats_mvr %>%
ggplot(mapping = aes(x = yearsFromStart,
y = median,
colour = ats_baseline,
fill = ats_baseline)) +
geom_line() +
geom_ribbon(mapping = aes(ymin = lowCI,
ymax = hiCI),
alpha = 0.1,
colour = NA) +
coord_cartesian(y = c(0,10)) +
labs(x = "Years from start of treatment",
y = outLabel,
title = paste0(outLabel, " multivariate")) +
theme_minimal() +
scale_colour_manual(labels = c("0 days",
"7 days",
"14 days",
"21 days",
"28 days"),
values = c("#440154FF",
"#3B528BFF",
"#21908CFF",
"#5DC863FF",
"#F4E112FF"),
name = "Baseline ats use: days/28") +
scale_fill_manual(labels = c("0 days",
"7 days",
"14 days",
"21 days",
"28 days"),
values = c("#440154FF",
"#3B528BFF",
"#21908CFF",
"#5DC863FF",
"#F4E112FF"),
name = "Baseline ats use: days/28") +
scale_y_continuous(breaks = seq(0,10,2),
expand = expansion(mult = 0)) +
scale_x_continuous(breaks = seq(0,1,0.25),
expand = expansion(mult = 0.03)) +
theme(title = element_text(size = 12,
family = "sans",
face = "bold"),
legend.text = element_text(size = 12,
family = "sans"),
legend.title = element_text(size = 12,
family = "sans",
face = "bold"),
axis.text = element_text(size = 12,
family = "sans"),
axis.title = element_text(size = 12,
family = "sans",
face = "bold"),
axis.line = element_line(colour = "gray20"),
panel.grid.minor = element_blank()) -> ppl_final_mvr
return(ppl_final_mvr)
} # end function
# now create graphs for each of the three outcomes extracted from the multivariate model
predOrdDrawsFunct_mvr(mod = fit_ordinal_mvr_ppq_ao_nm,
outVar = "psych",
outLabel = "psychological health") -> psychG_mvr
predOrdDrawsFunct_mvr(mod = fit_ordinal_mvr_ppq_ao_nm,
outVar = "phys",
outLabel = "physical health") -> physG_mvr
predOrdDrawsFunct_mvr(mod = fit_ordinal_mvr_ppq_ao_nm,
outVar = "qol",
outLabel = "quality of life") -> qolG_mvr
Now let’s put the graphs in panels using ggpubr to illustrate my issue
ggarrange(psychG, psychG_mvr,
ncol = 2,
common.legend = TRUE) -> panel_diag_psych
panel_diag_psych
ggarrange(physG, physG_mvr,
ncol = 2,
common.legend = TRUE) -> panel_diag_phys
panel_diag_phys
ggarrange(qolG, qolG_mvr,
ncol = 2,
common.legend = TRUE) -> panel_diag_qol
panel_diag_qol
You can see that the predictions from the univariate models and the multivariate models are quite different. My understanding was that these should be identical given the model allows no correlations between outcome residuals. So that’s one issue
The other is that the predictions are basically identical across all three outcomes, which can be seen when we place the graphs from the predictions from each of the outcomes from the multivariate model side by side.
ggarrange(psychG_mvr, physG_mvr, qolG_mvr,
ncol = 3,
common.legend = TRUE) -> panel_diag_mvr
panel_diag_mvr
These all look very suspicious to me, like I’ve made a mistake in the coding (perhaps in the response=foo argument somehow?). If anyone has any insight I would very much appreciate.



