I developed a cumulative ordinal model with brms that has been working well (thanks brms and stan!). But a new, small dataset contains observations of all but one possible outcome in the dependent variable (e.g., no observations of outcome “d”, but observations of “a”, “b”, “c”, “e”, “f”, and “g”).
I would expect 6 thresholds. I would expect thresholds 3 and 4 (between “c” and “d”, and between “d” and “e”) to be close together, corresponding to a low posterior probability of outcome “d”. If I write the formula y ~ x, there are one too few thresholds. If I write the formula y | thres(6), there are 6 thresholds, but thresholds 3 and 4 are not particularly close, instead threshold 6 gets pushed to a high value. A plot of conditional_effects does not show the names of the factor levels, instead just integers. Instead of category “d” having a low probability, the category “6” has a low probability. The output of posterior_epred() does not show the names of the factor levels, either. Below is a simple example.
library(brms)
library(dplyr)
library(tidyr)
lvls <- c("a", "b", "c", "d", "e", "f", "g")
x <- c("U", "V")
# All outcome categories observed
y1 <- factor(lvls, levels = lvls, ordered = TRUE)
data1 <- expand_grid(y = y1, x)
# All outcome categories observed; y ~ x
fit1a <- brm(
y ~ x,
data1,
family = cumulative(link = "logit")
)
summary(fit1a)
conditional_effects(fit1a, "x", categorical = TRUE)
posterior_epred(fit1a, newdata = data.frame(x = c("U")))[1, 1, ]
# Outcome as expected
# All outcome categories observed; y | thres(6) ~ x
fit1b <- brm(
y | thres(6) ~ x,
data1,
family = cumulative(link = "logit")
)
summary(fit1b)
conditional_effects(fit1b, "x", categorical = TRUE)
posterior_epred(fit1b, newdata = data.frame(x = c("U")))[1, 1, ]
# Outcome as expected; same as fit1a
# One outcome category unobserved
y2 <- factor(lvls[-4], levels = lvls, ordered = TRUE)
data2 <- expand_grid(y = y2, x)
# One outcome category unobserved; y ~ x
fit2a <- brm(
y ~ x,
data2,
family = cumulative(link = "logit")
)
summary(fit2a)
conditional_effects(fit2a, "x", categorical = TRUE)
posterior_epred(fit2a, newdata = data.frame(x = c("U")))[1, 1, ]
# One too few thresholds. Outcome "d" is missing.
# One outcome category unobserved; y | thres(6) ~ x
fit2b <- brm(
y | thres(6) ~ x,
data2,
family = cumulative(link = "logit")
)
summary(fit2b)
conditional_effects(fit2b, "x", categorical = TRUE)
posterior_epred(fit2b, newdata = data.frame(x = c("U")))[1, 1, ]
# Outcome categories are now integers, not letters. A low posterior probability
# for "7" as opposed to "d".
fit1b
fit2b
> packageVersion("StanHeaders")
[1] ‘2.26.28’
> packageVersion("rstan")
[1] ‘2.32.3’
> packageVersion("brms")
[1] ‘2.20.4’
>
> Sys.info()[c("sysname", "release", "version")]
sysname release version
"Windows" "Server x64" "build 17763"