Cumulative ordinal model in brms without any observations of one category

I developed a cumulative ordinal model with brms that has been working well (thanks, brms and Stan!). But a new, small dataset contains observations of all but one possible outcome in the dependent variable (e.g., no observations of outcome “d”, but observations of “a”, “b”, “c”, “e”, “f”, and “g”).

I would expect 6 thresholds, and I would expect thresholds 3 and 4 (between “c” and “d”, and between “d” and “e”) to be close together, corresponding to a low posterior probability of outcome “d”. If I write the formula y ~ x, there is one threshold too few (5 instead of 6). If I write the formula y | thres(6) ~ x, there are 6 thresholds, but thresholds 3 and 4 are not particularly close; instead, threshold 6 gets pushed to a high value.

Also, a plot of conditional_effects does not show the names of the factor levels, just integers. Instead of category “d” having a low probability, category “7” has a low probability. The output of posterior_epred() does not show the names of the factor levels, either. Below is a simple example.

library(brms)
library(dplyr)
library(tidyr)

lvls <- c("a", "b", "c", "d", "e", "f", "g")
x <- c("U", "V")

# All outcome categories observed
y1 <- factor(lvls, levels = lvls, ordered = TRUE)
data1 <- expand_grid(y = y1, x)

# All outcome categories observed; y ~ x
fit1a <- brm(
  y ~ x,
  data1, 
  family = cumulative(link = "logit")
)
summary(fit1a)
conditional_effects(fit1a, "x", categorical = TRUE)
posterior_epred(fit1a, newdata = data.frame(x = c("U")))[1, 1, ]
# Outcome as expected

# All outcome categories observed; y | thres(6) ~ x
fit1b <- brm(
  y | thres(6) ~ x,
  data1, 
  family = cumulative(link = "logit")
)
summary(fit1b)
conditional_effects(fit1b, "x", categorical = TRUE)
posterior_epred(fit1b, newdata = data.frame(x = c("U")))[1, 1, ]
# Outcome as expected; same as fit1a


# One outcome category unobserved
y2 <- factor(lvls[-4], levels = lvls, ordered = TRUE)
data2 <- expand_grid(y = y2, x)

# One outcome category unobserved; y ~ x
fit2a <- brm(
  y ~ x,
  data2, 
  family = cumulative(link = "logit")
)
summary(fit2a)
conditional_effects(fit2a, "x", categorical = TRUE)
posterior_epred(fit2a, newdata = data.frame(x = c("U")))[1, 1, ]
# One too few thresholds. Outcome "d" is missing.

# One outcome category unobserved; y | thres(6) ~ x
fit2b <- brm(
  y | thres(6) ~ x,
  data2, 
  family = cumulative(link = "logit")
)
summary(fit2b)
conditional_effects(fit2b, "x", categorical = TRUE)
posterior_epred(fit2b, newdata = data.frame(x = c("U")))[1, 1, ]
# Outcome categories are now integers, not letters. A low posterior probability 
# for "7" as opposed to "d".


> packageVersion("StanHeaders")
[1] ‘2.26.28’
> packageVersion("rstan")
[1] ‘2.32.3’
> packageVersion("brms")
[1] ‘2.20.4’
> 
> Sys.info()[c("sysname", "release", "version")]
      sysname       release       version 
    "Windows"  "Server x64" "build 17763" 

The way I handle this in the lrm and orm functions in the rms package and the blrm function in the rmsb package is that parameters don’t exist for levels that are not observed. When getting post-fit estimates, or when bootstrapping and some of the models have fewer parameters than others, I take special care post-fitting, but I don’t cover all situations.
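As a minimal illustration of that behavior, here is a sketch using data2 from the original example (this assumes the rms package is installed; exact intercept labels may vary by version):

library(rms)

# lrm() fits thresholds only for the levels it actually observes,
# so data2 (which has no "d") yields 5 intercepts rather than 6.
fit_rms <- lrm(y ~ x, data = data2)
fit_rms  # no intercept involving "d" appears in the output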

A brute-force way to handle this would be to include at least one observation from each of the unobserved categories with zero weight. That way the model recognizes the original ordering of the categories, but the unobserved cases don’t contribute to the likelihood.

# One outcome category weighted 0
data3 <- expand_grid(y = y1, x)
data3$w <- ifelse(data3$y == 'd', 0, 1) # If `d`, set weight to 0. Otherwise set to 1

# One outcome category weighted zero; y | weights(w) ~ x
fit3 <- brm(
  y | weights(w) ~ x,
  data3, 
  family = cumulative(link = "logit")
)
summary(fit3)
conditional_effects(fit3, "x", categorical = TRUE)
posterior_epred(fit3, newdata = data.frame(x = c("U")))[1, 1, ]

library(tibble)    # rownames_to_column()
library(stringr)   # str_c()
library(ggplot2)

summary(fit3)$fixed %>%
  as.data.frame() %>%
  rownames_to_column('predictor') %>%
  filter(predictor != 'xV') %>%   # drop the slope; keep the 6 thresholds
  mutate(threshold = 1:n()) %>%
  ggplot(aes(threshold, Estimate)) +
    geom_line() +
    geom_point(size = 2, shape = 21, fill = 'white') +
    scale_x_continuous('Threshold', breaks = 1:6, labels = str_c(letters[1:6], '|', letters[2:7]))



fascinating

Thanks for the replies!

@simonbrauer, you clearly identify a workaround that addresses my problem. Thanks!

In reality I am using brm_multiple() so it will take a little more work to add the zero-weight “observations” into my analysis, but I can figure that out.

Fascinating, as Solomon rightly notes. Also seems like an excellent hack, @simonbrauer.

I have two questions to follow up. The first question is specific and the second more general.

  1. Would this approach work for data where categories are nested across different settings, and some settings have no observations of one category? For example, in the following model:

fit3 <- brm(
  y | weights(w) ~ x + (1 | Setting),
  data3,
  family = cumulative(link = "logit")
)

  2. Why not just apply weights to represent the counts of all of the data?

The main reason why I have avoided what I suggest in question 2 is that the weights only affect the likelihood contribution and not the posterior, so in my experience the pp_check function becomes a bit useless (though, granted, that might be much more a case of me being unable to adapt the pp_check function to make it usable with weighted data).

I’m mostly interested in question 2 in the context of large datasets (currently working on one with >1.5 million observations).

Regards
Ross

I see. So is the idea that you have some subset of imputed datasets that just so happen to be missing particular levels? In that case, I think the easiest way would be to create a separate dataset with one observation per category and zero weight. Then you can append that same dataset to each of the imputed datasets to ensure every category is represented at least once. That would also mean that you’re only conducting one extra likelihood calculation per category, so the cost should be minimal.
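In code, the idea might look like this (a sketch; imputed_list is a hypothetical list of imputed data frames that already carry w = 1 for their real rows, with columns named as in the earlier example):

# Zero-weight pseudo-observations, one per outcome category
pseudo <- tibble(
  y = factor(lvls, levels = lvls, ordered = TRUE),
  x = "U",  # any valid predictor value; weight 0 keeps it out of the likelihood
  w = 0
)

# Append the same pseudo data to every imputed dataset
imputed_list <- lapply(imputed_list, function(d) bind_rows(d, pseudo))

fit <- brm_multiple(
  y | weights(w) ~ x,
  data = imputed_list,
  family = cumulative(link = "logit")
)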

I should say that I came up with this hack on the fly after reading your question (“This should work… does it?”) and not based on any deeper understanding of how brms works. So while it should get you the correct posterior, it may have other consequences for the calculations brms does if, for example, they are based on the unweighted number of observations. I doubt that’s the case, but consider it a warning.

As @Ross_Neville points out, you might get some extra draws from posterior predictive checks that you would also need to weight. But that should be pretty easy to do; just add the weight column to each draw from the posterior and either filter out the zero-weight observations or weight the density plots.
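One way to do the filtering variant, as a sketch using fit3 and data3 from above (posterior_predict() returns category indices for cumulative models, which bayesplot’s ppc_bars() can compare against the observed codes):

library(bayesplot)

# Predict only for the real (non-zero-weight) rows, then compare
# observed and replicated category counts.
obs  <- data3[data3$w > 0, ]
yrep <- posterior_predict(fit3, newdata = obs)
ppc_bars(as.integer(obs$y), yrep)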


Sorry, my brain isn’t putting the pieces together. Could you give an example/dataset of what you’re thinking?

So long as your weights can truly be interpreted as frequency weights (e.g., a weight of 5 means that there are 5 perfectly identical observations in the data, each of which would otherwise have the exact same likelihood), then there is no issue using weights for all of the data. And, as you say, it could be much more efficient or make an intractable problem tractable.
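As a sketch of that collapse (data_big is a hypothetical data frame with columns y and x; dplyr::count() produces the frequency column):

# Collapse perfectly identical rows into one weighted row each
data_small <- data_big %>%
  count(y, x, name = "w")

fit_w <- brm(
  y | weights(w) ~ x,
  data_small,
  family = cumulative(link = "logit")
)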

I think we mean the same thing, but just to clarify: the weights do affect the posteriors of the model parameters. But for the purposes of posterior predictive checks, brms treats each observation as a single observation of weight one. So if you collapse a dataset of 1 million observations down to a dataset of 5 weighted observations, pp_check() returns posterior draws with 5 observations each. One (possibly inefficient) option would be to generate as many sets of posterior draws as the maximum weight, then keep only as many copies of each observation as its weight. This would be efficient if the weights are all similar but would become increasingly inefficient as the weights vary.
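A simpler (if cruder) variant than generating multiple draw sets is to repeat each observation’s predictive draws by its weight, reusing the same draw for the duplicates. Continuing the sketch above:

# Expand the draws matrix back out: repeat each observation's column
# as many times as its weight.
idx <- rep(seq_len(nrow(data_small)), times = data_small$w)
yrep_full <- posterior_predict(fit_w, ndraws = 100)[, idx]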

Thanks @simonbrauer for your responses and suggestions, which I can take forward. Thanks also for clarifying some of what I said later in the post, where I should have been more precise (but didn’t have the brms chops to do so!).
As for an example dataset, I don’t have a minimal reproducible example to hand (I’m in transit). However, an individual participant data meta-analysis of ordinal data would be a good and realistic example. There, in principle, you could have multiple study settings with small sample sizes where one level of the ordinal variable is not observed in the sample. (Let’s just assume that the same level of the ordinal variable is not observed across some study settings, though that might not be the case.)

For my case, the dependent variable is always observed, but some observations of the independent variables are missing. I am using the mice package for multiple imputation. The workaround using weights() generalizes to brm_multiple() and mice(), too.
Here is an example.

library(brms)
library(dplyr)
library(mice)

y_levels <- letters[1:7]
x1_levels <- c("U", "V")

# Zero-weight pseudo-observations: one row per outcome level
all_outcome_levels_pseudo_data <- tibble(
  weight = 0,
  y = factor(y_levels, levels = y_levels, ordered = TRUE),
  x1 = factor(x1_levels[1], levels = x1_levels),
  x2 = 0,
  x3 = 0
)
  
n <- 30
set.seed(855)
data <- tibble(
  weight = 1,
  y = factor(
    sample(y_levels[-4], n, replace = TRUE), 
    levels = y_levels, 
    ordered = TRUE
  ),
  x1 = factor(
    sample(
      c(rep(NA_character_, 3), sample(x1_levels, n - 3, replace = TRUE))
    ),
    levels = x1_levels
  ),
  x2 = sample(c(rep(NA_real_, 2), rnorm(n - 2))),
  x3 = sample(c(rep(NA_real_, 4), rnorm(n - 4)))
)
print(data[complete.cases(data), ])
print(data[!complete.cases(data), ])

# Build the predictor matrix before appending the pseudo-observations
pred <- quickpred(data, exclude = "y")

# Append the pseudo-observations so every level of y is present
data <- bind_rows(all_outcome_levels_pseudo_data, data)

imp <- mice(
  data,
  method = "cart",
  pred = pred,
  ignore = data$weight == 0,  # don't let pseudo rows inform the imputation models
  seed = 903
)
print(head(imp$loggedEvents, 10))

family <- cumulative("probit")
fit <- brm_multiple(
  y | weights(weight) ~ x1 + x2 + x3,
  imp,
  family,
  seed = 1026
)
summary(fit)
conditional_effects(fit, "x1", categorical = TRUE)
posterior_epred(fit, newdata = tibble(x1 = x1_levels[1], x2 = 0, x3 = 0))[1, 1, ]