- Operating System: MacOS
- brms Version: 2.14.4
I am trying to reproduce the robit regression example in Gelman et al (2020, pp. 278–279; website), Figure 15.7. It does not appear the authors have made the data or code available. However, I was able to simulate data that looked very close to theirs. I have fit a robit model to the data using brms code based on two prior threads (here and here). The issue is the resulting robit model does not appear to be robust to the unusual case, as advertised in the text. I’d appreciate any help explaining why.
Simulate the data:
library(tidyverse)
library(brms)
n <- 50
set.seed(2)
d <-
tibble(x = runif(n, min = -9, max = 9)) %>%
mutate(y = rbinom(n, size = 1, prob = plogis(0 + 1 * x))) %>%
arrange(x) %>%
mutate(id = 1:n()) %>%
# make the contaminated version of y
mutate(y_c = ifelse(id == 5, 1, y))
glimpse(d)
Rows: 50
Columns: 4
$ x <dbl> -8.812539, -7.650370, -6.928487, -6.675138, -6.617304…
$ y <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ id <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16…
$ y_c <dbl> 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
Fit a robit model and a conventional logistic model to the contaminated y_c
data with x
as the sole predictor.
stan_inv_robit <- "
real inv_robit(real y, real nu) {
return(student_t_cdf(y, nu, 0, (nu - 2) / nu));
}
"
stanvar_inv_robit <- stanvar(scode = stan_inv_robit, block = "functions")
robit_formula <-
bf(y_c | trials(1) ~ inv_robit(eta, nu),
nlf(eta ~ b0 + b1 * x),
b0 + b1 ~ 1,
nu ~ 1,
nl = TRUE)
# robit
m1 <-
brm(data = d,
family = binomial("identity"),
formula = robit_formula,
prior = c(prior(normal(0, 1), nlpar = b0),
prior(normal(0, 1), nlpar = b1),
prior(gamma(2, 0.1), nlpar = nu, lb = 2)),
stanvars = stanvar_inv_robit,
control = list(adapt_delta = .95),
seed = 15)
# logistic
m2 <-
brm(data = d,
family = binomial,
y_c | trials(1) ~ 1 + x,
prior = c(prior(normal(0, 1), class = Intercept),
prior(normal(0, 1), class = b)),
seed = 15)
Plot the fitted lines of the two models.
expose_functions(m1, vectorize = T)
# extract and wrangle the fitted draws
nd <- data.frame(x = seq(from = -9, to = 9, length.out = 100))
tibble(`robit regression` = fitted(m1, newdata = nd)[, 1],
`logistic regression` = fitted(m2, newdata = nd)[, 1]) %>%
bind_cols(nd) %>%
pivot_longer(-x, values_to = "y_c") %>%
# plot!
ggplot(aes(x = x, y = y_c)) +
geom_line(aes(linetype = name),
size = 1/4) +
geom_point(data = d,
shape = 1, size = 2) +
scale_linetype_manual(NULL, values = c(2, 1)) +
scale_y_continuous("y", breaks = 0:1) +
labs(subtitle = "Contaminated data") +
theme_bw() +
theme(legend.position = c(.75, .3),
legend.box.background = element_rect(color = "black"),
panel.grid = element_blank())
Compare that plot with the one from the text, Figure 15.7a:
Where am I going wrong?