I’ve been using brms to fit a longitudinal hurdle model where the random effects from the two parts of the model are correlated. It seems to work really well. However, I’m struggling to get correct predictions for a new level with the predict method. I’m happy writing my own code for this but wanted to raise my findings here.
I’ll try to explain with 2 examples. Firstly, a gaussian mixed model where each patient (n = 100) has a (correlated) random intercept and slope. Each patient has 4 independent observations at time 0, and 4 independent observations at time 1.
library(brms)
set.seed(180)
u <- rnorm(100, sd = 2)
v <- rnorm(100, mean = -0.95 * u, sd = sqrt((1 - 0.95^2) * 4))
y0 <- rnorm(400, mean = u, sd = 1)
y1 <- rnorm(400, mean = u + v, sd = 1)
id <- 1:100
dat <- data.frame(id = rep(id, 8),
time = rep(0:1, each = 400),
y = c(y0, y1))
ggplot2::ggplot(data = dat,
mapping = aes(x = time, y = y, group = id)) +
geom_line() +
geom_point()
fit <- brm(y ~ 1 + (1 + time | id),
data = dat,
family = gaussian())
When I try to predict the response at time 1 for a new patient (101) using the default method I get a prediction interval that is clearly too wide
predict(fit,
newdata = data.frame(id = 101, time = 1),
allow_new_levels = TRUE)
whereas the sample_new_levels = “gaussian” option looks much better
predict(fit,
newdata = data.frame(id = 101, time = 1),
allow_new_levels = TRUE,
sample_new_levels = "gaussian")
Now, I would like to use the sample_new_levels = “gaussian” for a hurdle model:
set.seed(180)
u <- rnorm(100, sd = 2)
v <- rnorm(100, mean = -0.95 * u, sd = sqrt((1 - 0.95^2) * 4)) # p(zero) is negatively correlated with Y*
ystar <- exp(rnorm(800, mean = u, sd = 1))
z <- rbinom(800, size = 1, prob = 1 - plogis(-1 + v)) # p(cross hurdle) = 1 - p(zero)
id <- 1:100
dat <- data.frame(id = rep(id, 8),
y = z * ystar)
fit_hurdle <- brm(bf(y ~ 1 + (1 | q | id),
hu ~ 1 + (1 | q | id)),
data = dat,
iter = 4000,
family = hurdle_lognormal())
predict(fit_hurdle,
newdata = data.frame(id = 101),
allow_new_levels = TRUE,
sample_new_levels = "gaussian",
robust = TRUE)
But this gives me values that are too small. It’s difficult to see from a plot of the data but I can simulate directly from the model
u <- rnorm(1e6, sd = 2)
v <- rnorm(1e6, mean = -0.95 * u, sd = sqrt((1 - 0.95^2) * 4)) # p(zero) is negatively correlated with Y*
ystar <- exp(rnorm(1e6, mean = u, sd = 1))
z <- rbinom(1e6, size = 1, prob = 1 - plogis(-1 + v)) # p(cross hurdle) = 1 - p(zero)
y <- z * ystar
quantile(y, probs = c(0.025, 0.5, 0.975))
It looks like the predict method is not taking into account correlation in the hurdle model, whereas it was for the gaussian model.
- Operating System: macOS 10.14.6
- brms Version: 2.10.0