TLDR; I don’t understand why fitted()
returns values above 1 for family binomial and that lead me to question what exactly fitted()
and posterior_predict()
do in brms.
My logic seems fine for models with a normal and bernoulli likelihood, but not for a binomial likelihood:
For a linear regression–normal likelihood–, I think it’s straight-forward, fitted
should be: draw_intercept + draw_slope * predictor, ignoring sigma, posterior_predict
samples from a normal distribution with location equal to the fitted values, and scale equal to draws of the posterior of sigma.
I do it below in R and it checks out, of course, the predictions have the randomness of sigma so they won’t be the same.
library(brms)
library(dplyr)
options(mc.cores = parallel::detectCores())
N <- 15
set.seed(123)
d1 <- tibble(x = runif(N), y = rnorm(N, x*2, .1))
m1 <- brm(y ~ x, data = d1)
post1 <- as.data.frame(m1)
# Fitted values:
fitted(m1,summary = FALSE) %>% head()
# by "hand"
apply(post1, 1, function(d) d["b_Intercept"] + d["b_x"] * d1$x) %>%
t() %>% head()
# predictions
posterior_predict(m1) %>% head()
# by "hand"
apply(post1, 1,
function(d) rnorm(nrow(d1), d["b_Intercept"] + d["b_x"] * d1$x, d["sigma"])) %>%
t() %>% head()
# They won't be same, but this should be the logic
For a logistic regression with a Bernoulli family, I guess fitted
is: inv_logit(draw_intercept + draw_slope * predictor), posterior_predict
samples from a bernoulli distribution with theta = fitted value.
I do it below in R and seems fine:
set.seed(123)
d2 <- tibble(x = runif(N), y = rbinom(N, 1, plogis(x*.5)))
m2 <- brm(y ~ x, data = d2, family = bernoulli())
post2 <- as.data.frame(m2)
# Fitted values:
fitted(m2, summary = FALSE) %>% head()
# by "hand"
apply(post2, 1, function(d) plogis(d["b_Intercept"] + d["b_x"] * d1$x)) %>%
t() %>% head()
# predictions
posterior_predict(m2) %>% head()
# by "hand"
apply(post2, 1,
function(d) rbinom(nrow(d1),1, plogis(d["b_Intercept"] + d["b_x"] * d1$x))) %>%
t() %>% head()
If the family is binomial, I thought the logic of fitted()
should be identical to the previous case, but it isn’t. I get fitted values above 1 (see fitted(m3, summary = FALSE) %>% head()
below). Of course when I predict, I use a binomial RNG with appropriate size, rather than with a size of 1. So here it is below. Generating predictions, seems fine, but I really don’t get why fitted values can be over 1.
set.seed(123)
d3 <- tibble(x = runif(N), y = rbinom(N, 10, plogis(x*.5)))
m3 <- brm(y | trials(10) ~ x, data = d3, family = binomial())
post3 <- as.data.frame(m3)
# Fitted values:
fitted(m3, summary = FALSE) %>% head()
# by "hand"
apply(post3, 1, function(d) plogis(d["b_Intercept"] + d["b_x"] * d1$x)) %>%
t() %>% head()
# this is off
# predictions
posterior_predict(m3) %>% head()
# by "hand"
apply(post3, 1,
function(d) rbinom(nrow(d1),10, plogis(d["b_Intercept"] + d["b_x"] * d1$x))) %>%
t() %>% head()
# I guess this is fine