Hi!
I am trying to reproduce the output of fitted(brmsfit, summary = FALSE) of a categorical model.
I am working on a project with categorical models and need to be able to predict probabilities ‘manually’ for reasons that go beyond the cope of this question.
To my understanding, if I extract the posterior samples of all parameters, I should be able to reproduce the fitted output after doing an inverse logit transformation. When I tried to do this, I got very similar results, although not exactly the same. Repeating the same procedure for new data, that contains bigger values results into bigger differences between the fitted output and ‘manually calculated’ predictions.
I think this has everything to do with the reverse logit function I am using (expit):
expit <- function(x){
return(exp(x)/(1+exp(x)))
}
For example expit(50) = expit(100) = 1.
Any thoughts on how I can fix this issue and better approximate what the brms fitted function is doing? Thanks a lot!
Below is a simple reproducible example.
library(brms)
library(dplyr)
#reverse logit function
expit <- function(x){
return(exp(x)/(1+exp(x)))
}
n <- 100
set.seed(10)
# simulate family incomes for each individual
family_income <- runif(n)
# assign a unique coefficient for each type of event
b <- (1:-1)
career <- rep(NA, n) # empty vector of choices for each individual
for (i in 1:n) {
score <- 0.5 * (1:3) + b * family_income[i]
p <- score/sum(score)
career[i] <- sample(1:3, size = 1, prob = p)
}
# fit model
fit <-
brm(data = list(career = career,
family_income = family_income),
family = categorical(link = logit),
career ~ 1 + family_income,
prior = c(prior(normal(0, 5), class = Intercept),
prior(normal(0, 5), class = b)),
iter = 2500, warmup = 500, cores = 2, chains = 2,
seed = 10)
test <- fitted(fit, summary = FALSE)
summary(fit)
# subset with only first iteration
test1 <- test[1,,]
param <- posterior_samples(fit)
# fit for first iteration
mu <- tibble(
mu1 = expit(rep(0, 100)),
mu2 = expit(param[1, 1] + param[1,3] * family_income),
mu3 = expit(param[1, 2] + param[1,4] * family_income))
mu <- mutate(mu, musum = mu1 + mu2 + mu3) %>%
mutate(p1 = mu1/musum, p2 = mu2/musum, p3 = mu3/musum)
hist(mu$p1 - test1[,1])
hist(mu$p2 - test1[,2])
hist(mu$p3 - test1[,3])
#### extrapolation to new data with bigger numbers
newdata <- list( family_income = family_income * 1000)
new <- fitted(fit, summary = FALSE, newdata = newdata)
# subset with only first iteration
test1 <- new[1,,]
# fit for first iteration
mu <- tibble(
mu1 = expit(rep(0, 100)),
mu2 = expit(param[1, 1] + param[1,3] * family_income * 1000),
mu3 = expit(param[1, 2] + param[1,4] * family_income * 1000))
mu <- mutate(mu, musum = mu1 + mu2 + mu3) %>%
mutate(p1 = mu1/musum, p2 = mu2/musum, p3 = mu3/musum)
hist(mu$p1 - test1[,1])
hist(mu$p2 - test1[,2])
hist(mu$p3 - test1[,3])
# --> bigger differences between fitted and manual prediction