Understanding output fiited categorical brms model with logit link

Hi!
I am trying to reproduce the output of fitted(brmsfit, summary = FALSE) of a categorical model.
I am working on a project with categorical models and need to be able to predict probabilities ‘manually’ for reasons that go beyond the cope of this question.
To my understanding, if I extract the posterior samples of all parameters, I should be able to reproduce the fitted output after doing an inverse logit transformation. When I tried to do this, I got very similar results, although not exactly the same. Repeating the same procedure for new data, that contains bigger values results into bigger differences between the fitted output and ‘manually calculated’ predictions.
I think this has everything to do with the reverse logit function I am using (expit):

expit <- function(x){
  return(exp(x)/(1+exp(x)))
}

For example expit(50) = expit(100) = 1.
Any thoughts on how I can fix this issue and better approximate what the brms fitted function is doing? Thanks a lot!

Below is a simple reproducible example.

library(brms)
library(dplyr)

#reverse logit function
expit <- function(x){
  return(exp(x)/(1+exp(x)))
}


n <- 100

set.seed(10)
# simulate family incomes for each individual
family_income <- runif(n)

# assign a unique coefficient for each type of event
b      <- (1:-1)
career <- rep(NA, n)  # empty vector of choices for each individual

for (i in 1:n) {
  score     <- 0.5 * (1:3) + b * family_income[i]
  p         <- score/sum(score)
  career[i] <- sample(1:3, size = 1, prob = p)
}

# fit model
fit <- 
  brm(data = list(career        = career,  
                family_income = family_income), 
    family = categorical(link = logit),
    career ~ 1 + family_income,
    prior = c(prior(normal(0, 5), class = Intercept),
              prior(normal(0, 5), class = b)),
    iter = 2500, warmup = 500, cores = 2, chains = 2,
    seed = 10)

test <- fitted(fit, summary = FALSE)

summary(fit)

# subset with only first iteration
test1 <- test[1,,]

param <- posterior_samples(fit)

# fit for first iteration
mu <- tibble(
mu1 = expit(rep(0, 100)),
mu2 = expit(param[1, 1] + param[1,3] * family_income),
mu3 = expit(param[1, 2] + param[1,4] * family_income))

mu <- mutate(mu, musum = mu1 + mu2 + mu3) %>% 
  mutate(p1 = mu1/musum, p2 = mu2/musum, p3 = mu3/musum)
       
hist(mu$p1 - test1[,1])
hist(mu$p2 - test1[,2])
hist(mu$p3 - test1[,3])


#### extrapolation to new data with bigger numbers
newdata <- list( family_income = family_income * 1000)

new <- fitted(fit, summary = FALSE, newdata = newdata)

# subset with only first iteration
test1 <- new[1,,]

# fit for first iteration
mu <- tibble(
  mu1 = expit(rep(0, 100)),
  mu2 = expit(param[1, 1] + param[1,3] * family_income * 1000),
  mu3 = expit(param[1, 2] + param[1,4] * family_income * 1000))

mu <- mutate(mu, musum = mu1 + mu2 + mu3) %>% 
  mutate(p1 = mu1/musum, p2 = mu2/musum, p3 = mu3/musum)

hist(mu$p1 - test1[,1])
hist(mu$p2 - test1[,2])
hist(mu$p3 - test1[,3])

# --> bigger differences between fitted and manual prediction

The link function for categorical_logit is a softmax. I think you’d get the right answer if you replaced those expits with exps or just take an implementation from the interwebs: https://gist.github.com/aufrank/83572. That second thing avoids overflow problems better.

Another thing, if you’re needing to get to the guts and gears of a brms model you can generate the Stan code and data and use it externally (or investigate what’s going on):

For the code:

make_stancode(data = list(career        = career,  
                family_income = family_income), 
    family = categorical(link = logit),
    career ~ 1 + family_income,
    prior = c(prior(normal(0, 5), class = Intercept),
              prior(normal(0, 5), class = b)))

And make_standata for the data. Lemme know if the softmax works

1 Like

Thank you so much for your answer.
softmax works indeed. When I use the function to find the fit of the actual data, I get values that are almost exactly the same as the ouput from fitted() in brms. However, when extrapolating to bigger data, in some cases the difference is still quite big. Any ideas why this could be the case? Thanks!

The new code to reproduce an example:

library(brms)
library(dplyr)

## softmax function
logsumexp <- function (x) {
  y = max(x)
  y + log(sum(exp(x - y)))
}

softmax <- function (x) {
  exp(x - logsumexp(x))
}

n <- 100

set.seed(10)
# simulate family incomes for each individual
family_income <- runif(n)

# assign a unique coefficient for each type of event
b      <- (1:-1)
career <- rep(NA, n)  # empty vector of choices for each individual

for (i in 1:n) {
  score     <- 0.5 * (1:3) + b * family_income[i]
  p         <- score/sum(score)
  career[i] <- sample(1:3, size = 1, prob = p)
}

# fit model
fit <- 
  brm(data = list(career        = career,  
                family_income = family_income), 
    family = categorical(link = logit),
    career ~ 1 + family_income,
    prior = c(prior(normal(0, 5), class = Intercept),
              prior(normal(0, 5), class = b)),
    iter = 2500, warmup = 500, cores = 2, chains = 2,
    seed = 10)

test <- fitted(fit, summary = FALSE)

summary(fit)

# subset with only first iteration
test1 <- test[1,,]

param <- posterior_samples(fit)

# fit for first iteration
mu <- tibble(
mu1 = (rep(0, 100)),
mu2 = (param[1, 1] + param[1,3] * family_income),
mu3 = (param[1, 2] + param[1,4] * family_income))

dm <- apply(mu, 1, softmax)

mu <- mu %>%
  mutate(p1 = dm[1,], p2 = dm[2,], p3 = dm[3,])
       
hist(mu$p1 - test1[,1])
hist(mu$p2 - test1[,2])
hist(mu$p3 - test1[,3])


#### extrapolation to new data with bigger numbers
newdata <- list( family_income = family_income * 1000)

new <- fitted(fit, summary = FALSE, newdata = newdata)

# subset with only first iteration
test1 <- new[1,,]

# fit for first iteration
mu <- tibble(
  mu1 = (rep(0, 100)),
  mu2 = (param[1, 1] + param[1,3] * family_income * 1000),
  mu3 = (param[1, 2] + param[1,4] * family_income * 1000))

dm <- apply(mu, 1, softmax)

mu <- mu %>%
  mutate(p1 = dm[1,], p2 = dm[2,], p3 = dm[3,])

hist(mu$p1 - test1[,1])
hist(mu$p2 - test1[,2])
hist(mu$p3 - test1[,3])

# --> bigger differences between fitted and manual prediction
mu3 = (param[1, 2] + param[1,4] * family_income) * 1000

Is that 1000 on the wrong side of the parentheses?

Absolutely! Rookie mistake!

Ok, then the softmax option totally works. Thanks again for your help. I edited my last example.