Understanding output fiited categorical brms model with logit link

nschiett · November 10, 2019, 1:37pm

Hi!
I am trying to reproduce the output of fitted(brmsfit, summary = FALSE) of a categorical model.
I am working on a project with categorical models and need to be able to predict probabilities ‘manually’ for reasons that go beyond the cope of this question.
To my understanding, if I extract the posterior samples of all parameters, I should be able to reproduce the fitted output after doing an inverse logit transformation. When I tried to do this, I got very similar results, although not exactly the same. Repeating the same procedure for new data, that contains bigger values results into bigger differences between the fitted output and ‘manually calculated’ predictions.
I think this has everything to do with the reverse logit function I am using (expit):

expit <- function(x){
  return(exp(x)/(1+exp(x)))
}

For example expit(50) = expit(100) = 1.
Any thoughts on how I can fix this issue and better approximate what the brms fitted function is doing? Thanks a lot!

Below is a simple reproducible example.

library(brms)
library(dplyr)

#reverse logit function
expit <- function(x){
  return(exp(x)/(1+exp(x)))
}


n <- 100

set.seed(10)
# simulate family incomes for each individual
family_income <- runif(n)

# assign a unique coefficient for each type of event
b      <- (1:-1)
career <- rep(NA, n)  # empty vector of choices for each individual

for (i in 1:n) {
  score     <- 0.5 * (1:3) + b * family_income[i]
  p         <- score/sum(score)
  career[i] <- sample(1:3, size = 1, prob = p)
}

# fit model
fit <- 
  brm(data = list(career        = career,  
                family_income = family_income), 
    family = categorical(link = logit),
    career ~ 1 + family_income,
    prior = c(prior(normal(0, 5), class = Intercept),
              prior(normal(0, 5), class = b)),
    iter = 2500, warmup = 500, cores = 2, chains = 2,
    seed = 10)

test <- fitted(fit, summary = FALSE)

summary(fit)

# subset with only first iteration
test1 <- test[1,,]

param <- posterior_samples(fit)

# fit for first iteration
mu <- tibble(
mu1 = expit(rep(0, 100)),
mu2 = expit(param[1, 1] + param[1,3] * family_income),
mu3 = expit(param[1, 2] + param[1,4] * family_income))

mu <- mutate(mu, musum = mu1 + mu2 + mu3) %>% 
  mutate(p1 = mu1/musum, p2 = mu2/musum, p3 = mu3/musum)
       
hist(mu$p1 - test1[,1])
hist(mu$p2 - test1[,2])
hist(mu$p3 - test1[,3])


#### extrapolation to new data with bigger numbers
newdata <- list( family_income = family_income * 1000)

new <- fitted(fit, summary = FALSE, newdata = newdata)

# subset with only first iteration
test1 <- new[1,,]

# fit for first iteration
mu <- tibble(
  mu1 = expit(rep(0, 100)),
  mu2 = expit(param[1, 1] + param[1,3] * family_income * 1000),
  mu3 = expit(param[1, 2] + param[1,4] * family_income * 1000))

mu <- mutate(mu, musum = mu1 + mu2 + mu3) %>% 
  mutate(p1 = mu1/musum, p2 = mu2/musum, p3 = mu3/musum)

hist(mu$p1 - test1[,1])
hist(mu$p2 - test1[,2])
hist(mu$p3 - test1[,3])

# --> bigger differences between fitted and manual prediction

bbbales2 · November 12, 2019, 2:44pm

The link function for categorical_logit is a softmax. I think you’d get the right answer if you replaced those expits with exps or just take an implementation from the interwebs: https://gist.github.com/aufrank/83572. That second thing avoids overflow problems better.

Another thing, if you’re needing to get to the guts and gears of a brms model you can generate the Stan code and data and use it externally (or investigate what’s going on):

For the code:

make_stancode(data = list(career        = career,  
                family_income = family_income), 
    family = categorical(link = logit),
    career ~ 1 + family_income,
    prior = c(prior(normal(0, 5), class = Intercept),
              prior(normal(0, 5), class = b)))

And make_standata for the data. Lemme know if the softmax works

nschiett · November 12, 2019, 3:38pm

Thank you so much for your answer.
softmax works indeed. When I use the function to find the fit of the actual data, I get values that are almost exactly the same as the ouput from fitted() in brms. However, when extrapolating to bigger data, in some cases the difference is still quite big. Any ideas why this could be the case? Thanks!

The new code to reproduce an example:

library(brms)
library(dplyr)

## softmax function
logsumexp <- function (x) {
  y = max(x)
  y + log(sum(exp(x - y)))
}

softmax <- function (x) {
  exp(x - logsumexp(x))
}

n <- 100

set.seed(10)
# simulate family incomes for each individual
family_income <- runif(n)

# assign a unique coefficient for each type of event
b      <- (1:-1)
career <- rep(NA, n)  # empty vector of choices for each individual

for (i in 1:n) {
  score     <- 0.5 * (1:3) + b * family_income[i]
  p         <- score/sum(score)
  career[i] <- sample(1:3, size = 1, prob = p)
}

# fit model
fit <- 
  brm(data = list(career        = career,  
                family_income = family_income), 
    family = categorical(link = logit),
    career ~ 1 + family_income,
    prior = c(prior(normal(0, 5), class = Intercept),
              prior(normal(0, 5), class = b)),
    iter = 2500, warmup = 500, cores = 2, chains = 2,
    seed = 10)

test <- fitted(fit, summary = FALSE)

summary(fit)

# subset with only first iteration
test1 <- test[1,,]

param <- posterior_samples(fit)

# fit for first iteration
mu <- tibble(
mu1 = (rep(0, 100)),
mu2 = (param[1, 1] + param[1,3] * family_income),
mu3 = (param[1, 2] + param[1,4] * family_income))

dm <- apply(mu, 1, softmax)

mu <- mu %>%
  mutate(p1 = dm[1,], p2 = dm[2,], p3 = dm[3,])
       
hist(mu$p1 - test1[,1])
hist(mu$p2 - test1[,2])
hist(mu$p3 - test1[,3])


#### extrapolation to new data with bigger numbers
newdata <- list( family_income = family_income * 1000)

new <- fitted(fit, summary = FALSE, newdata = newdata)

# subset with only first iteration
test1 <- new[1,,]

# fit for first iteration
mu <- tibble(
  mu1 = (rep(0, 100)),
  mu2 = (param[1, 1] + param[1,3] * family_income * 1000),
  mu3 = (param[1, 2] + param[1,4] * family_income * 1000))

dm <- apply(mu, 1, softmax)

mu <- mu %>%
  mutate(p1 = dm[1,], p2 = dm[2,], p3 = dm[3,])

hist(mu$p1 - test1[,1])
hist(mu$p2 - test1[,2])
hist(mu$p3 - test1[,3])

# --> bigger differences between fitted and manual prediction

bbbales2 · November 12, 2019, 3:55pm

mu3 = (param[1, 2] + param[1,4] * family_income) * 1000

Is that 1000 on the wrong side of the parentheses?

nschiett · November 12, 2019, 4:55pm

Absolutely! Rookie mistake!

Ok, then the softmax option totally works. Thanks again for your help. I edited my last example.

Topic		Replies	Views
Interpreting results from categorical() with brm brms interpret-results	6	4514	April 2, 2024
Understanding brms functions for summarizing a logistic model brms	1	964	November 16, 2021
Interpret parameters from multinomial (categorical) logit regression Modeling rstan , brms	1	872	February 16, 2022
Extracting Model Coefficients in brms - Output Unclear brms	4	2325	September 10, 2021
Conditional_effects() producing different Expected Values than fitted() brms conditional-effects	4	364	December 13, 2023

Understanding output fiited categorical brms model with logit link

Related Topics