Predicted Probabilities for Multinomial Regression

Matt_Kosko · October 21, 2019, 3:09pm

I’m working with survey data and I’m trying to determine the probability of a respondent providing a particular response to a question based on demographic predictors, but I haven’t been able to figure out how to extract outcome probabilities. I have a fake dataset below to give an idea of what I’m trying to do. I have two demographic predictors and a three-level outcome variable.

library(dplyr)
library(brms)

key <- expand.grid(sex = c('Male', 'Female'),
                   age = c('18-34', '35-64', '45-64'))
n <- 1000
set.seed(123)
df <- data.frame(sex = sample(c('Male', 'Female'), n, replace = TRUE),
                 age = sample(c('18-34', '35-64', '45-64'), n, replace = TRUE),
                 stringsAsFactors = FALSE)

age <- model.matrix(~ age, data = df)[, -1]
sex <- model.matrix(~ sex, data = df)[, -1]

mm <- list(age, sex)

#Coefficients
beta_cat1 <- list(age = matrix(c(2, 1)), sex = matrix(-1))
beta_cat2 <- list(age = matrix(c(-3, 2)), sex = matrix(2))

xb1 <- Reduce(rowSums, lapply(seq_along(beta_cat1), function(i) mm[[i]] %*% beta_cat1[[i]]))
xb2 <- Reduce(rowSums, lapply(seq_along(beta_cat2), function(i) mm[[i]] %*% beta_cat2[[i]]))

eta1 <- xb1 + rnorm(length(xb1))
eta2 <- xb2 + rnorm(length(xb2))

sum_exp <- rowSums(cbind(exp(eta1), exp(eta2)))

logit <- function(x){
  exp(x)/(1 + sum_exp)
}

#Class probabilities
p3 <- logit(0)
p2 <- logit(eta2)
p1 <- logit(eta1)

df <- cbind(df, p1 = p1, p2 = p2, p3 = p3)
#Generate outcome variables
ys <- lapply(seq_len(nrow(df)), function(i) t(rmultinom(1, 1, prob = c(p1[i], p2[i], p3[i]))))
ys <- do.call(rbind, ys)
y_names <- paste0('y', 1:3)
colnames(ys) <- y_names

df <- cbind(df, ys)

I aggregated the outcome variable to counts because I thought that might improve computation time.

df_agg <- df %>% group_by(age, sex) %>% summarise_at(.vars = y_names, .funs = sum)
df_agg$N <- rowSums(df_agg[, y_names])
df_agg$response <- with(df_agg, cbind(y1, y2, y3))

brms_fit <- brm(response | trials(N) ~ (1|age) + (1|sex), data = df_agg, family = multinomial())

Unsurprisingly, when I run predict or fitted with summary = FALSE, the result is the number of successes in each class. My question is, is there a way to get predict or fitted to return an array that has the predicted probability of the outcome being y1, y2, or y3 for each row of df_agg for each draw?

Operating System: Ubuntu 16.04
brms Version: 2.10.0

Max_Mantei · October 21, 2019, 4:12pm

Hi! Welcome to the Stan discourse!

I’m no brms expert, but did you try fitted with scale = "linear"?

That should give you the result on the level of the linear predictor – so maybe you have to then transform it (when it’s on the logit scale).

paul.buerkner · October 21, 2019, 4:41pm

Adding to what Max said, fitted should be able to directly give you the probabilities with the default value of scale if you use newdata and set N = 1 in the new data.

Matt_Kosko · October 21, 2019, 10:14pm

Thanks. This looks ok for fitted but does it work for predict? When I run fitted this looks reasonable:

newdat <- df_agg
newdat$N <- 1

fitted(brms_fit, newdata = newdat)[, 1, ]

     P(Y = y1)   P(Y = y2) P(Y = y3)
[1,] 0.3154898 0.368647154 0.3158631
[2,] 0.2972875 0.388514085 0.3141984
[3,] 0.8588791 0.003446441 0.1376745
[4,] 0.8520503 0.003829041 0.1441207
[5,] 0.2894886 0.590333856 0.1201776
[6,] 0.2688771 0.613391643 0.1177313

But predict estimates look very different:

predict(brms_fit, newdata = newdat, scale = 'linear')[, 1, ]

          y1      y2      y3
[1,] 0.15450 0.35125 0.49425
[2,] 0.15600 0.34250 0.50150
[3,] 0.31475 0.31625 0.36900
[4,] 0.31275 0.31075 0.37650
[5,] 0.12675 0.41225 0.46100
[6,] 0.13050 0.40650 0.46300

Topic		Replies	Views
Help Correctly Specifying Multinomial Regression with predictor error in BRMS brms specification , brms	2	135	May 7, 2024
Modeling many zeros and low values in a multinomial outcome brms techniques , specification , brms	1	741	December 8, 2022
Extract posteriors and compute odds ratio in multinomial regression brms	3	940	October 30, 2020
Multinomial model with spline and interaction - getting output from continous and categorical variable together brms multinomial-response , splines , brms	1	654	July 27, 2022
Multinomial model with random effects to study social interactions brms	6	3288	March 2, 2021

Predicted Probabilities for Multinomial Regression

Related topics