Predicted Probabilities for Multinomial Regression

I’m working with survey data and I’m trying to determine the probability of a respondent providing a particular response to a question based on demographic predictors, but I haven’t been able to figure out how to extract outcome probabilities. I have a fake dataset below to give an idea of what I’m trying to do. I have two demographic predictors and a three-level outcome variable.

library(dplyr)
library(brms)

key <- expand.grid(sex = c('Male', 'Female'),
                   age = c('18-34', '35-64', '45-64'))
n <- 1000
set.seed(123)
df <- data.frame(sex = sample(c('Male', 'Female'), n, replace = TRUE),
                 age = sample(c('18-34', '35-64', '45-64'), n, replace = TRUE),
                 stringsAsFactors = FALSE)

age <- model.matrix(~ age, data = df)[, -1]
sex <- model.matrix(~ sex, data = df)[, -1]

mm <- list(age, sex)

#Coefficients
beta_cat1 <- list(age = matrix(c(2, 1)), sex = matrix(-1))
beta_cat2 <- list(age = matrix(c(-3, 2)), sex = matrix(2))

xb1 <- Reduce(rowSums, lapply(seq_along(beta_cat1), function(i) mm[[i]] %*% beta_cat1[[i]]))
xb2 <- Reduce(rowSums, lapply(seq_along(beta_cat2), function(i) mm[[i]] %*% beta_cat2[[i]]))

eta1 <- xb1 + rnorm(length(xb1))
eta2 <- xb2 + rnorm(length(xb2))

sum_exp <- rowSums(cbind(exp(eta1), exp(eta2)))

logit <- function(x){
  exp(x)/(1 + sum_exp)
}

#Class probabilities
p3 <- logit(0)
p2 <- logit(eta2)
p1 <- logit(eta1)

df <- cbind(df, p1 = p1, p2 = p2, p3 = p3)
#Generate outcome variables
ys <- lapply(seq_len(nrow(df)), function(i) t(rmultinom(1, 1, prob = c(p1[i], p2[i], p3[i]))))
ys <- do.call(rbind, ys)
y_names <- paste0('y', 1:3)
colnames(ys) <- y_names

df <- cbind(df, ys)

I aggregated the outcome variable to counts because I thought that might improve computation time.

df_agg <- df %>% group_by(age, sex) %>% summarise_at(.vars = y_names, .funs = sum)
df_agg$N <- rowSums(df_agg[, y_names])
df_agg$response <- with(df_agg, cbind(y1, y2, y3))

brms_fit <- brm(response | trials(N) ~ (1|age) + (1|sex), data = df_agg, family = multinomial())

Unsurprisingly, when I run predict or fitted with summary = FALSE, the result is the number of successes in each class. My question is, is there a way to get predict or fitted to return an array that has the predicted probability of the outcome being y1, y2, or y3 for each row of df_agg for each draw?

  • Operating System: Ubuntu 16.04
  • brms Version: 2.10.0

Hi! Welcome to the Stan discourse!

I’m no brms expert, but did you try fitted with scale = "linear"?

That should give you the result on the level of the linear predictor – so maybe you have to then transform it (when it’s on the logit scale).

Adding to what Max said, fitted should be able to directly give you the probabilities with the default value of scale if you use newdata and set N = 1 in the new data.

Thanks. This looks ok for fitted but does it work for predict? When I run fitted this looks reasonable:

newdat <- df_agg
newdat$N <- 1

fitted(brms_fit, newdata = newdat)[, 1, ]

     P(Y = y1)   P(Y = y2) P(Y = y3)
[1,] 0.3154898 0.368647154 0.3158631
[2,] 0.2972875 0.388514085 0.3141984
[3,] 0.8588791 0.003446441 0.1376745
[4,] 0.8520503 0.003829041 0.1441207
[5,] 0.2894886 0.590333856 0.1201776
[6,] 0.2688771 0.613391643 0.1177313

But predict estimates look very different:

predict(brms_fit, newdata = newdat, scale = 'linear')[, 1, ]

          y1      y2      y3
[1,] 0.15450 0.35125 0.49425
[2,] 0.15600 0.34250 0.50150
[3,] 0.31475 0.31625 0.36900
[4,] 0.31275 0.31075 0.37650
[5,] 0.12675 0.41225 0.46100
[6,] 0.13050 0.40650 0.46300