# Predicted Probabilities for Multinomial Regression

I’m working with survey data and I’m trying to determine the probability of a respondent providing a particular response to a question based on demographic predictors, but I haven’t been able to figure out how to extract outcome probabilities. I have a fake dataset below to give an idea of what I’m trying to do. I have two demographic predictors and a three-level outcome variable.

``````library(dplyr)
library(brms)

key <- expand.grid(sex = c('Male', 'Female'),
age = c('18-34', '35-64', '45-64'))
n <- 1000
set.seed(123)
df <- data.frame(sex = sample(c('Male', 'Female'), n, replace = TRUE),
age = sample(c('18-34', '35-64', '45-64'), n, replace = TRUE),
stringsAsFactors = FALSE)

age <- model.matrix(~ age, data = df)[, -1]
sex <- model.matrix(~ sex, data = df)[, -1]

mm <- list(age, sex)

#Coefficients
beta_cat1 <- list(age = matrix(c(2, 1)), sex = matrix(-1))
beta_cat2 <- list(age = matrix(c(-3, 2)), sex = matrix(2))

xb1 <- Reduce(rowSums, lapply(seq_along(beta_cat1), function(i) mm[[i]] %*% beta_cat1[[i]]))
xb2 <- Reduce(rowSums, lapply(seq_along(beta_cat2), function(i) mm[[i]] %*% beta_cat2[[i]]))

eta1 <- xb1 + rnorm(length(xb1))
eta2 <- xb2 + rnorm(length(xb2))

sum_exp <- rowSums(cbind(exp(eta1), exp(eta2)))

logit <- function(x){
exp(x)/(1 + sum_exp)
}

#Class probabilities
p3 <- logit(0)
p2 <- logit(eta2)
p1 <- logit(eta1)

df <- cbind(df, p1 = p1, p2 = p2, p3 = p3)
#Generate outcome variables
ys <- lapply(seq_len(nrow(df)), function(i) t(rmultinom(1, 1, prob = c(p1[i], p2[i], p3[i]))))
ys <- do.call(rbind, ys)
y_names <- paste0('y', 1:3)
colnames(ys) <- y_names

df <- cbind(df, ys)
``````

I aggregated the outcome variable to counts because I thought that might improve computation time.

``````df_agg <- df %>% group_by(age, sex) %>% summarise_at(.vars = y_names, .funs = sum)
df_agg\$N <- rowSums(df_agg[, y_names])
df_agg\$response <- with(df_agg, cbind(y1, y2, y3))

brms_fit <- brm(response | trials(N) ~ (1|age) + (1|sex), data = df_agg, family = multinomial())
``````

Unsurprisingly, when I run `predict` or `fitted` with `summary = FALSE`, the result is the number of successes in each class. My question is, is there a way to get `predict` or `fitted` to return an array that has the predicted probability of the outcome being y1, y2, or y3 for each row of `df_agg` for each draw?

• Operating System: Ubuntu 16.04
• brms Version: 2.10.0

Hi! Welcome to the Stan discourse!

I’m no `brms` expert, but did you try `fitted` with `scale = "linear"`?

That should give you the result on the level of the linear predictor – so maybe you have to then transform it (when it’s on the logit scale).

Adding to what Max said, `fitted` should be able to directly give you the probabilities with the default value of `scale` if you use `newdata` and set `N = 1` in the new data.

Thanks. This looks ok for `fitted` but does it work for `predict`? When I run `fitted` this looks reasonable:

``````newdat <- df_agg
newdat\$N <- 1

fitted(brms_fit, newdata = newdat)[, 1, ]

P(Y = y1)   P(Y = y2) P(Y = y3)
[1,] 0.3154898 0.368647154 0.3158631
[2,] 0.2972875 0.388514085 0.3141984
[3,] 0.8588791 0.003446441 0.1376745
[4,] 0.8520503 0.003829041 0.1441207
[5,] 0.2894886 0.590333856 0.1201776
[6,] 0.2688771 0.613391643 0.1177313
``````

But predict estimates look very different:

``````predict(brms_fit, newdata = newdat, scale = 'linear')[, 1, ]

y1      y2      y3
[1,] 0.15450 0.35125 0.49425
[2,] 0.15600 0.34250 0.50150
[3,] 0.31475 0.31625 0.36900
[4,] 0.31275 0.31075 0.37650
[5,] 0.12675 0.41225 0.46100
[6,] 0.13050 0.40650 0.46300
``````