I have data for which the outcome is ordered categories. I fit a cumulative probit model. Among other things, I am also interested in predicting for each observation the probability that it is in a specific category or higher (i.e. above a threshold). For example, let’s say I have 7 categories, and the observation is “S - Susceptible” if it is in categories 1-4 and is “R - Resistant” if it is in categories 5-7.
In brms, handy fitted()
function will allow me to obtain the probability that an observation is in each of the 7 categories. However, in order to find the probability that it is in category 5 or higher, my thought was that I could use fitted(model, scale="linear", summary=F)
to obtain the posterior samples for the mean on the scale of the linear predictor for each observation. Then I could use pnorm()
with the 4th threshold and mean as the values from fitted()
to find the probability that the observation is below or above the 4th threshold (in cats 1-4 or 5-7).
Does that make sense? Am I understanding the cumulative probit model correctly?
Example code to do this is below:
library(brms)
#data
s <- 1:7
y <- sample(s, size=50, replace=T)
trt <- rbinom(n=50, size=1, prob=0.5)
d1 <- cbind.data.frame(y,trt)
d1$trt <- factor(d1$trt)
str(d1)
#cumulative probit model
fit1 <- brm(y ~ trt, data=d1, family=cumulative(probit), cores=4)
#Find the probability for each observation being R or S
#R = resistant, when the obs is in category 5 or higher (above threshold 4)
#S = susceptible, when the obs is in category 4 or lower (below threshold 4)
#predictions on the linear predictor scale, all posterior samples
ffit1L <- fitted(fit1, scale="linear", summary=F)
#extract draws
s1 <- as_draws_df(fit1)
#compute the probabilities that the observation is in S or R. i.e. the probability that it is below or above threshold 4
p_y_S <- pnorm(s1$`b_Intercept[4]`, mean=ffit1L[,1:50])
p_y_R <- (1 - pnorm(s1$`b_Intercept[4]`, mean=ffit1L[,1:50]))
p_y_S <- data.frame(p_y_S)
p_y_R <- data.frame(p_y_R)
mean_probs_y_S <- as.numeric(sapply(p_y_S, mean))
mean_probs_y_S
mean_probs_y_R <- as.numeric(sapply(p_y_R, mean))
mean_probs_y_R