Cumulative probit model: predicting the probability an observation is in a category or higher (i.e. above or below a single threshold)

I have data for which the outcome is ordered categories. I fit a cumulative probit model. Among other things, I am also interested in predicting for each observation the probability that it is in a specific category or higher (i.e. above a threshold). For example, let’s say I have 7 categories, and the observation is “S - Susceptible” if it is in categories 1-4 and is “R - Resistant” if it is in categories 5-7.

In brms, the handy `fitted()` function will give me the probability that an observation is in each of the 7 categories. However, to find the probability that it is in category 5 or higher, my thought was to use `fitted(model, scale = "linear", summary = FALSE)` to obtain the posterior samples of the linear predictor for each observation. Then I could call `pnorm()` with the 4th threshold as the quantile and the `fitted()` values as the mean to get the probability that the observation is below or above the 4th threshold (in categories 1-4 or 5-7).

Does that make sense? Am I understanding the cumulative probit model correctly?

Example code to do this is below:

```r
library(brms)

# simulated data: ordered outcome with 7 categories and a binary treatment
s <- 1:7
y <- sample(s, size = 50, replace = TRUE)
trt <- rbinom(n = 50, size = 1, prob = 0.5)
d1 <- cbind.data.frame(y, trt)
d1$trt <- factor(d1$trt)
str(d1)

# cumulative probit model
fit1 <- brm(y ~ trt, data = d1, family = cumulative(probit), cores = 4)

# Find the probability of each observation being R or S
# R = resistant, when the obs is in category 5 or higher (above threshold 4)
# S = susceptible, when the obs is in category 4 or lower (below threshold 4)

# predictions on the linear predictor scale, all posterior samples
ffit1L <- fitted(fit1, scale = "linear", summary = FALSE)

# extract posterior draws (for the thresholds)
s1 <- as_draws_df(fit1)

# probability that each observation is S or R,
# i.e. the probability that it is below or above threshold 4
p_y_S <- pnorm(s1$`b_Intercept[4]`, mean = ffit1L[, 1:50])
p_y_R <- 1 - pnorm(s1$`b_Intercept[4]`, mean = ffit1L[, 1:50])
p_y_S <- data.frame(p_y_S)
p_y_R <- data.frame(p_y_R)

# posterior mean probability per observation
mean_probs_y_S <- as.numeric(sapply(p_y_S, mean))
mean_probs_y_S

mean_probs_y_R <- as.numeric(sapply(p_y_R, mean))
mean_probs_y_R
```

I think (but please check!) the `Post.Prob` column returned by `hypothesis(fit1, "trt1 > Intercept[4]")` gives you the answer you require here (the prob of being 5, 6, or 7 when `trt = 1`). And, for when `trt` is at its reference, `hypothesis(fit1, "0 > Intercept[4]")`.

Thanks. I tried it, but it doesn’t give the same answer as my code (and it doesn’t appear to be correct)… Also, if I had many predictors, I’m not sure how I would use it.

Regardless of the model structure, for a problem this simple you can just add the probabilities of being in category 5, category 6, and category 7 to get the probability of being in any one of them.
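To illustrate the "just add the categories" approach: for a cumulative model, `fitted(fit1, summary = FALSE)` returns a draws × observations × categories array, so P(category ≥ 5) is the sum over the last three category slices. The sketch below runs stand-alone on a mock probability array of that shape (the dimensions and values here are made up, not from the fitted model):

```r
# mock draws x observations x categories array, shaped like the output of
# fitted(fit1, summary = FALSE) for a 7-category cumulative model
set.seed(1)
n_draws <- 100; n_obs <- 5; n_cat <- 7
raw <- array(rexp(n_draws * n_obs * n_cat), dim = c(n_draws, n_obs, n_cat))
# normalize so the 7 category probabilities sum to 1 for each draw/observation
probs <- raw / array(rowSums(raw, dims = 2), dim = dim(raw))

# P(Y >= 5): sum categories 5-7, giving a draws x observations matrix
p_R <- apply(probs[, , 5:7], c(1, 2), sum)

# posterior mean probability of "R" per observation
mean_p_R <- colMeans(p_R)
mean_p_R
```

With the real model you would replace `probs` with `fitted(fit1, summary = FALSE)` and keep the `apply()` and `colMeans()` steps unchanged.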

If you want to dive into the probit structure, here’s the doc:

Since each category probability is just a difference of CDFs, summing categories telescopes, and you can compute the whole thing with a single CDF call.


Cool, thanks. Adding up the `fitted()` samples for the category probabilities gives me the same answers as my code above. So it seems there are a variety of ways to go about it.
Thanks for the check!