How is ELPD (loo) calculated in detail for ordinal data?

In his lecture series, @avehtari explained how ELPD works for metric data, which helped me understand it better than ever. That said, I am still not clear on how ELPD is used to evaluate ordinal predictions such as ratings from 1 to 4.

I am trying to find out in detail how loo compares models, and I wonder if anyone has resources on the actual math behind it. For instance, if a model predicts a 42% probability of a rating of 3, how would this be compared to the predictions of another model?

I would greatly appreciate any resources or insights that could help me delve deeper into the mathematics behind these concepts. Thank you in advance for your assistance.

Best regards,
Simon

If the observed rating is 3, and the leave-one-out posterior predicts 3 with probability $p$, then the associated log predictive density is $\log(p)$.
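
To make this concrete, here is a minimal R sketch with purely made-up numbers (not from any fitted model): each posterior draw assigns a probability to the observed category, and the pointwise log predictive density is the log of the average of those probabilities across draws.

# Hypothetical probabilities of the observed rating 3 from four
# posterior draws (illustrative numbers only)
p_draws <- c(0.40, 0.45, 0.38, 0.47)

# Pointwise log predictive density: log of the posterior-averaged
# probability of the observed category
log(mean(p_draws))  # log(0.425) = -0.856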


Thank you very much for your comment. I think this allowed me to manually approximate how loo would work for ordinal data.

# Load necessary libraries
library(brms)
library(loo)

# Load the mtcars dataset
data(mtcars)

# Convert the 'cyl' column to an ordered factor
mtcars$cyl <- ordered(mtcars$cyl)

# Fit a null model (no predictors) using the brms package
# The response variable is 'cyl', and the family is set to 'cumulative' for ordinal regression
null <- brm(cyl ~ 1, data = mtcars,
            family = cumulative(threshold = "flexible"),
            chains = 2, cores = 2, iter = 2000)

# Fit a model with 'disp' as a predictor
disp <- brm(cyl ~ disp, data = mtcars,
            family = cumulative(threshold = "flexible"),
            chains = 2, cores = 2, iter = 4000)


# Compute the Expected Log Predictive Density (ELPD) for each model using the loo package
null_loo <- loo(null)
disp_loo <- loo(disp)


# Manually compute the log predictive density for each observation for each model.
# posterior_epred() returns a draws x observations x categories array of
# predicted category probabilities
null_probs <- posterior_epred(null)
disp_probs <- posterior_epred(disp)

# Average the predicted probabilities over the posterior draws,
# giving an observations x categories matrix per model
null_probs_avg <- apply(null_probs, c(2, 3), mean)
disp_probs_avg <- apply(disp_probs, c(2, 3), mean)

# Calculate the log of the predicted probability for the observed outcome
# for each observation; indexing the column by the ordered factor uses its
# underlying integer code (4 -> 1, 6 -> 2, 8 -> 3), which matches the
# category dimension
null_logp <- sapply(1:nrow(mtcars), function(i) log(null_probs_avg[i, mtcars$cyl[i]]))
disp_logp <- sapply(1:nrow(mtcars), function(i) log(disp_probs_avg[i, mtcars$cyl[i]]))

# Print the log predictive density for each observation for each model
print(null_logp)
print(disp_logp)

# Sum each
sum(null_logp)
sum(disp_logp)
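
As a cross-check (a sketch, assuming the fits above are in scope), the same in-sample quantity can be computed from the log-likelihood matrix that loo() itself consumes: the pointwise lpd is the log of the posterior-mean likelihood, i.e. a log-sum-exp over the S draws minus log(S). The matrixStats package is used here for a numerically stable log-sum-exp.

# Cross-check: compute the in-sample pointwise lpd directly from the
# draws-by-observations log-likelihood matrix that loo() also uses
library(matrixStats)

ll_null <- log_lik(null)  # S x N matrix of log p(y_i | theta_s)
S <- nrow(ll_null)

# lpd_i = log((1/S) * sum_s exp(ll[s, i])), computed stably
null_lpd <- colLogSumExps(ll_null) - log(S)

# Should agree with the posterior_epred-based computation above,
# since the likelihood of an ordinal observation is exactly its
# predicted category probability
sum(null_lpd)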

We thus get these estimates from loo:

print(null_loo)

Computed from 2000 by 32 log-likelihood matrix

         Estimate  SE
elpd_loo    -36.1 1.6
p_loo         2.1 0.2
looic        72.2 3.2

print(disp_loo)

Computed from 4000 by 32 log-likelihood matrix

         Estimate  SE
elpd_loo     -6.7 2.4
p_loo         2.1 1.1
looic        13.3 4.7
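
As a side note on reading these tables: looic is simply elpd_loo on the deviance scale, looic = -2 × elpd_loo, e.g. -2 × (-36.1) = 72.2 for the null model (any small mismatch for the disp model is just rounding in the printed values).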

Whereas my method comes up with:

> # Sum each
> sum(null_logp)
[1] -33.967
> sum(disp_logp)
[1] -4.603159

These sums are of course slightly higher (less negative) than the elpd_loo estimates, because the in-sample computation has actually “seen” each data point it is evaluated on, whereas loo leaves each point out before predicting it.
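
You can also see this gap per observation: the loo objects store a pointwise matrix, and the total in-sample optimism is exactly what loo reports as p_loo, the effective number of parameters (a sketch, assuming null_logp from the manual computation above is still in scope).

# Compare the in-sample lpd with loo's leave-one-out pointwise estimates;
# the LOO values are systematically a bit lower (more negative)
null_elpd_pointwise <- null_loo$pointwise[, "elpd_loo"]

sum(null_elpd_pointwise)                   # about -36.1, as in print(null_loo)
sum(null_logp) - sum(null_elpd_pointwise)  # about 2.1, i.e. p_loo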

I hope the above is correct and that it may also help others to get a better intuition about what goes on with loo behind the scenes for ordinal data.

Just to note a slight semantic issue: loo calls this an elpd, but for discrete outcomes it’s really a predictive mass, not a predictive density. Perhaps this was a source of some of your initial confusion.
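
One way to see the mass-versus-density distinction in practice (a small sketch, assuming the fits above are available): for a discrete outcome every log-likelihood entry is the log of a probability and therefore cannot exceed zero, whereas a continuous density can exceed 1, so log densities can be positive.

# Ordinal outcome: every log-likelihood value is the log of a
# probability mass, so it is bounded above by 0
all(log_lik(null) <= 0)  # TRUE

# Continuous outcome: densities have no such bound, e.g. a narrow Gaussian
dnorm(0, mean = 0, sd = 0.1)       # 3.989..., a density greater than 1
log(dnorm(0, mean = 0, sd = 0.1))  # a positive log "density"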
