Loo_subsample slower than loo?

ruben · May 22, 2025, 12:13pm

I am fitting several large models and I would like to compare fit for two of them to see whether a simplifying assumption (substituting (0 + item_type|id) with (1|id) is defensible. I’m currently only working with a subset of the data, but the subset models are already 12GB on disk (many crossed random effects), the final models will use 10x the data.

I thought I’d try loo_subsample because fitting the models is already getting close to the limits of our cluster. However, it seemed very slow even on an even smaller model. So, I wanted to know whether I’m using the function correctly and wrote a small reprex.

To my surprise, loo was faster than loo_subsample. Is this expected with a small model or could there a be a problem that perhaps also explains why I can’t get loo_subsample to work in a reasonable amount of time (ran for more than 24hours without finishing) with my real model?

Hope it’s OK to tag you @mans_magnusson :-)

library(brms)

options(mc.cores = 4, brms.backend="cmdstanr")
fit1 <- brm(rating ~ treat + period + carry,
            data = inhaler)

fit2 <- brm(rating ~ treat + period + carry + (1|subject),
            data = inhaler)

options(mc.cores = 1, brms.backend="cmdstanr")

system.time({
  fit1l <- add_criterion(fit1, criterion = "loo", overwrite = T, ndraws = 500)
  fit2l <- add_criterion(fit2, criterion = "loo", overwrite = T, ndraws = 500)
})
# takes 0.423s
loo_compare(fit1l, fit2l)

system.time({
  fit1ls <- add_criterion(fit1, criterion = "loo_subsample", observations = 100, ndraws = 500, overwrite = T)
  fit2ls <- add_criterion(fit2, criterion = "loo_subsample", observations = fit1ls$criteria$loo_subsample, ndraws = 500, overwrite = T)
})
# takes 0.921 s

loo_subsample(fit1ls, fit2ls, compare = TRUE)

Operating System: macOS 15.4.1
brms Version: brms_2.21.0
loo Version: 2.8.0

mans_magnusson · May 22, 2025, 12:44pm

Hmm. Yes, this is where loo subsample should work much better. Im happy to help to find out what the problem is. Can you see what takes time tu run with an R profiler?

It can be slower for small samples because it computes an auxiliary variable and can have some constants.

ruben · May 22, 2025, 3:58pm

I just made the inhaler dataset bigger.
loo_subsample is still slower, by a large margin (8s vs 58s), but the slowdown comes from the second computation, where I want to draw the same observations. The first loo_subsample is faster than the first loo. Most of the time is spent in r_eff. With my real model it’s already slow for the first fit (I can’t profile it, because it doesn’t finish), so I’m not sure this is the root cause of my problem.

library(brms)

options(mc.cores = 4, brms.backend="cmdstanr")
max_s_id <- max(inhaler$subject, na.rm = TRUE)

enlarged_inhaler <- purrr::map_dfr(
  .x = 0:19, # Creates 20 versions (0=original, 1=first new copy, ..., 19=last new copy)
  .f = ~ dplyr::mutate(inhaler, subject = subject + (.x * max_s_id))
)

fit1 <- brm(rating ~ treat + period + carry,
            data = enlarged_inhaler)

fit2 <- brm(rating ~ treat + period + carry + (1|subject),
            data = enlarged_inhaler)


options(mc.cores = 1, brms.backend="cmdstanr")

system.time({
  fit1l <- add_criterion(fit1, criterion = "loo", overwrite = T, ndraws = 500)
  fit2l <- add_criterion(fit2, criterion = "loo", overwrite = T, ndraws = 500)
})

loo_compare(fit1l, fit2l)

system.time({
  fit1ls <- add_criterion(fit1, criterion = "loo_subsample", observations = 100, ndraws = 500, overwrite = T)
  fit2ls <- add_criterion(fit2, criterion = "loo_subsample", observations = fit1ls$criteria$loo_subsample, ndraws = 500, overwrite = T)
})

profvis::profvis({
  fit1ls <- add_criterion(fit1, criterion = "loo_subsample", observations = 100, ndraws = 500, overwrite = T)
})

loo_subsample(fit1ls, fit2ls, compare = TRUE)

mans_magnusson · May 22, 2025, 4:42pm

Great,

Im not going to be able to reproduce this in the next couple of days, but it surely looks that is a bug that is not really part of loo subsample but r_eff. @jonah do you see anything obvious here?

avehtari · May 22, 2025, 7:23pm

Thanks for profiling! Not helping you right now, but here is another reason we should drop computing r_eff by default. r_eff has been used to compute MCSE for elpd, but as elpd is sum of many pointwise elpds, MCSE is usually negligible and estimating it more accurately doesn’t matter. Ping @jonah and @paul.buerkner

jonah · May 22, 2025, 7:35pm

The loo package doesn’t compute r_eff by default, but I think brms, rstanarm, and cmdstanr do. So we could make changes to those packages to stop computing it automatically.

jonah · May 22, 2025, 11:04pm

I opened PRs in rstanarm and cmdstanr to change this.

ruben · May 23, 2025, 10:38am

Does it make sense for me to install these PRs and try on my real data?

I was curious what exactly is taking so long. I had a model on my real data that is a lot smaller

Family: bernoulli
Links: mu = logit
Formula: response_is_correct ~ 1 + (1 | item_id) + (1 | user_id) + (1 | target_id)
Data: red_yes_no (Number of observations: 281690)

I ran

profvis::profvis({
  mod <- add_criterion(red_m_correct_main_yes_no, criterion = "loo_subsample", observations = 10, ndraws = 10)
})

and interrupted after 3 minutes. This confirmed my suspicion that the problem has to do with the number of random effects, i.e. when I go deeper in r_eff, it’s predictor_re that is taking most time.

ruben · May 23, 2025, 11:30am

FWIW I let it finish running on the aforementioned model, it took 36 minutes. Not only r_eff but also loo_subsample through log_lik spends significant time (15m) on predictor_re / .subscript.2ary

jonah · May 23, 2025, 2:17pm

I think in your case you’d need an update to brms, unless there’s currently a way to tell brms not to compute r_eff. Even if you’re using cmdstanr with brms when you call add_criterion it’s not using cmdstanr’s loo method, brms implements its own.

ruben · May 23, 2025, 2:38pm

Makes sense. Also, it seems the root cause of the slowness is predictor_re which loo_subsample also calls. So, getting rid of r_eff might cut runtime in half, but that wouldn’t fix my problem that it takes days to run on my real model.

jonah · May 23, 2025, 2:47pm

Thanks, that’s really helpful. predictor_re seems to be an internal function in brms that I’m not familiar with (so maybe @paul.buerkner will have some ideas). I think only the loo_subsample method for brmsfit objects calls this function. It’s not part of the implementation in the loo package itself.

ruben · May 26, 2025, 3:59pm

I looked into a bit more. The brms:::loo_subsample.brmsfit function is only a thin veneer over loo::loo_subsample. If I understand correctly (probably not), then loo::loo_subsample is trying to get pointwise log_liks for the brmsfit and this leads to log_lik_pointwise being called, which uses get_dpar, which is what is taking so long.
Is it possible to subscript the matrix only once and store the result, so that the lookup becomes cheaper?

paul.buerkner · May 27, 2025, 7:48am

Perhaps this could be a good issue for brms github? I assume I will have to make some adjustments in the brms codebase for this purpose.

ruben · May 27, 2025, 8:45am

Of course. Here’s my short summary of what I think we know.

Topic		Replies	Views
How to speed up `brms::loo_subsample()` for large models brms loo , hierarchical-model , model-comparison	16	1572	October 25, 2022
Memory requirements to run loo.brmsfit brms	9	1075	November 14, 2023
Problem running loo_subsample with variational inference and cmdstanr Modeling cmdstan , variational-bayes , loo , cmdstanr	19	870	May 15, 2021
Loo_subsample error for ordinal brms model brms loo , ordinal-response , brms	0	445	August 28, 2022
Loo for a subset of the data General loo , brms	4	749	August 4, 2021

Loo_subsample slower than loo?

Related topics