I am fitting several large models and would like to compare the fit of two of them to see whether a simplifying assumption (substituting (0 + item_type|id) with (1|id)) is defensible. I’m currently only working with a subset of the data, but the subset models are already 12 GB on disk (many crossed random effects), and the final models will use 10x the data.
I thought I’d try loo_subsample because fitting the models already gets close to the limits of our cluster. However, it seemed very slow even on a much smaller model, so I wanted to check whether I’m using the function correctly and wrote a small reprex.
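Roughly, the reprex looked like the sketch below (the formula and settings are placeholders based on the inhaler example data that ships with brms, not my real model):

```r
library(brms)

# toy model on the inhaler data shipped with brms (placeholder, not my real model)
fit <- brm(
  rating ~ treat + period + carry + (1 | subject),
  data = inhaler, family = gaussian(),
  chains = 4, cores = 4, refresh = 0
)

system.time(loo_full <- loo(fit))
system.time(loo_ss   <- loo_subsample(fit, observations = 100))
```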
To my surprise, loo was faster than loo_subsample. Is this expected with a small model, or could there be a problem that perhaps also explains why I can’t get loo_subsample to finish in a reasonable amount of time with my real model (it ran for more than 24 hours without finishing)?
Hmm. Yes, this is where loo_subsample should work much better. I’m happy to help find out what the problem is. Can you see what takes time to run with an R profiler?
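For example, something along these lines (with `fit` being one of your brms fits); profvis is just one option, base R’s Rprof()/summaryRprof() works too:

```r
library(profvis)

# profile one loo_subsample call to see where the time goes
profvis({
  loo_subsample(fit, observations = 100)
})
```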
It can be slower for small data sets because it computes an auxiliary variable and has some constant overhead.
I just made the inhaler dataset bigger.
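Roughly like this (the replication factor is arbitrary; each copy gets new subject IDs so the number of random-effect levels actually grows):

```r
# stack copies of the inhaler data with distinct subject IDs
big_inhaler <- do.call(rbind, lapply(1:20, function(i) {
  d <- inhaler
  d$subject <- paste0(i, "_", d$subject)
  d
}))
```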
loo_subsample is still slower, by a large margin (8 s vs. 58 s), but the slowdown comes from the second computation, where I want to draw the same observations. The first loo_subsample is faster than the first loo. Most of the time is spent in r_eff. With my real model it’s already slow for the first fit (I can’t profile it because it doesn’t finish), so I’m not sure this is the root cause of my problem.
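By "second computation" I mean reusing the subsample drawn for the first model when evaluating the second one, roughly like this (assuming the brmsfit method passes `observations` through to loo::loo_subsample):

```r
loo_ss_1 <- loo_subsample(fit1, observations = 400)
# reuse the same subsampled observations for the second model
loo_ss_2 <- loo_subsample(fit2, observations = loo_ss_1)
```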
I’m not going to be able to reproduce this in the next couple of days, but it certainly looks like a bug that is not really part of loo_subsample but of r_eff. @jonah, do you see anything obvious here?
Thanks for profiling! This doesn’t help you right now, but here is another reason we should drop computing r_eff by default: r_eff has been used to compute the MCSE for elpd, but since elpd is a sum of many pointwise elpds, the MCSE is usually negligible and estimating it more accurately doesn’t matter. Ping @jonah and @paul.buerkner
The loo package doesn’t compute r_eff by default, but I think brms, rstanarm, and cmdstanr do. So we could make changes to those packages to stop computing it automatically.
I profiled the real model and interrupted it after 3 minutes. This confirmed my suspicion that the problem has to do with the number of random effects: when I go deeper into r_eff, it’s predictor_re that is taking most of the time.
FWIW, I let it finish running on the aforementioned model; it took 36 minutes. Not only r_eff but also loo_subsample (through log_lik) spends significant time (15 minutes) in predictor_re / .subscript.2ary
I think in your case you’d need an update to brms, unless there’s currently a way to tell brms not to compute r_eff. Even if you’re using cmdstanr with brms, when you call add_criterion it’s not using cmdstanr’s loo method; brms implements its own.
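One possible workaround in the meantime, sketched below under the assumption that the full log-likelihood matrix fits in memory, is to skip add_criterion() and call the loo package directly on the extracted matrix, where r_eff is up to you (though the log-likelihood evaluation itself may still be the bottleneck):

```r
ll <- log_lik(fit)  # ndraws x nobs matrix; extracting this is still the expensive part
# compute r_eff once yourself and reuse it, or omit it and accept loo's warning
chain_id <- rep(seq_len(nchains(fit)), each = ndraws(fit) / nchains(fit))
r_eff <- loo::relative_eff(exp(ll), chain_id = chain_id)
loo_manual <- loo::loo(ll, r_eff = r_eff)
```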
Makes sense. Also, it seems the root cause of the slowness is predictor_re, which loo_subsample also calls. So getting rid of r_eff might cut the runtime in half, but it wouldn’t fix my problem that it takes days to run on my real model.
Thanks, that’s really helpful. predictor_re seems to be an internal function in brms that I’m not familiar with (so maybe @paul.buerkner will have some ideas). I think only the loo_subsample method for brmsfit objects calls this function. It’s not part of the implementation in the loo package itself.