Memory issues when computing LOO/WAIC

Hi.

I’ve stumbled upon this thread due to an error previously discussed here.

I’ve looked at the vignette you mentioned, but I feel my case would be difficult to adapt to loo_subsample.

I’m working with a drift-diffusion model that has one set of coefficients varying by participant and another varying by item. The data supplied to the Stan model are not rectangular but JSON-like (for example, the participant-level data have N entries, while the item-level data have M entries).

Looking at the vignette, my current understanding is that a data.frame-like object must be supplied to the log-likelihood function so that its rows can be subset. If so, in my case this would mean restructuring the data into a rectangular shape, because the simple row indexing used for subsetting would not work with my current data set. This is rather inconvenient.
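To make concrete what I mean by "restructuring into a rectangular shape", here is a minimal sketch (in Python, just for illustration; all names such as `participants`, `items`, and `trials` are made up and not from my actual model) of joining the participant- and item-level tables onto each observation so that one row holds everything the log-likelihood function needs:

```python
# Hypothetical sketch: flatten JSON-like, non-rectangular data into one
# row per observation, so that row indexing (as loo_subsample's
# log-likelihood function expects) becomes possible.

participants = {1: {"age": 24}, 2: {"age": 31}}               # N participant-level entries
items = {"a": {"difficulty": 0.3}, "b": {"difficulty": 0.7}}  # M item-level entries
trials = [  # one record per observation, referencing the two tables above
    {"pid": 1, "item": "a", "rt": 0.61},
    {"pid": 1, "item": "b", "rt": 0.92},
    {"pid": 2, "item": "a", "rt": 0.55},
]

def flatten(trials, participants, items):
    """Copy the participant- and item-level covariates onto each trial,
    yielding a rectangular, one-row-per-observation structure."""
    rows = []
    for t in trials:
        row = dict(t)
        row.update(participants[t["pid"]])
        row.update(items[t["item"]])
        rows.append(row)
    return rows

rows = flatten(trials, participants, items)
# Each row is now self-contained, so subsetting by row index is trivial:
subset = [rows[i] for i in (0, 2)]
```

This is exactly the duplication of participant- and item-level data across rows that I would prefer to avoid.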

I was wondering whether it is possible to do the subsetting within the generated quantities block of the Stan model instead? For example, by calculating the log-likelihood only for every $n^{\textrm{th}}$ observation, setting the others to NaN or something like that, and then running the ordinary loo function afterwards?
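Roughly, this is what I have in mind (a sketch only; `sub_idx`, `alpha`, `delta`, and the other names are hypothetical stand-ins, not my actual model, and I'm using `wiener_lpdf` as the DDM likelihood):

```stan
generated quantities {
  // Sketch: compute log_lik only for a subsample of observations,
  // whose indices are passed in as data; leave the rest at NaN.
  vector[N_obs] log_lik = rep_vector(not_a_number(), N_obs);
  for (s in 1:N_sub) {
    int i = sub_idx[s];  // i-th observation is in the subsample
    log_lik[i] = wiener_lpdf(rt[i] | alpha[participant[i]], tau,
                             beta, delta[item[i]]);
  }
}
```

The appeal is that the participant- and item-level lookups happen inside Stan, where the ragged structure already exists, so nothing has to be made rectangular on the R side.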

I tried looking through the source code of the loo_subsample function, but there’s a lot going on, and I couldn’t follow it well enough to tell whether this approach is too naive.

Just for context: I’m working with a data set of roughly 120k observations, obtained from about 100 participants on approximately 5000 items. I tried running the LOO calculation on a supercomputer I have access to, with 1 TB of RAM, and even that setup produced the error mentioned in the post linked at the beginning.