Hi.
I came across this thread because of an error previously discussed here.
I’ve looked at the vignette you mentioned, but I feel my case would be difficult to adapt to loo_subsample.
I’m working with a drift-diffusion model that has one set of coefficients varying by participant and another set varying by item. The data supplied to the Stan model are not rectangular but JSON-like (for example, the participant-level data have N entries, while the item-level data have M entries).
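Even in this layout, each observation points into the participant- and item-level arrays through its participant and item indices, so a one-row-per-observation table can in principle be built by lookup (a join, in data.frame terms). A minimal sketch in Python, assuming hypothetical field names (rt, response, participant_idx, item_idx, participant_age, item_difficulty) that are illustrative only and not from my actual model:

```python
# Hypothetical JSON-like Stan data (field names are illustrative only):
# per-observation arrays plus participant-level (N) and item-level (M) arrays.
stan_data = {
    "rt": [0.61, 0.83, 0.52, 0.97],
    "response": [1, 0, 1, 1],
    "participant_idx": [1, 1, 2, 2],      # 1-based, as in Stan
    "item_idx": [1, 2, 1, 3],             # 1-based, as in Stan
    "participant_age": [24.0, 31.0],      # N = 2 entries
    "item_difficulty": [0.2, -0.5, 1.1],  # M = 3 entries
}


def flatten(data):
    """Return one dict per observation, with participant- and item-level
    values looked up through that observation's indices."""
    rows = []
    for i in range(len(data["rt"])):
        p = data["participant_idx"][i] - 1  # convert to 0-based
        m = data["item_idx"][i] - 1
        rows.append({
            "rt": data["rt"][i],
            "response": data["response"][i],
            "participant_idx": p + 1,
            "item_idx": m + 1,
            "participant_age": data["participant_age"][p],
            "item_difficulty": data["item_difficulty"][m],
        })
    return rows


rows = flatten(stan_data)
subset = [rows[i] for i in (0, 3)]  # plain row indexing now works
```

The participant- and item-level columns are only needed if the log-likelihood actually uses them; otherwise the two index columns are enough.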
Looking at the vignette, my current understanding is that a data.frame-like object is supplied to the log-likelihood function so that its rows can be subset. If so, in my case this would mean restructuring the data into a rectangular shape, because simple row indexing would not work with my current data set. This is rather inconvenient.
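For reference, the row-wise function in the vignette takes one data row plus the posterior draws and returns that observation's log-likelihood under every draw. If each row carries its participant and item indices, the function can pick out the matching coefficient draws itself. A minimal Python sketch of that pattern, assuming hypothetical parameter names (b_participant, b_item, sigma); the normal density is only a stand-in for the Wiener (drift-diffusion) density, not my actual likelihood:

```python
import math


def normal_logpdf(x, mu, sigma):
    """Log density of a normal distribution (stand-in for the DDM density)."""
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * ((x - mu) / sigma) ** 2


def llfun(data_i, draws):
    """Log-likelihood of one observation row under every posterior draw.

    data_i: dict with 'rt', 'participant_idx', 'item_idx' (1-based indices).
    draws:  dict with per-draw values; 'b_participant' and 'b_item' hold one
            coefficient vector per draw, 'sigma' one scalar per draw.
    Returns a list with one log-likelihood value per draw.
    """
    p = data_i["participant_idx"] - 1
    m = data_i["item_idx"] - 1
    out = []
    for s in range(len(draws["sigma"])):
        mu = draws["b_participant"][s][p] + draws["b_item"][s][m]
        # Replace this stand-in with the drift-diffusion log density.
        out.append(normal_logpdf(data_i["rt"], mu, draws["sigma"][s]))
    return out


draws = {
    "b_participant": [[0.5, 0.1], [0.4, 0.2]],  # 2 draws x 2 participants
    "b_item": [[0.0, 0.1], [0.1, 0.0]],         # 2 draws x 2 items
    "sigma": [1.0, 0.5],
}
row = {"rt": 0.5, "participant_idx": 1, "item_idx": 1}
ll = llfun(row, draws)  # one value per draw
```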
I was wondering whether it is possible to do the subsetting within the generated quantities block of the Stan model instead. For example, could I calculate the log-likelihood only for every k-th observation, set the others to NaN or something similar, and then run the normal loo function afterwards?
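One point worth making explicit: PSIS-LOO for observation i uses only that observation's log-likelihood column, so running loo() on just the m subsampled columns should give the same pointwise elpd_loo values as the full run would for those observations. The reported total, however, is then a sum over m observations only and has to be scaled up to all n. A minimal sketch, assuming a simple random subsample (rather than every k-th observation, which could line up with how the data are ordered by participant or item) and the textbook estimator of a population total. As I understand the loo documentation, loo_subsample by default uses a difference estimator that also draws on a cheap approximation for every observation, so this is a simplification, not what the package does. Passing the indices to Stan as data is a hypothetical setup:

```python
import math
import random


def draw_subsample(n_total, m, seed):
    """Draw m distinct 1-based observation indices uniformly without replacement."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, n_total + 1), m))


def srs_total_estimate(elpd_sub, n_total):
    """Scale subsampled pointwise elpd_loo values up to all n_total observations.

    Simple-random-sampling estimator of a population total (n * mean),
    with a finite-population-corrected standard error.
    """
    m = len(elpd_sub)
    mean = sum(elpd_sub) / m
    var = sum((x - mean) ** 2 for x in elpd_sub) / (m - 1)  # sample variance
    total = n_total * mean
    se = n_total * math.sqrt((1 - m / n_total) * var / m)
    return total, se


# Indices that could be passed to Stan as data (hypothetical) so that
# generated quantities only computes log_lik for these observations.
idx = draw_subsample(120_000, 1_000, seed=1)
```

The pointwise elpd_loo values for those columns, taken from the loo() result, would then go into srs_total_estimate together with n = 120,000.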
I tried looking through the source code of the loo_subsample function, but there’s a lot going on and I couldn’t follow it with confidence, so I’m not able to tell whether this approach is too naive.
Just for context, I’m working with a data set of about 120k observations, obtained from some 100 participants on approximately 5000 items. I tried running the LOO calculation on a supercomputer I have access to, with 1 TB of RAM, and this setup produced the error mentioned in the post linked at the beginning.
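For scale, a rough size of the pointwise log-likelihood matrix stored as 8-byte doubles. The 4000 draws are an assumed, typical count (e.g. 4 chains of 1000 post-warmup draws), not taken from my actual fit, and this covers only the S × n log_lik matrix itself, not anything else held in memory:

```python
def loglik_matrix_bytes(n_draws, n_obs, bytes_per_value=8):
    """Size in bytes of a dense n_draws x n_obs matrix of double-precision values."""
    return n_draws * n_obs * bytes_per_value


size = loglik_matrix_bytes(4000, 120_000)  # 4000 draws is an assumption
print(f"{size / 1e9:.2f} GB")  # prints "3.84 GB"
```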