Memory issues when computing LOO/WAIC


Like several other people on this forum, I am running into memory issues when trying to compute LOO/WAIC for large datasets. My data has roughly 160,000 observations. The model has three grouping levels, one with 15 categories, one with 16 categories, and one with 240 (15 * 16) categories, along with 5-30 population-level parameters. I am running the model with the usual four chains of 2,000 iterations each and discard 1,000 as burn-in. I am trying to compute LOO/WAIC for model comparison on a machine with 50 GB of memory. Given that I consistently run out of memory before the calculation is done, my question is this: is there a rule of thumb for how much memory I would need to compute these quantities? I could try to find a machine with more memory. Alternatively, is there a way to calculate the quantities on a subset of the samples? This option seems a little more hackish, but I’m running out of ideas. Any help would be greatly appreciated.
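For a rough rule of thumb: the dominant object is the S × N pointwise log-likelihood matrix (S posterior draws, N observations) that LOO/WAIC operate on. A back-of-the-envelope estimate, assuming 8-byte doubles and the 4 × 1,000 post-warmup draws described above (actual peak usage is typically several times this, since intermediate copies such as importance weights are made):

```python
# Rough memory estimate for the S x N pointwise log-likelihood matrix.
# Assumes 8-byte doubles; real peak usage is a small multiple of this
# because intermediate copies (weights, smoothed weights, etc.) are made.
chains = 4
post_warmup_draws_per_chain = 1000   # 2,000 iterations, 1,000 discarded as burn-in
n_obs = 160_000

draws = chains * post_warmup_draws_per_chain     # 4,000 draws total
matrix_bytes = draws * n_obs * 8                 # one S x N double matrix
print(f"{matrix_bytes / 1e9:.2f} GB per copy")   # -> 5.12 GB per copy
```

So a single copy of the matrix is about 5 GB here; a few working copies plus the fitted model object can plausibly exhaust 50 GB.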

Here’s a vignette showing how to calculate loo for large data: Using Leave-one-out cross-validation for large data • loo


@ssp3nc3r already linked to the vignette (which also lists the papers showing that properly done subset LOO is well justified and not hacky).

How many observations do you have per group (i.e., per unique category combination)? If that is large, and the posterior does not change much when you leave out just one observation out of 160,000, then it is possible you don’t even need LOO (or WAIC). On average, you have 531 observations per parameter, which is a lot, and 666 observations per each of the 240 categories is a lot, too, but it matters whether some categories have far fewer observations.
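The per-parameter and per-category figures above can be reproduced with simple integer arithmetic (this assumes the 15 + 16 + 240 group-level categories from the question plus the upper bound of 30 population-level parameters; with fewer population-level parameters the ratio would be slightly higher):

```python
# Where the per-parameter and per-category averages come from,
# using the counts stated in the question.
n_obs = 160_000
n_params = 15 + 16 + 240 + 30    # group-level categories + population-level params

print(n_obs // n_params)         # -> 531 observations per parameter (on average)
print(n_obs // 240)              # -> 666 observations per finest-level category
```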
