I am trying to compare two models using the loo package.
Number of data points = 600,000, post-warmup iterations = 2000, # chains = 10
To compute the log likelihood for all samples, I need a matrix of size 600K x 20K, which would take a very long time and a lot of memory.
Any recommendations to make this more efficient?
Can I use only a small number of iterations instead of all 2000? Any other suggestions?
Ben’s suggestion is good, too. Here are a couple of other suggestions.
Take a smaller random sample of data points (whatever is fast enough for you), compute the log likelihood for those, and compute elpd_loo for this smaller random sample. Then use the usual statistical inference to estimate what elpd_loo would be for the whole n=600K. We use this kind of approach successfully in projpred to speed up computation when n is large.
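A minimal numeric sketch of this subsampling idea (in Python/NumPy rather than R, and with the subsample size `m` and the simulated pointwise values purely illustrative, not from any real model): compute pointwise elpd_loo for a random subsample of observations, then scale the subsample mean up to the full n and attach a standard error from ordinary sampling inference.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 600_000   # total number of data points
m = 10_000    # hypothetical subsample size, whatever is fast enough

# Suppose elpd_i holds the pointwise elpd_loo values computed for the
# m randomly subsampled observations (simulated here for illustration;
# in practice these come from loo on the subsampled log-likelihood).
elpd_i = rng.normal(loc=-1.2, scale=0.8, size=m)

# Scale the subsample mean up to the full data set, and get a standard
# error for that estimate from the usual sqrt(m) sampling inference.
elpd_hat = n * elpd_i.mean()
se_hat = n * elpd_i.std(ddof=1) / np.sqrt(m)

print(f"estimated elpd_loo for full data: {elpd_hat:.0f} (SE {se_hat:.0f})")
```

When comparing two models, the same subsample of observations should be used for both, so the pointwise differences (and their SE) can be computed on matched points.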
You can also use fewer iterations (for example by thinning), but check N_eff; I recommend having N_eff > 1000 for PSIS-LOO.
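To illustrate the thinning-plus-N_eff check (again a Python/NumPy sketch with simulated draws; in the real R workflow you would thin the posterior draws and check ESS with the loo/posterior diagnostics), one can thin the draws matrix and estimate effective sample size from the autocorrelations:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical posterior draws: 10 chains x 2000 post-warmup iterations
# for one scalar quantity (simulated as independent noise here).
draws = rng.normal(size=(10, 2000))

# Thin to every 4th iteration: 10 x 500 = 5000 retained draws.
thinned = draws[:, ::4]

def ess(x):
    """Crude effective-sample-size estimate: sum positive-lag
    autocorrelations until the first negative one. A rough stand-in
    for the proper split-chain ESS that loo/posterior compute."""
    x = x - x.mean()
    n = x.size
    acf = np.correlate(x, x, mode="full")[n - 1:] / (np.arange(n, 0, -1) * x.var())
    rho_sum = 0.0
    for rho in acf[1:]:
        if rho < 0:
            break
        rho_sum += rho
    return n / (1 + 2 * rho_sum)

n_eff = ess(thinned.ravel())
print(f"N_eff of thinned draws: {n_eff:.0f}")  # want this > 1000 for PSIS-LOO
```

If the thinned N_eff falls below about 1000, keep more iterations (thin less aggressively) before running PSIS-LOO.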
Thanks @bgoodri and @avehtari.
Without rerunning the model with fewer iterations, can I take a smaller random sample of the posterior draws, estimate the log likelihood on the full data set, and compare the two models?