Your explanation clarified the background, and it's good that you think of it as a summary. LOO for each group corresponds to assuming that new out-of-sample data come from the same groups. If you sum the elpds together, you are assuming equal weight for all groups in the summary (you could use different weights if you assumed that some groups are more likely in the future). Now, when you want to consider whether the difference in the summary between the hierarchical and non-hierarchical model is big, there is uncertainty, as the summaries for each group and the total summary over all groups are based on a finite number of observations and a finite number of groups. Since you are assuming a hierarchical structure of groups and individuals within groups, you could also use a hierarchical model to analyse the uncertainty in the full population mean. If the between-group variation dominates, you can approximate the uncertainty in the summary by using the point estimates of the elpds for each group and computing the SE from those. If the within-group variation dominates, you can instead pool all the individual elpd values from all groups into one vector and compute the variance from there.
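To make those two approximations concrete, here is a minimal sketch in Python with made-up elpd values (this is illustrative only, not the loo package API):

```python
import numpy as np

rng = np.random.default_rng(1)
# hypothetical pointwise elpd differences (hierarchical minus
# non-hierarchical), one vector per group
elpd_diff_by_group = [rng.normal(0.2, 1.0, size=n) for n in (30, 25, 40)]

# total summary: equal weight per group, so just sum everything
total = sum(g.sum() for g in elpd_diff_by_group)

# (a) between-group variation dominates: use the group-level point
# estimates and compute the SE from those K values
group_sums = np.array([g.sum() for g in elpd_diff_by_group])
K = len(group_sums)
se_between = np.std(group_sums, ddof=1) * np.sqrt(K)

# (b) within-group variation dominates: pool all pointwise elpd values
# from all groups into one vector and compute the variance from there
pooled = np.concatenate(elpd_diff_by_group)
N = len(pooled)
se_within = np.std(pooled, ddof=1) * np.sqrt(N)

print(total, se_between, se_within)
```

Comparing `se_between` and `se_within` on your own data also gives a quick sense of which source of variation dominates.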
Pareto-k is a diagnostic for how variable the importance sampling weights are.
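A quick illustration of why that variability matters (synthetic weights, not the actual PSIS computation, which fits a generalized Pareto distribution to the largest ratios to estimate k): when the weights are heavy-tailed, a few of them dominate and the effective sample size of the importance sampling estimate collapses.

```python
import numpy as np

rng = np.random.default_rng(0)

def eff_sample_size(weights):
    # standard importance-sampling effective sample size 1 / sum(w_norm^2)
    w = weights / weights.sum()
    return 1.0 / np.sum(w ** 2)

light = rng.lognormal(0.0, 0.5, size=4000)  # mild weight variation
heavy = rng.pareto(0.7, size=4000) + 1.0    # heavy tail, tail index < 1

print(eff_sample_size(light))  # stays a large fraction of 4000
print(eff_sample_size(heavy))  # collapses: a few weights dominate
```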
A nice thing about cross-validation is that we don't need to count parameters. The loo package reports p_loo, but it's not computed by counting; it's computed by comparing the full-data posterior predictive log score to the cross-validation predictive log score.
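That comparison can be written out in a couple of lines. A sketch with made-up pointwise log scores (not the loo package internals): p_loo is the full-data log predictive density minus the LOO estimate, summed over observations.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
# hypothetical pointwise log p(y_i | y) under the full-data posterior
lpd_pointwise = rng.normal(-1.0, 0.3, size=n)
# LOO pointwise log score; in expectation it is lower than lpd,
# mimicked here by subtracting a small positive amount
elpd_loo_pointwise = lpd_pointwise - np.abs(rng.normal(0.1, 0.05, size=n))

# p_loo = lpd - elpd_loo; behaves like an effective number of parameters
p_loo = lpd_pointwise.sum() - elpd_loo_pointwise.sum()
print(p_loo)
```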
When you leave more observations out, the posterior changes more, and the full-data posterior is no longer a good importance sampling proposal distribution for the leave-one-row-out posterior. If 95% of the Pareto-k values are > 1, it hints that you also have one parameter per row, and when you remove all the data directly influencing that parameter, its posterior changes a lot. In such cases K-fold-CV is the easiest option for reliable computation. Instead of cross-validation, you can also look at the posterior of the hierarchical model parameters, as that can sometimes tell you enough about whether the data provide useful information about the hyperparameters, and then you don't need cross-validation (unless the prediction task is important).
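For the K-fold-CV route with grouped data, the key step is assigning whole groups to folds, so that each refit leaves out all the data influencing a group's row-specific parameters. A minimal sketch of such a fold assignment (a hypothetical Python helper; in R the loo package provides `kfold_split_grouped` for this):

```python
import numpy as np

def kfold_split_grouped(K, group_ids, seed=0):
    """Assign each unique group to one of K folds at random."""
    rng = np.random.default_rng(seed)
    groups = np.unique(group_ids)
    # permute the groups, then deal them out round-robin into folds 1..K
    fold_of_group = dict(zip(groups, rng.permutation(len(groups)) % K + 1))
    return np.array([fold_of_group[g] for g in group_ids])

group_ids = np.repeat(np.arange(8), 10)  # 8 groups, 10 rows each
folds = kfold_split_grouped(4, group_ids)

# every row of a group lands in the same fold
for g in np.unique(group_ids):
    assert len(set(folds[group_ids == g])) == 1
```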