That information has been updated in a paper:
- Tuomas Sivula, Måns Magnusson, and Aki Vehtari (2020). Uncertainty in Bayesian leave-one-out cross-validation based model comparison. arXiv preprint arXiv:2008.10296
which is also listed as a reference in CV-FAQ 15: How to interpret Standard error (SE) of elpd difference (elpd_diff)
You may also benefit from the discussion in another thread.
You can ignore this; it’s just for those who want to reproduce the experiments in the Bayesian stacking paper.
If you made this a named list, you could name your models, and the names would show up in the weight output.
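For example, with the loo package in R, a named list might look like the sketch below (the model labels and the loo1, loo2, loo3 objects are placeholders for your own fits):

```r
library(loo)

# loo1, loo2, loo3 are assumed to be psis_loo objects, e.g. from loo(fit1), etc.
# The list names are just example labels; use names that describe your models
loo_list <- list(
  "baseline"   = loo1,
  "covariates" = loo2,
  "hierarch"   = loo3
)

# The names given in the list are printed next to the weights
loo_model_weights(loo_list, method = "stacking")
```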
When there are many models, the weights are easier to use. If you have two nested models, there is a monotonic mapping between the weights and probabilities (more about this coming soonish).
Models 3 and 6 are best, but you can get better predictions by averaging predictions from models 3, 5, and 6.
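One way to do that averaging is sketched below, under the assumptions that fit3, fit5, and fit6 are rstanarm or brms fits, loo3, loo5, and loo6 are their loo objects, and all models have the same number of posterior draws; the idea is to mix posterior predictive draws in proportion to the stacking weights:

```r
library(loo)

# Stacking weights for the three models
w <- loo_model_weights(list(model3 = loo3, model5 = loo5, model6 = loo6),
                       method = "stacking")

# Posterior predictive draws from each model (S x N matrices)
yreps <- list(posterior_predict(fit3),
              posterior_predict(fit5),
              posterior_predict(fit6))

# Mix the draws: each of the S draws comes from model k with probability w[k]
S <- nrow(yreps[[1]])
pick <- sample(seq_along(yreps), size = S, replace = TRUE, prob = as.numeric(w))
yrep_avg <- do.call(rbind, lapply(seq_along(yreps),
                                  function(k) yreps[[k]][pick == k, , drop = FALSE]))
```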
When models are similar, LOO-BB weights are diluted among the models with similar predictive performance. I would guess that models 1, 2, 4, 7, and 8 are somehow similar to each other or to models 3, 5, or 6. When models are similar, stacking weights pick the best among the very similar predictions, but they average over different predictive distributions if none of the predictive distributions is the true data-generating distribution. So in this case models 3, 5, and 6 are making different kinds of predictions, and it can be useful to check how they differ. You may also consider Bayesian hierarchical stacking.
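If by LOO-BB weights we mean the pseudo-BMA+ weights computed with the Bayesian bootstrap, you can compare the two behaviors directly on the same loo objects (a sketch, reusing the hypothetical loo_list from above):

```r
# Stacking weights: optimize the combined predictive distribution,
# so near-duplicate models tend to get weight through one representative
loo_model_weights(loo_list, method = "stacking")

# Pseudo-BMA+ weights (LOO with Bayesian bootstrap):
# models with similar predictive performance share, i.e. dilute, the weight
loo_model_weights(loo_list, method = "pseudobma", BB = TRUE)
```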
Not without additional information about the models. If they are nested, choose the most complex one.