Hi,
your output from:
clearly shows that models fit2 and fit3 are more or less equivalent in terms of out-of-sample predictive performance. However, both are much better than fit4. We can see this by taking the difference between fit4 and fit2 and building a 99% interval around it with a z-score of 2.57:
> -108.5 + c(-1,1) * 2.57 * 15.9
[1] -149.363 -67.637
The interval doesn’t cross 0, so the difference between fit4 and fit2 is clearly distinguishable from zero.
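If you want to repeat this check for other pairs of models, here is a minimal sketch of the same arithmetic. The helper name `diff_interval` is made up for illustration and is not part of any package; the two numbers are simply copied from the calculation above. It assumes the difference is approximately normal, which is an approximation and may be optimistic here.

```r
# Values copied from the fit4 vs fit2 comparison above.
elpd_diff <- -108.5
se_diff   <- 15.9

# Hypothetical helper (not from any package): normal-approximation
# interval around a difference, given its standard error.
diff_interval <- function(diff, se, level = 0.99) {
  z <- qnorm(1 - (1 - level) / 2)  # roughly 2.576 for level = 0.99
  diff + c(-1, 1) * z * se
}

ci <- diff_interval(elpd_diff, se_diff)

# TRUE when the whole interval lies on one side of zero.
excludes_zero <- ci[1] > 0 || ci[2] < 0
```

Because `qnorm` returns the unrounded quantile, the bounds will differ slightly from the ones computed with 2.57, but the conclusion is the same.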
What worries me is that you had so many problematic observations. I’m not really sure reloo helps you in this case. I would love it if @avehtari could provide some insights.