LOO and bayes_R2 (seem to) contradict posterior predictive check

Hi @hector -

Thanks for sharing your code and for being thoughtful about models! A few points to consider:

  1. The PPC plot for ordbetareg is not correct. I discuss this in a different post on here, "Ordered Beta Regression model: is a custom posterior predictive check plot necessary?" (#10 by saudiwin). I will be adding a correct PPC plot to the ordbetareg package in the next update, hopefully soon, but that of course depends on my time availability :).

  2. As far as R$^2$ goes, it's not really supposed to be a model selection criterion. It's also based, at least indirectly, on estimating the mean or average, which OLS is always going to be very good at. In any case, R$^2$ is supposed to give you some general information about model fit, and perhaps to help compare different linear model specifications, but LOO is really better suited to comparing models.

  3. The LOO comparison is obviously helpful, but again it is something that should be interpreted with caution. The LOO papers (re: @avehtari ) start with the assumption that we don't know the correct model. If you have a lot of background information to believe that Beta regression should be optimal (which I think is entirely plausible, as I cover in my paper), then you shouldn't rely on LOO. When I simulate data from the ordbetareg distribution, LOO does not always identify the correct model vis-a-vis OLS.

The reason for this has to do with the number of observations near the boundaries. The more of these there are, and the more skewed the data are, the worse the fit OLS will give you. But it's not really a wise idea to pick a model based on sample characteristics alone. As I show in my blog post "What To Do (And Not to Do) with Modeling Proportions/Fractional Outcomes" (Robert Kubinec), the data can be almost Normal and yet OLS can still produce weird predictions that violate the bounds, etc.
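To make the point about bounds concrete, here is a minimal sketch in plain Python. The data are made up for illustration (a straight-line trend clipped to [0, 1], so a few observations sit exactly on the boundaries); they are not from this thread, the paper, or the blog post. It fits OLS with the closed-form simple regression formulas and lists the fitted values that fall outside [0, 1].

```python
# Toy illustration only: the data below are invented for this sketch.
# A linear trend is clipped to [0, 1], so some observations pile up on the bounds.

def clamp(v, lo=0.0, hi=1.0):
    """Restrict v to the interval [lo, hi]."""
    return max(lo, min(hi, v))

xs = list(range(-5, 6))                    # predictor: -5, -4, ..., 5
ys = [clamp(0.5 + 0.15 * x) for x in xs]   # bounded outcome in [0, 1]

# Closed-form OLS for a single predictor.
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
intercept = y_bar - slope * x_bar

# Fitted values at the observed x, and those that leave the [0, 1] range.
preds = [intercept + slope * x for x in xs]
out_of_bounds = [(x, round(p, 3)) for x, p in zip(xs, preds) if p < 0 or p > 1]

print("slope:", round(slope, 3), "intercept:", round(intercept, 3))
print("fitted values outside [0, 1]:", out_of_bounds)
```

In this toy case the fitted line drops below 0 at the smallest x and rises above 1 at the largest x, even though every observed outcome lies inside the bounds.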

To sum up, a relatively small difference in LOO (the CIs are close to overlapping) should be only a small concern when you have significant prior knowledge that the outcome is bounded. OLS will not respect the bounds, and LOO is a relatively crude criterion for making really fine-grained distinctions between models. The PPC plot shows misfit, but that's actually an issue with the plot, not the model.
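For reference, here is a rough sketch of how a LOO difference and its uncertainty are usually summarised: the pointwise elpd differences are summed, and the standard error is the square root of the number of observations times the standard deviation of those pointwise differences. The pointwise values below are hypothetical numbers invented for this example, not output from any model in this thread, and the variable names are my own.

```python
import math
import statistics

# Hypothetical pointwise elpd_loo values for two models on the same 8 observations.
# These numbers are invented purely to show the arithmetic.
elpd_model_a = [-1.0, -1.2, -0.8, -1.1, -0.9, -1.3, -0.7, -1.0]
elpd_model_b = [-1.1, -1.0, -1.0, -1.2, -0.8, -1.5, -0.9, -0.9]

# Pointwise differences (model A minus model B).
diffs = [a - b for a, b in zip(elpd_model_a, elpd_model_b)]

n = len(diffs)
elpd_diff = sum(diffs)                            # total difference in elpd
se_diff = math.sqrt(n) * statistics.stdev(diffs)  # SE of the summed difference

# A difference within roughly two standard errors of zero gives little
# basis for preferring one model on LOO alone.
inconclusive = abs(elpd_diff) < 2 * se_diff

print("elpd_diff:", round(elpd_diff, 3), "se_diff:", round(se_diff, 3))
print("within 2 SE of zero:", inconclusive)
```

When the difference is small relative to its standard error, as in this toy example, prior knowledge about the outcome (such as it being bounded) is a reasonable thing to lean on.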

I hope that is helpful to you!