LOO Continuous Ranked Probability Score

I am comparing two models using the LOO implementation of the continuous ranked probability score (CRPS), and I just want to make sure I am interpreting the results correctly. In most applied research the CRPS is > 0, with lower values indicating a better model; at least most of the literature on the subject seems to say as much. Yet both the example here and my results (Fit 1 = -0.304 and Fit 2 = -0.298) are negative.

Am I right to assume that these are the average differences between the CDF of the fitted model and the observations, and that they are on the same scale as the observations? In other words, Fit 2 is technically “better”, although given the SE of the estimates they are practically the same.

Pinging @avehtari given his work with loo.

We have followed the convention in arXiv:1912.05642 (Local scale invariance and robustness of proper scoring rules), with higher values being better, since we have also used the log score defined as log(p(.)). At the moment we are missing loo_compare() support for CRPS, which would compute the correct SE for the comparison (currently the SE reported is for each estimate separately). There is an open issue for this (loo_compare for crps and loo_crps · Issue #220 · stan-dev/loo on GitHub), but no one has had time to code it yet. If we see more requests for this, we will eventually prioritize it.
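Until then, here is a minimal sketch of how you could compute a paired difference and its SE yourself from the per-observation scores. It assumes `crps1` and `crps2` are the objects returned by `loo_crps()` for Fit 1 and Fit 2 on the same observations, and that each exposes the per-observation values in `$pointwise` (adjust the names if your loo version differs):

```r
# Sketch (not an official loo API): paired comparison of two models'
# LOO-CRPS estimates from their per-observation scores.
pw1 <- as.numeric(crps1$pointwise)
pw2 <- as.numeric(crps2$pointwise)

# Positive differences favor Fit 2 (higher is better in loo's convention).
diff_pointwise <- pw2 - pw1
n <- length(diff_pointwise)

crps_diff    <- mean(diff_pointwise)          # mean difference, same scale as the reported estimates
crps_diff_se <- sd(diff_pointwise) / sqrt(n)  # SE of the paired difference

c(diff = crps_diff, se = crps_diff_se)
```

This mirrors what loo_compare() does for elpd, where the comparison SE is computed from the paired per-observation differences rather than from the two separate SEs.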
