I would appreciate being able to get frequentist cross-validated estimates of some (potentially arbitrary) loss. The use case is essentially reporting models to an audience that is uncomfortable with Bayes (or just ELPD) but might still get excited about things like horseshoe regression.
I brought this up on Twitter and @avehtari suggested following up here (cc @anon75146577 as well). It looks like people have some tricks that work at the moment, but an official Stan solution might be nice.
Can you elaborate what you mean by “frequentist cross-validated estimates”?
Sorry, wrote that question late at night. I think what I’m really asking for is the ability to use other loss functions than ELPD (currently I’m thinking of ELPD as a loss, please let me know if this is a bad idea).
My understanding of what loo currently does:

- Given a model fit on the full data, it computes efficient estimates of the leave-one-out posterior predictive distributions. For k-fold cross-validation there isn't an efficient approximation to the posterior predictive, but loo provides infrastructure for refitting models from scratch on the appropriate folds.
- For each held-out data point/fold, it evaluates the expected log posterior density of that point/fold using the posterior predictive distribution.
It seems like it would be easy to report RMSE/MAE and some other standard losses based on point estimates from the posterior predictive distribution and the held-out labeled data. This would be nice since ELPD isn’t as intuitive to me as RMSE and friends.
Ok. Loss and utility functions are independent of the inference method (e.g. frequentist or Bayesian). ELPD is a utility, as higher is better. If you prefer to minimize a loss, you can use the expected negative log predictive density (ENLPD), which is just -ELPD.
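To make the ELPD/ENLPD relationship concrete, here is a minimal sketch. The pointwise log-likelihood matrix below is simulated, not from a real Stan fit, and the function names are illustrative rather than loo's API; the point is that ELPD is a log-mean-exp over posterior draws summed across held-out points, and ENLPD is simply its negation.

```python
import numpy as np

# Hypothetical pointwise log-likelihood matrix: S posterior draws x n
# held-out observations. Simulated stand-in for what a Stan fit provides.
rng = np.random.default_rng(42)
log_lik = rng.normal(loc=-1.0, scale=0.3, size=(4000, 50))

def log_mean_exp(a, axis=0):
    """Numerically stable log of the mean of exp(a) along an axis."""
    m = a.max(axis=axis)
    return m + np.log(np.mean(np.exp(a - m), axis=axis))

# Pointwise expected log predictive density, then totals.
elpd_pointwise = log_mean_exp(log_lik, axis=0)
elpd = elpd_pointwise.sum()

# The loss-oriented version: lower is better.
enlpd = -elpd
```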
Correct, although you can also get other useful information.
Yes, it’s easy. See related examples
As it’s this easy, there hasn’t been pressure to make it even easier, but I agree it would be useful and there is now an issue for this.
We haven't been in a hurry to add RMSE because it is much weaker at detecting model differences and ignores whether the uncertainty in the predictions is modelled well, but I agree it can be useful for giving some sense of scale for the goodness of a single model.
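A small simulated illustration of why RMSE is weaker: two predictive distributions with the same mean give identical RMSE, but the log score still penalizes the one whose claimed uncertainty is badly calibrated. Everything below is made up for the demonstration.

```python
import numpy as np

# Held-out data actually drawn from N(0, 1).
rng = np.random.default_rng(1)
y = rng.normal(loc=0.0, scale=1.0, size=1000)

def normal_logpdf(y, mu, sigma):
    """Log density of N(mu, sigma) evaluated at y."""
    return -0.5 * np.log(2 * np.pi * sigma**2) - (y - mu) ** 2 / (2 * sigma**2)

# Both predictors predict mean 0, so their RMSE is identical...
rmse_a = rmse_b = np.sqrt(np.mean(y**2))

# ...but one claims sd 1 (well calibrated), the other sd 5 (far too wide),
# and the summed log predictive density separates them clearly.
lpd_a = normal_logpdf(y, 0.0, 1.0).sum()
lpd_b = normal_logpdf(y, 0.0, 5.0).sum()
```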
If you like, you can list here or in that issue what you think are “standard losses” in addition to RMSE/MAE/R^2.