Loo+rstanarm for hierarchical models


Looking at the new rstanarm and loo packages, I am really pleased to see that we can now run k-fold cross-validation with the folds chosen by the user. I am interested in hierarchical models, and for those I would like to stratify the k folds by the grouping units of the hierarchy (the patients in my case).

Now, my question: Can I also apply the loo evaluation for rstanarm models on the patient level? So instead of defining log_lik per data point, I would like to sum the log_lik values for each patient and use those sums. Doing it on the data-point level answers the question “Do we fit observed data from patients well?” while I am also interested in “Do we fit patients well?”.

One can probably do this with some manual work, I suppose… but can rstanarm “just do it” with some cool feature? (I am asking for a lot, I know.)

Bonus question: If my patients have different numbers of observations, will this be a problem in any way?

Thanks! Looking forward to using this toolset.


You can’t get this automatically (yet). You can easily get the pointwise log_lik matrix from an rstanarm object, sum the terms you want within each patient (this part requires writing some code yourself, but it should be 3 lines at most), and then call loo with your new log_lik matrix. It is likely that the Pareto k values will be large, since leaving out a whole patient changes the posterior more than leaving out a single observation, but if you test this I’d be very happy to hear the results.
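A minimal sketch of those few lines, under some assumptions: the matrix `ll_point` below is simulated and only stands in for what `log_lik(fit)` would return (posterior draws in rows, observations in columns), and `patient` is a hypothetical vector with one patient ID per observation. With a real model you would replace both with your own objects.

```r
# Simulated stand-in for log_lik(fit): S posterior draws x N observations.
# With a real rstanarm fit, use ll_point <- log_lik(fit) instead.
set.seed(1)
S <- 200
patient <- c(1, 1, 1, 2, 2, 3, 3, 3, 3)  # hypothetical patient ID per observation
N <- length(patient)
ll_point <- matrix(rnorm(S * N, mean = -1, sd = 0.3), nrow = S, ncol = N)

# Sum the pointwise log-likelihoods within each patient, draw by draw.
# rowsum() groups rows, so transpose to group the observation columns.
ll_patient <- t(rowsum(t(ll_point), group = patient))
dim(ll_patient)  # S draws x number of patients

# Patient-level PSIS-LOO on the summed matrix, if the loo package is installed.
if (requireNamespace("loo", quietly = TRUE)) {
  loo_patient <- loo::loo(ll_patient)
  print(loo_patient)  # check the Pareto k diagnostics here
}
```

Calling `loo()` on a matrix without an `r_eff` argument produces a note about relative effective sample sizes; for a real fit you would probably want to compute them with `loo::relative_eff()` and pass them in.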

There are some ideas on how to make this better for hierarchical models (see, e.g., the Rabe-Hesketh and Furr talk at the last StanCon: https://github.com/stan-dev/stancon_talks/blob/master/README.md#2018-invited-talks), but it will take some time to include these in loo.

No, different numbers of observations per patient are not a problem. You may want to divide the final patientwise log-scores by the number of observations per patient if you want each patient to count equally (rather than patients with more observations counting more).
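A small sketch of that normalization, assuming you already have one log-score per patient. The numbers in `elpd_patient` are made up for illustration; with a patient-level loo object (called `loo_patient` here, a hypothetical name) they would come from `loo_patient$pointwise[, "elpd_loo"]`. The `patient` vector is again a hypothetical patient ID per observation.

```r
# Hypothetical patient-level log-scores, named by patient ID.
# With a real loo object: elpd_patient <- loo_patient$pointwise[, "elpd_loo"]
elpd_patient <- c(`1` = -3.2, `2` = -2.1, `3` = -4.6)

# Hypothetical patient ID for each observation in the data.
patient <- c(1, 1, 1, 2, 2, 3, 3, 3, 3)

# Number of observations per patient, matched to the score vector by ID.
n_obs <- as.vector(table(patient)[names(elpd_patient)])

# Per-observation log-score for each patient, so every patient counts equally.
elpd_per_obs <- elpd_patient / n_obs
mean(elpd_per_obs)  # average over patients, each with equal weight
```

Whether to normalize depends on the question you are asking: the unnormalized sum weights patients by how much data they contribute, while the normalized version treats each patient as one unit.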