LOO and reinforcement learning models

Hi! Sorry if this is a very basic question, this is my first modelling project :)

I am using hBayesDM to fit reinforcement learning models for the Iowa Gambling Task to real-world data. I followed the steps outlined in the hBayesDM docs and other articles, which say to fit multiple models, compare their performance to choose the “best” one, and use that model for group comparisons of posterior mean parameters. However, I don’t want to choose the “best” model without knowing whether it is a good model in the first place.

hBayesDM implements LOOIC and WAIC; here is example output from a test I ran:

          Model    LOOIC     WAIC LOOIC Weights  WAIC Weights
1       igt_orl 13843.60 13766.99  1.000000e+00  1.000000e+00
2 igt_pvl_delta 15184.34 15144.29 7.290459e-292 8.409713e-300
3 igt_pvl_decay 14874.28 14824.94 1.551807e-224 1.861820e-230
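
(In case it matters, I got this table from hBayesDM’s printFit() after fitting the three models, roughly like the sketch below; the file name and sampler settings are just placeholders for my actual setup.)

```r
library(hBayesDM)

# Fit the three candidate models (file name and niter/nwarmup values
# are illustrative placeholders)
output_orl   <- igt_orl("igt_data.txt",       niter = 4000, nwarmup = 1000)
output_delta <- igt_pvl_delta("igt_data.txt", niter = 4000, nwarmup = 1000)
output_decay <- igt_pvl_decay("igt_data.txt", niter = 4000, nwarmup = 1000)

# Compare the models on LOOIC and WAIC
printFit(output_orl, output_delta, output_decay, ic = "both")
```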

From this example, I would then select the “igt_orl” model. However, my Pareto k diagnostic values are very high (nearly all >0.7), which would mean this model is not predicting my data well despite being the “best” model out of those 3.
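
In case it helps others reproduce this: here is roughly how I checked the diagnostics, assuming the underlying stanfit stores the pointwise log-likelihood in a parameter named "log_lik" (which I believe the hBayesDM models do, since they report LOOIC):

```r
library(loo)

# The stanfit object lives in the $fit slot of the hBayesDM output
log_lik <- extract_log_lik(output_orl$fit, parameter_name = "log_lik",
                           merge_chains = FALSE)
r_eff <- relative_eff(exp(log_lik))

loo_orl <- loo(log_lik, r_eff = r_eff)

# Summarise the Pareto k values and list the flagged observations
pareto_k_table(loo_orl)
pareto_k_ids(loo_orl, threshold = 0.7)
```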

From the research I did, it seems that loo assumes trials are independent (which is not the case in reinforcement learning models), leading to high Pareto k values. If that is correct, is there a better way to diagnose my models that does not assume trial independence?

Thank you!

Hello! You are right that this is not the best approach to compare models for this type of model/data. There have been a few conversations about this that might be informative, particularly this one: Loo for hierarchical model with trial-by-trial dependencies

In brief, you want to do cross validation, either leaving the next trial out (if you are concerned about the model generalizing to future trials) or leaving participants out (if you are concerned about generalizing to future participants).
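
As a very rough sketch of the first option (one-step-ahead, leave-future-out), not specific to hBayesDM: fit_model() and log_pred_next() below are hypothetical placeholders for refitting your model on the first t trials and scoring trial t+1.

```r
# Hypothetical leave-future-out skeleton (not hBayesDM-specific).
# fit_model(trials) would refit the RL model on the trials given;
# log_pred_next(fit, trial) would return the log predictive density
# of the next trial under that fit. Both are placeholders.
t_start  <- 10  # score only after some initial trials have been seen
lfo_elpd <- 0

for (t in t_start:(n_trials - 1)) {
  fit      <- fit_model(trials[1:t, ])
  lfo_elpd <- lfo_elpd + log_pred_next(fit, trials[t + 1, ])
}

lfo_elpd  # higher is better; compare across candidate models
```

(Refitting at every trial is expensive, so in practice approximate versions refit only when needed, but this is the basic idea.)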

Others here have more knowledge about this than I do, so they may chime in with more info, but this should at least get you started.

I looked briefly at the package you are using; I’m not deeply familiar with it or with reinforcement learning models, but it seems you could benefit from reading this FAQ: Cross-validation FAQ • loo

It is not true that leave-one-out cross-validation requires strict independence. It only requires exchangeability. See the “When is cross-validation valid?” section of the FAQ.

This is a different question from whether the specific method of approximating LOO is working for your model. Since you mention Pareto k diagnostic values, you seem to be using PSIS-LOO, which may fail even when exact LOO would work.

EDIT: To clarify, if you have high Pareto k values, my understanding is that the PSIS-LOO approximation to exact LOO may be unreliable. That’s what I mean by “fail”.

Hi Vanessa, Hi Karsten,

Thank you both for your quick and very informative answers!

I read the documentation you mentioned, and from what I gather, K-fold CV with a joint log score might be more appropriate here (but please correct me if I’m wrong!).

In my case, I don’t think the data is exchangeable, as choices are driven by learning over time (i.e., the choice at time t depends on the choices and outcomes at t-1 and all of those before it), which would make LOO biased as well.
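
If I go the leave-participants-out route, I imagine the folds could be set up with loo’s helper roughly like this, assuming a long-format data frame with a subjID column as hBayesDM uses; fit_rl_model() and joint_log_pred() are placeholders for my own refitting and scoring code:

```r
library(loo)

# Keep all trials from a given participant in the same fold
K <- 10
fold_id <- kfold_split_grouped(K = K, x = igt_data$subjID)

elpd_kfold <- numeric(K)
for (k in 1:K) {
  train <- igt_data[fold_id != k, ]
  test  <- igt_data[fold_id == k, ]
  # Refit on the training folds, then score each held-out participant's
  # full trial sequence jointly (placeholder functions)
  fit <- fit_rl_model(train)
  elpd_kfold[k] <- joint_log_pred(fit, test)
}

sum(elpd_kfold)  # compare this across candidate models
```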

Even if you don’t have complete exchangeability, you may have conditional exchangeability; see e.g. Figure 1 in Efficient leave-one-out cross-validation for Bayesian non-factorized normal and Student-t models.

Also, you should care about both bias and variance, and as in model comparison it is likely that the bias is similar for two models, the bias in the performance difference can be negligible and then reducing variance matters more. See, for example, Cross-validatory model selection for Bayesian autoregressions with exogenous regressors and [2504.15586] Joint leave-group-out cross-validation in Bayesian spatial models.