LOO and reinforcement learning models

Hi! Sorry if this is a very basic question, this is my first modelling project :)

I am using hBayesDM to fit reinforcement learning models for the Iowa Gambling Task to real-world data. I followed the steps outlined in the hBayesDM docs and other articles, which say to fit multiple models, compare their performance to choose the “best” one, and use that model for group comparisons of posterior mean parameters. However, I don’t want to choose the “best” model without knowing whether it is a good model in the first place.

hBayesDM implements LOOIC and WAIC; here is example output from a test I ran:

          Model    LOOIC     WAIC LOOIC Weights  WAIC Weights
1       igt_orl 13843.60 13766.99  1.000000e+00  1.000000e+00
2 igt_pvl_delta 15184.34 15144.29 7.290459e-292 8.409713e-300
3 igt_pvl_decay 14874.28 14824.94 1.551807e-224 1.861820e-230
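
(In case it matters, I got this table from hBayesDM’s printFit() after fitting the three models, roughly like the sketch below; the file name and sampler settings are just placeholders for my actual setup.)

```r
library(hBayesDM)

# Fit the three candidate models (file name and niter/nwarmup values
# are illustrative placeholders)
output_orl   <- igt_orl("igt_data.txt",       niter = 4000, nwarmup = 1000)
output_delta <- igt_pvl_delta("igt_data.txt", niter = 4000, nwarmup = 1000)
output_decay <- igt_pvl_decay("igt_data.txt", niter = 4000, nwarmup = 1000)

# Compare the models on LOOIC and WAIC
printFit(output_orl, output_delta, output_decay, ic = "both")
```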

From this example, I would then select the “igt_orl” model. However, my Pareto k diagnostic values are very high (nearly all >0.7), which would mean this model is not predicting my data well despite being the “best” model out of those 3.
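
In case it helps others reproduce this: here is roughly how I checked the diagnostics, assuming the underlying stanfit stores the pointwise log-likelihood in a parameter named "log_lik" (which I believe the hBayesDM models do, since they report LOOIC):

```r
library(loo)

# The stanfit object lives in the $fit slot of the hBayesDM output
log_lik <- extract_log_lik(output_orl$fit, parameter_name = "log_lik",
                           merge_chains = FALSE)
r_eff <- relative_eff(exp(log_lik))

loo_orl <- loo(log_lik, r_eff = r_eff)

# Summarise the Pareto k values and list the flagged observations
pareto_k_table(loo_orl)
pareto_k_ids(loo_orl, threshold = 0.7)
```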

From the research I did, it seems that loo assumes trials are independent (which is not the case in reinforcement learning models), leading to high Pareto k values. If that is correct, is there a better way to diagnose my models that does not assume trial independence?

Thank you!

Hello! You are right that this is not the best approach to compare models for this type of model/data. There have been a few conversations about this that might be informative, particularly this one: Loo for hierarchical model with trial-by-trial dependencies

In brief, you want to do cross validation, either leaving the next trial out (if you are concerned about the model generalizing to future trials) or leaving participants out (if you are concerned about generalizing to future participants).
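
As a very rough sketch of the first option (one-step-ahead, leave-future-out), not specific to hBayesDM: fit_model() and log_pred_next() below are hypothetical placeholders for refitting your model on the first t trials and scoring trial t+1.

```r
# Hypothetical leave-future-out skeleton (not hBayesDM-specific).
# fit_model(trials) would refit the RL model on the trials given;
# log_pred_next(fit, trial) would return the log predictive density
# of the next trial under that fit. Both are placeholders.
t_start  <- 10  # score only after some initial trials have been seen
lfo_elpd <- 0

for (t in t_start:(n_trials - 1)) {
  fit      <- fit_model(trials[1:t, ])
  lfo_elpd <- lfo_elpd + log_pred_next(fit, trials[t + 1, ])
}

lfo_elpd  # higher is better; compare across candidate models
```

(Refitting at every trial is expensive, so in practice approximate versions refit only when needed, but this is the basic idea.)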

Others here have more knowledge about this than I do, so they may chime in with more info, but this should at least get you started.

I looked briefly at the package you are using; I’m not deeply familiar with it or with reinforcement learning models, but it seems you could benefit from reading this FAQ: Cross-validation FAQ • loo

It is not true that leave-one-out cross-validation requires strict independence. It only requires exchangeability. See the “When is cross-validation valid?” section of the FAQ.

This is a different question from whether the specific method of approximating LOO is working for your model. Since you mention Pareto k diagnostic values, you seem to be using PSIS-LOO, which may fail even when exact LOO would work.

EDIT: To clarify, if you have high Pareto k values, my understanding is that the PSIS-LOO approximation to exact LOO may be unreliable. That’s what I mean by “fail”.

Hi Vanessa, Hi Karsten,

Thank you both for your quick and very informative answers!

I read the documentation you mentioned, and from what I gather, K-fold CV with a joint log score might be more appropriate here (but please correct me if I’m wrong!).

In my case, I don’t think the data is exchangeable, as choices are driven by learning over time (i.e., the choice at time t depends on the choices and outcomes at t-1 and all of those before it), which would make LOO biased as well.
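
If I go the leave-participants-out route, I imagine the folds could be set up with loo’s helper roughly like this, assuming a long-format data frame with a subjID column as hBayesDM uses; fit_rl_model() and joint_log_pred() are placeholders for my own refitting and scoring code:

```r
library(loo)

# Keep all trials from a given participant in the same fold
K <- 10
fold_id <- kfold_split_grouped(K = K, x = igt_data$subjID)

elpd_kfold <- numeric(K)
for (k in 1:K) {
  train <- igt_data[fold_id != k, ]
  test  <- igt_data[fold_id == k, ]
  # Refit on the training folds, then score each held-out participant's
  # full trial sequence jointly (placeholder functions)
  fit <- fit_rl_model(train)
  elpd_kfold[k] <- joint_log_pred(fit, test)
}

sum(elpd_kfold)  # compare this across candidate models
```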

Even if you don’t have complete exchangeability, you may have conditional exchangeability; see e.g. Figure 1 in Efficient leave-one-out cross-validation for Bayesian non-factorized normal and Student-t models.

Also, you should care about both bias and variance, and as in model comparison it is likely that the bias is similar for two models, the bias in the performance difference can be negligible and then reducing variance matters more. See, for example, Cross-validatory model selection for Bayesian autoregressions with exogenous regressors and [2504.15586] Joint leave-group-out cross-validation in Bayesian spatial models.