I don’t have time to really understand the model in depth right now :(
But I think I can help with a few of these questions:
So the question here is how many parameters are in your model and how many data points you are fitting. By comparing these two numbers to p_loo and to each other, you should be able to figure out which regime you are in. If you're using rstan, then rstan::get_num_upars will tell you how many parameters are in the model.
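For example, something along these lines would put the three numbers side by side (a sketch only; it assumes `fit` is your stanfit object and `log_lik` is the pointwise log-likelihood matrix you pass to loo()):

```r
# Sketch -- assumes `fit` is a stanfit object and `log_lik` is the
# S-by-N pointwise log-likelihood matrix used for loo().
library(rstan)
library(loo)

n_params <- get_num_upars(fit)   # number of parameters (unconstrained space)
n_obs    <- ncol(log_lik)        # number of data points being fit

loo_fit <- loo(log_lik)
p_loo   <- loo_fit$estimates["p_loo", "Estimate"]

c(n_params = n_params, n_obs = n_obs, p_loo = p_loo)
```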
I don’t understand exactly what the model is doing, but in general I would suggest trying to write down a model that captures the true data-generating process as closely as possible. I think the contamination layer is part of the assumed true data-generating process that produces the data the model sees. If it’s not, then I don’t understand the motivation for including it.
Note that loo prioritizes good predictive performance, which is not the same thing as capturing the true data-generating process. With finite data and a complicated DGP, the true DGP might be impossible to fit (i.e. attempting to fit it might be prone to non-identification and very sensitive to the priors) and might yield really bad predictive performance.
Good predictive performance is also not the same thing as accurate uncertainty quantification around causal effect sizes of interest. If your goal is inference about effect sizes and you believe a priori that some particular feature of the model, or some nuisance covariate, is important, then you might choose to leave that feature in your final model even if loo says that doing so worsens predictive performance.
Perhaps @avehtari will chime in about whether this is surprising or not, but I don’t know of any reason to be surprised. For what it’s worth, the loo package documentation says:
the PSIS effective sample size estimate will be over-optimistic when the estimate of k is greater than 0.7.
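If it helps, here’s a quick way to check how many observations cross that 0.7 threshold (again just a sketch, using the hypothetical `loo_fit` object from above):

```r
# Sketch: inspect the Pareto k diagnostics for the loo fit.
library(loo)

pareto_k_table(loo_fit)                        # counts of observations in each k range
bad <- pareto_k_ids(loo_fit, threshold = 0.7)  # indices of observations with k > 0.7
length(bad)
```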