Is LOO valid for models with missing outcome when using a complete case dataset that is a subset of the original data?

martinmodrak · June 21, 2021, 5:16pm

Hi,
EDIT: I should note first that my understanding of cross-validation and LOO is quite shallow. Aki (answering below) is the actual expert, so take my advice with a grain of salt.

In most cases, rows with a missing outcome can be safely ignored when fitting the model (in your case it appears the outcome is not used to impute anything else), because missing outcomes usually don’t provide any additional information to the model, whereas rows with missing predictors still can. See e.g. "MICE missing values on the response" for some discussion.

Just to be clear: did you use that same argument as newdata when computing the loo for both models, or just for the one with the additional missing predictor? If it was both, then I think it might be mostly OK, but you are not likely to get any strong guarantees. The observations with a missing predictor (especially if there are many of them) may have an important influence on your posterior, and that influence is lost when you ignore them. So I would treat the results at best as a quick heuristic.

I think you could in principle also calculate the log likelihood for the rows where the predictor is missing by integrating out the parameter representing the imputed value, but this would need to be done manually. (I did that once with a very different type of nuisance variable and the results looked sensible, but I didn’t do a deep investigation.) AFAIK you cannot just use the likelihood given the imputed value: it is basically certain that the imputed value depends strongly on the particular observation for that row, which breaks the assumptions loo makes and gives you large Pareto k.
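To make the "integrating out" idea a bit more concrete, here is a minimal Python sketch for a single row and a single posterior draw. It assumes a toy model that is not from this thread: the outcome is normal with mean `a + b * x` and sd `sigma`, and the missing predictor `x` has an imputation model `Normal(mu_x, sigma_x)`. All function and parameter names are made up for illustration. The integral over `x` is approximated with the trapezoid rule on a grid; for this toy model a closed form exists, which is included only as a sanity check.

```python
import math


def normal_logpdf(value, mean, sd):
    """Log density of Normal(mean, sd) evaluated at value."""
    z = (value - mean) / sd
    return -0.5 * math.log(2.0 * math.pi) - math.log(sd) - 0.5 * z * z


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def marginal_log_lik(y, a, b, sigma, mu_x, sigma_x, n_grid=2001, width=8.0):
    """log p(y | draw) with the missing predictor x integrated out.

    Hypothetical toy model (an assumption, not the model from the thread):
        x ~ Normal(mu_x, sigma_x)      # imputation model for the predictor
        y ~ Normal(a + b * x, sigma)   # outcome model
    The integral over x uses the trapezoid rule on a grid spanning
    mu_x +/- width * sigma_x, accumulated in log space.
    """
    lower = mu_x - width * sigma_x
    step = 2.0 * width * sigma_x / (n_grid - 1)
    terms = []
    for i in range(n_grid):
        x = lower + i * step
        # Trapezoid weights: half weight at the two end points.
        weight = 0.5 if i in (0, n_grid - 1) else 1.0
        terms.append(
            math.log(weight * step)
            + normal_logpdf(x, mu_x, sigma_x)
            + normal_logpdf(y, a + b * x, sigma)
        )
    return log_sum_exp(terms)


def closed_form_log_lik(y, a, b, sigma, mu_x, sigma_x):
    """Exact marginal for the toy model, used only to check the quadrature."""
    marginal_sd = math.sqrt(sigma ** 2 + (b * sigma_x) ** 2)
    return normal_logpdf(y, a + b * mu_x, marginal_sd)


if __name__ == "__main__":
    draw = dict(a=1.0, b=2.0, sigma=0.5, mu_x=0.0, sigma_x=1.0)
    print(marginal_log_lik(1.7, **draw))
    print(closed_form_log_lik(1.7, **draw))
```

In practice this would have to be repeated for every posterior draw and every row with a missing predictor, so that those rows get a marginal log likelihood instead of one conditional on the imputed value. For models without a closed form, only the numerical integration part would apply.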

Best of luck with your model!
