Is LOO valid for models with missing outcome when using a complete case dataset that is a subset of the original data?

Hi,
EDIT: should note first, that my understanding of cross validation and LOO is quite shallow, Aki (answering below) is the actual expert, so my advice is to be taken more carefully and with a grain of salt.

In most cases the data with missing outcome can be safely ignored when fitting the model (it appears the outcome is not used to impute anything else), as missing outcomes usually don’t provide any additional information for your models (while rows with missing predictors still can). See e.g. MICE missing values on the response for some discussion.

Just to be clear - you used that same argument as newdata when computing the loo for both models? Or just for the one with the additional missing predictor? If so, then I think it might be mostly OK, but I think you are not likely to get any strong guarantees - the observations with missing predictor (especially if there are many of them) may have important influence on your posterior that is getting lost when you ignore them. So I would treat the results at best as a quick heuristic. I think that you could in principle also calculate the log likelihood for the rows where the predictor is missing by integrating out the parameter representing the imputed value - this would however need to be done manually. (I did that once with a very different type of nuisance variables and the results looked sensible, but I didn’t do a deep investigation). You AFAIK cannot just use the likelihood given the imputed value as it is basically certain that the imputed value depends strongly on the particular observation for that row, breaking the assumptions loo makes and giving you large pareto k.

Best of luck with your model!

3 Likes