In a future project, I plan not to use listwise deletion, but instead to include some missing values as parameters to be estimated.
I'll figure out the code when the time comes, but I'm curious how to handle the log_lik computations needed for LOOIC or other model fit metrics.
Generally, you just throw a vector[N] log_lik into generated quantities, compute the log-likelihood of each observation, and store it in that vector.
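For a simple complete-data model that's just something like this (a minimal sketch with placeholder names y, mu, sigma and a normal observation model):

```stan
generated quantities {
  vector[N] log_lik;
  for (n in 1:N) {
    // pointwise log-likelihood of each observed value
    log_lik[n] = normal_lpdf(y[n] | mu, sigma);
  }
}
```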
But what do you do if you model missing data? Do you compute log_lik only for the observed data, or do you include log_lik for the missing values as well?
Context: The model will be an SEM-type model, where inevitably some people will not answer every scale's items. So far that number has been small enough that I could just drop the 2-3 cases that didn't fully answer the scales, but this time I plan to include all /available/ responses in the model and handle the missing responses by constructing a full data matrix from the observed values and missing-value parameters, then running the model on that matrix. I'll want to compute some model fit statistics using the joint likelihood of the data, but I have no idea whether to include the missing, estimated observations in the log-likelihood estimates. Thoughts?
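Roughly what I have in mind is something like this (only a rough sketch with placeholder names; the actual SEM structure is omitted):

```stan
data {
  int<lower=1> N;                        // respondents
  int<lower=1> K;                        // items
  int<lower=0> N_mis;                    // number of missing cells
  matrix[N, K] y;                        // responses, missing cells filled with an arbitrary value
  array[N_mis, 2] int<lower=1> mis_idx;  // (row, column) of each missing cell
}
parameters {
  vector[N_mis] y_mis;                   // missing responses estimated as parameters
  // ... loadings, factor scores, residual scales, etc. ...
}
transformed parameters {
  // complete data matrix built from observed values and missing-value parameters
  matrix[N, K] y_full = y;
  for (m in 1:N_mis)
    y_full[mis_idx[m, 1], mis_idx[m, 2]] = y_mis[m];
}
model {
  // the SEM likelihood would be evaluated on the completed matrix y_full
}
```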
You include the log likelihood for the missing data, too; that and the prior control how it will be imputed. There's a chapter in the manual on how to code up missing data in Stan.
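In the simplest univariate case it looks roughly like this (a minimal sketch with placeholder names, not the SEM case):

```stan
data {
  int<lower=0> N_obs;
  int<lower=0> N_mis;
  vector[N_obs] y_obs;
}
parameters {
  real mu;
  real<lower=0> sigma;
  vector[N_mis] y_mis;         // missing values declared as parameters
}
model {
  y_obs ~ normal(mu, sigma);   // observed data
  y_mis ~ normal(mu, sigma);   // same sampling statement, together with the priors, drives the imputation
  // priors on mu and sigma would go here
}
```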
I have no idea about LOO or what LOOIC is. Usually you wouldn't compare against missing observations under cross-validation, so maybe they aren't used under LOO.
And we are talking about missing data, not just latent parameters that go with the data? Such latent parameters need to be marginalized out in order to produce the usual notion of likelihood.
If you want to compute LOO, then include only the observed values in the log_lik computation in generated quantities. If you included the missing values, those terms would correspond to the self-predictive approach (Section 5.2.3 of A survey of Bayesian predictive methods for model assessment, selection and comparison). Self-predictive log densities are highest for the narrowest predictive distributions, so you could use them to examine how your alternative models differ, but the self-predictive approach can be optimistic because it cares only about how narrow the predictive distribution is and not about where it is located (and with missing data you don't have an observation against which to compare the location anyway).
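So, reusing the placeholder names from the univariate sketch above, the generated quantities block would cover only the observed values:

```stan
generated quantities {
  vector[N_obs] log_lik;       // one entry per observed value; no terms for y_mis
  for (n in 1:N_obs)
    log_lik[n] = normal_lpdf(y_obs[n] | mu, sigma);
}
```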
Perfect! That was my intuition (I didn't think it would make sense to compute approximate leave-one-out error for unobserved variates), but your answer was very helpful.