LOO-CV for joint models

Hi all,

I have been playing around with a joint model for two variables (Y, D), where the likelihood can be written in the form f(Y | D)P(D), with Y say Gaussian and D say Bernoulli. I would like to use LOO-CV to assess model fit, which I can do by exporting the pointwise log-likelihood terms in the usual way and using the loo package in R.

However, the likelihood only factorises over the joint r.v.s (y_i, d_i), so I figured I need to export the joint log-likelihood for each individual and use these in the loo calculations, e.g.

log_lik[i] = normal_lpdf(Y[i] | muy[i], sigmay[i]) + bernoulli_lpmf(D[i] | pd[i]);

(here muy[i] and sigmay[i] are functions of D[i]).

However, to assess model fit it makes sense to compare the data for Y against the predictions from Y | D, e.g.

loos <- loo(log_lik, save_psis = TRUE)
p1 <- ppc_loo_pit_qq(
  y = data$Y,
  yrep = ypred,
  lw = weights(loos$psis_object)
)

where ypred here is a matrix of posterior predictive samples from Y | D.
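
For concreteness, ypred comes from the generated quantities block; a minimal sketch, assuming muy and sigmay are indexed as in the log_lik line above:

generated quantities {
  vector[N] ypred;
  for (i in 1:N) {
    // draw from Y | D: muy[i] and sigmay[i] are functions of the observed D[i]
    ypred[i] = normal_rng(muy[i], sigmay[i]);
  }
}

Each posterior draw then supplies one row of the yrep matrix.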

My question is whether this is the correct log-likelihood to be using, or whether, since I am generating conditional predictions, I should be using the conditional log-likelihood, e.g.

log_lik[i] = normal_lpdf(Y[i] | muy[i], sigmay[i]);

in the loo calculations? Any advice would be gratefully received.

Many thanks,

TJ


Is D observed or latent?

Hi. It’s observed here. There is a hierarchical latent variable U that correlates D with Y, but that’s been analytically marginalised out.


There are at least 4 different prediction tasks that you might be interested in.

  1. Predict D given covariates.
  2. Predict Y given D and covariates.
  3. Predict Y given only the covariates, i.e. how well you can predict Y if you don't have knowledge of D.
  4. You might also be interested in some combination of 1 & 2 or of 1 & 3.

Here are the log-likelihoods you need to form for each of these cases:

  1. Form the log likelihood for D conditional on the covariates.
  2. Form the log likelihood for Y conditional on the observed D and the covariates.
  3. Form the log likelihood for Y while marginalizing over the likely values of D predicted by the covariates (see the sketch after this list).
  4. This is more complicated, because you need to make a decision about how strongly to weight likelihood contributions from Y versus D. There is no obvious choice dictated by the model structure, because these likelihoods have different units: \mathcal{L}(Y) is in units of probability density, while \mathcal{L}(D) is in units of probability mass. In particular, there is no reason to suspect (and it is unlikely to be the case) that simply summing the numerical value of the lpdf for Y and the numerical value of the lpmf for D would yield a sensible weighting. If this happens to be true, it's just a coincidence.
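
For cases 1-3, a minimal generated quantities sketch (muy1/sigmay1 and muy0/sigmay0 are hypothetical names for the conditional mean and sd of Y evaluated at D = 1 and D = 0; adapt to your parameterisation):

generated quantities {
  vector[N] log_lik_d;       // case 1: D given covariates
  vector[N] log_lik_y_cond;  // case 2: Y given observed D and covariates
  vector[N] log_lik_y_marg;  // case 3: Y with D marginalised out
  for (i in 1:N) {
    log_lik_d[i] = bernoulli_lpmf(D[i] | pd[i]);
    log_lik_y_cond[i] = normal_lpdf(Y[i] | muy[i], sigmay[i]);
    // mixture over the two possible values of D, weighted by P(D = 1) = pd[i]
    log_lik_y_marg[i] = log_mix(pd[i],
                                normal_lpdf(Y[i] | muy1[i], sigmay1[i]),
                                normal_lpdf(Y[i] | muy0[i], sigmay0[i]));
  }
}

Each of these vectors can then be passed to loo() separately.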

Thank you.

I was thinking about 1 & 2, because I wanted to assess the fit of both components of the model. So I think I may need two different likelihood outputs, one for the marginal D and one for the conditional Y | D, rather than the joint density or some kind of weighted density.

Thanks for your help. Plenty to think about here. Much appreciated.


@tjmckinley, your thinking on leaving out the whole observation i but looking at LOO-PIT for y is correct. The joint log-likelihood is needed to remove the joint contribution of y_i, d_i, but it is fine to look at the calibration separately for Y and D. For D, you could use the reliabilitydiag package as shown, e.g., in Recommendations for visual predictive checks in Bayesian workflow.

Thanks @avehtari, I had actually started to play around with the reliabilitydiag package after reading your recommendations article on Friday, since my go-to approach would have been to use the more traditional calibration curves. Seems like a neat method. It was actually during that process that I began to question my logic on how to structure the correct log-likelihood for LOO-CV in this case.

So if I understand correctly: if I export the joint log-likelihood for \left(y_i, d_i\right) from the Stan code, then it is permissible to use this for generating both the marginal LOO-PIT plot for Y and the marginal reliability curve for D. Hence I would not have to generate the marginal log-likelihoods for y_i and d_i separately for the two different comparisons (which I think is the suggestion from @jsocolar in 1) and 3) above)? I can convince myself of either of these approaches depending on how I think about the problem…

If it is OK to use the joint log-likelihood, then will this also hold if we want the LOO-PIT for the conditional Y \mid D (i.e. I wouldn't have to export the conditional log-likelihood for y_i \mid d_i)?

This is relevant because the other thing I wanted to explore was posterior predictive stacking weights for different models, and it made sense to me to use the joint log-likelihood contributions for the LOO-CV weightings in that case also.

Thanks @jsocolar for highlighting the weighting issues between continuous and discrete densities. I can see how this matters if one wants to fit, say, a discrete versus a continuous model for D and then compare between them (as per the discussion here: Cross-validation FAQ • loo), but does it matter, when one is targeting a joint likelihood, that one component is continuous and one discrete, as long as the log-likelihood is correct and the variable types do not change between models?

Thanks both for your help.

Yes.

To marginalize out D, for each i you would evaluate the likelihood at both possible values of D and then average over those two values, weighted by the predicted probabilities (as in the log_mix sketch above). Marginalizing out the continuous Y would be much more difficult. This doesn't change how the log_lik is computed.

For the conditional, just look at the pointwise predictions for the subset of observations with d_i equal to the desired value (e.g. pass the corresponding entries of Y and columns of ypred and the weights to ppc_loo_pit_qq). This doesn't change how the log_lik is computed.

The log-likelihood is computed in order to remove the contribution of one observation from the posterior. Transformations of the continuous variable affect both the posterior and the information to be removed for that one observation. Transformations of a continuous variable do affect the log-predictive density, while transformations of a discrete variable do not affect the log-predictive probability. Thus, transformations of continuous variables can affect the relative weight of the continuous model part in the elpd.
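
As a quick illustration (my example, not specific to this model): taking Z = \log Y changes each pointwise log-predictive density by the log Jacobian,

p_Z(z) = p_Y(e^z)\,e^z \quad \Longrightarrow \quad \log p_Z(z_i) = \log p_Y(y_i) + \log y_i,

so every continuous observation's elpd contribution shifts by \log y_i, whereas relabelling the levels of D leaves its probability mass, and hence its contribution, unchanged.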


Perfect, thanks @avehtari. That’s what I had thought originally, before I started to second-guess myself. Thanks both for your responses, it’s much appreciated.
