WAIC and LOOCV for multivariate analysis with different distributions

Hi,

I’m running a multivariate factor analysis with both a binary variable (with a logit link) and a continuous variable (with a lognormal distribution) using RStan. How should I calculate the log-likelihood for such data in the generated quantities block so that I can use the loo package to get the model fit criteria? Could you help me with this?

Thank you!

Ivory

Hi @ZJL, welcome to the Stan forums. Do you mean that your model has two outcomes, one binary and one continuous, and you want to calculate the combined log-likelihood? If that’s the case, then you should be able to just sum the log-likelihoods for the two outcomes. Let me know if I misinterpreted your question though.
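A minimal sketch of that sum for a single observation, written in plain Python purely for illustration. The function names mirror Stan's `bernoulli_logit_lpmf` and `lognormal_lpdf`, but these are hand-written stand-ins, and `joint_log_lik` and its argument names are hypothetical. In a Stan model the same sum would be computed per observation in generated quantities.

```python
import math

def bernoulli_logit_lpmf(y, eta):
    """Log-probability of a binary outcome y (0 or 1) given linear predictor eta on the logit scale."""
    z = eta if y == 1 else -eta
    # Numerically stable log(sigmoid(z)).
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))

def lognormal_lpdf(x, mu, sigma):
    """Log-density of a lognormal outcome x > 0 with log-scale location mu and scale sigma."""
    log_x = math.log(x)
    return (-log_x - math.log(sigma) - 0.5 * math.log(2 * math.pi)
            - 0.5 * ((log_x - mu) / sigma) ** 2)

def joint_log_lik(y_bin, eta, y_cont, mu, sigma):
    """Combined log-likelihood for one unit: the sum of the two component log-likelihoods.

    Assumes the two outcomes are conditionally independent given the parameters.
    """
    return bernoulli_logit_lpmf(y_bin, eta) + lognormal_lpdf(y_cont, mu, sigma)

# One unit with a binary outcome of 1 and a continuous outcome of 1.0.
print(joint_log_lik(1, 0.0, 1.0, 0.0, 1.0))
```

The sum is only the joint log-likelihood if the two outcomes are conditionally independent given the model parameters (for example, given the latent factors), which is an assumption of this sketch.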

Hi, thank you for your reply. Yes, that is what I mean. I’ve already written a function to calculate WAIC, but I’m just curious: in this case, would the loo package still work? It seems that it only deals with univariate outcomes.

I think the loo package should work fine with multivariate outcomes. The log-likelihood matrix, array, or function that you pass to loo should just contain the total log-likelihood (the sum of the component log-likelihoods). It won’t let you pass in the log-likelihoods separately for the two outcomes but you can just sum them before passing the log-likelihood to loo.
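As a rough illustration of "sum them before passing", here is a sketch in plain Python that adds two draws-by-observations tables elementwise. The helper name `total_log_lik` and the toy numbers are made up for the example; in practice the two tables would come from the fitted model.

```python
def total_log_lik(log_lik_binary, log_lik_continuous):
    """Elementwise sum of two tables shaped (posterior draws) x (observations)."""
    if len(log_lik_binary) != len(log_lik_continuous):
        raise ValueError("both tables need the same number of posterior draws")
    total = []
    for row_b, row_c in zip(log_lik_binary, log_lik_continuous):
        if len(row_b) != len(row_c):
            raise ValueError("both tables need the same number of observations")
        total.append([b + c for b, c in zip(row_b, row_c)])
    return total

# Toy input: 2 posterior draws, 3 observations (values are invented).
ll_binary = [[-0.5, -1.0, -0.25], [-0.75, -0.5, -0.5]]
ll_continuous = [[-1.5, -2.0, -0.75], [-1.25, -1.5, -2.5]]

combined = total_log_lik(ll_binary, ll_continuous)
print(combined)
```

The combined table has one column per observation, which is the shape that the matrix methods of `loo()` and `waic()` in the loo package expect.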

Got it. Thank you very much for your help!

No problem!

See also the CV FAQ, item 13: “Is it a problem to mix discrete and continuous data types?”

Thank you! It’s very helpful.

Hi,

I have a follow-up question that I hope you can help with. My data are from a testlet-based assessment. Say there are three testlets, each with three items. One type of data is at the item level, so there are 9 (items) × N (sample size) data points. The other type is at the testlet level, so there are 3 (testlets) × N (sample size) data points. I formulated a joint model for the two types of data and want to get the WAIC and LOOCV. However, because the two types of data are not strictly parallel, I don’t know how to sum the log-likelihoods. Is there any guidance on how to calculate WAIC and LOOCV in this case? Should I sum, within each testlet, the one log-likelihood for the testlet-level data and the three log-likelihoods for the item-level data, so that I get a matrix of log-likelihoods with dimension 3 (testlets) × N (sample size), and then use the loo package to calculate the WAIC and LOOCV?
Thanks in advance!
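To make the bookkeeping in the question concrete, here is a sketch in plain Python of the aggregation it describes, for a single posterior draw. The function name `combine_by_testlet`, the nested-list layout, and the numbers are all hypothetical. It only shows how the terms would be grouped; it does not settle whether the person-by-testlet unit is the right one to leave out, which depends on the prediction task being evaluated.

```python
def combine_by_testlet(item_ll, testlet_ll):
    """Group log-likelihood terms by (person, testlet) for one posterior draw.

    item_ll[p][t] is a list of item-level log-likelihoods for person p in testlet t.
    testlet_ll[p][t] is the testlet-level log-likelihood for the same person and testlet.
    Returns a flat list with one entry per (person, testlet) pair.
    """
    combined = []
    for person_items, person_testlets in zip(item_ll, testlet_ll):
        for items, testlet_value in zip(person_items, person_testlets):
            # One testlet-level term plus all item-level terms in that testlet.
            combined.append(testlet_value + sum(items))
    return combined

# Toy input: 1 person, 2 testlets, 3 items per testlet (values are invented).
item_ll = [[[-1.0, -2.0, -3.0], [-0.5, -0.5, -1.0]]]
testlet_ll = [[-4.0, -2.0]]

row = combine_by_testlet(item_ll, testlet_ll)
print(row)
```

Repeating this for every posterior draw would give one row per draw and one column per person-testlet pair.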