Loocv in the factor analysis setting

dkaplan · March 5, 2022, 8:03pm

Hi all,

I’m trying to wrap my head around something. I’m used to using LOO-CV in the regression setting, where I am interested in predicting a single outcome variable, say y, which is N x 1, conditional on X’s arranged in an N x p matrix with p predictors. This is straightforward to understand, as the elpd is calculated from the predictive density of each y_i given y_(-i). But in the case of a method like factor analysis, there is no “outcome” per se; one has a matrix of variables y to be factor analyzed. In this case, I am not sure what is being “left out”. How is the elpd calculated in this case?

Thanks in advance

David

andrjohns · March 6, 2022, 9:21am

Factor analysis is essentially just a regression model, where the observed variable is the outcome, the latent variable is the covariate, and the factor loading is the regression coefficient:

y_i = \lambda\eta_i + \epsilon_i
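To make the regression analogy concrete, here is a minimal NumPy sketch for one indicator and one factor, assuming (hypothetically) that the latent scores \eta_i were observed. The function names and the values \lambda = 0.8 and residual SD 0.5 are illustrative choices, not from the thread. In a real factor model \eta_i is unobserved, which is why it is either sampled as a parameter or marginalised out.

```python
import numpy as np


def simulate_one_factor(n, lam, noise_sd, seed=0):
    """Simulate y_i = lam * eta_i + eps_i with eta_i ~ N(0, 1) and eps_i ~ N(0, noise_sd^2)."""
    rng = np.random.default_rng(seed)
    eta = rng.standard_normal(n)            # latent scores (treated as observed only for this illustration)
    eps = rng.normal(0.0, noise_sd, n)      # residuals
    return eta, lam * eta + eps


def ols_slope_through_origin(x, y):
    """Least-squares coefficient b in y = b * x + error (no intercept)."""
    return float(x @ y / (x @ x))


eta, y = simulate_one_factor(n=100_000, lam=0.8, noise_sd=0.5)
lam_hat = ols_slope_through_origin(eta, y)  # should land close to the true loading 0.8
print(lam_hat)
```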

In the LOO context, it’s easiest to work with the parameterisation that marginalises out the latent variable for each individual, so that no individual-specific parameters are needed.

Given a model with p outcomes and k latent variables, having:

\nu: p \times 1 intercept vector
\Lambda: p \times k loading matrix
\Psi: k \times k latent covariance matrix
\Theta: p \times p residual covariance matrix

The likelihood is then given by:

\Sigma = \Lambda\Psi\Lambda^T + \Theta
y_i \sim \text{MVN}(\nu, \Sigma)
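A quick way to convince yourself that integrating out \eta_i gives this covariance is to simulate from the individual-level model and compare the sample covariance with \Lambda\Psi\Lambda^T + \Theta. This is a minimal NumPy sketch; the function names and parameter values (four indicators, one factor) are illustrative assumptions, not taken from the thread.

```python
import numpy as np


def implied_covariance(Lambda, Psi, Theta):
    """Model-implied covariance Sigma = Lambda Psi Lambda^T + Theta."""
    return Lambda @ Psi @ Lambda.T + Theta


def simulate_factor_model(n, nu, Lambda, Psi, Theta, seed=0):
    """Draw y_i = nu + Lambda eta_i + eps_i with eta_i ~ MVN(0, Psi) and eps_i ~ MVN(0, Theta)."""
    rng = np.random.default_rng(seed)
    p, k = Lambda.shape
    eta = rng.multivariate_normal(np.zeros(k), Psi, size=n)    # n x k latent scores
    eps = rng.multivariate_normal(np.zeros(p), Theta, size=n)  # n x p residuals
    return nu + eta @ Lambda.T + eps                           # n x p observed data


# Illustrative values: p = 4 indicators, k = 1 factor
nu = np.array([0.0, 1.0, -0.5, 2.0])
Lambda = np.array([[1.0], [0.8], [0.6], [0.9]])
Psi = np.array([[1.0]])
Theta = np.diag([0.5, 0.4, 0.6, 0.3])

Y = simulate_factor_model(200_000, nu, Lambda, Psi, Theta)
Sigma = implied_covariance(Lambda, Psi, Theta)
print(np.max(np.abs(np.cov(Y, rowvar=False) - Sigma)))  # only sampling error should remain
```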

Now that the likelihood no longer depends on individual latent variable parameters, you can more easily hold out observations to be predicted.
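As a minimal NumPy sketch of the marginalised likelihood (the function names are illustrative, not from any package), the log density of each person's full response vector under MVN(\nu, \Sigma) can be computed via a Cholesky factor. In Stan, the analogous generated-quantities term would be multi_normal_lpdf evaluated at each person's response vector.

```python
import numpy as np


def mvn_logpdf_rows(Y, nu, Sigma):
    """log MVN(y_i | nu, Sigma) for every row y_i of Y (an n x p array), via a Cholesky factor."""
    p = Sigma.shape[0]
    L = np.linalg.cholesky(Sigma)                  # Sigma = L @ L.T
    z = np.linalg.solve(L, (Y - nu).T)             # p x n whitened residuals
    log_det = 2.0 * np.sum(np.log(np.diag(L)))     # log |Sigma|
    return -0.5 * (p * np.log(2.0 * np.pi) + log_det + np.sum(z**2, axis=0))


def log_lik_per_person(Y, nu, Lambda, Psi, Theta):
    """One log-likelihood value per person (row of Y) for a single posterior draw."""
    Sigma = Lambda @ Psi @ Lambda.T + Theta        # marginal covariance with eta integrated out
    return mvn_logpdf_rows(Y, nu, Sigma)
```

Evaluating this for each of S posterior draws gives an S x N matrix (draws by persons), which is the shape a PSIS-LOO implementation expects when each held-out unit is one person's entire response vector.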

In concrete terms, LOO-CV in this context assesses the extent to which the latent variable model generalises to new observations; it provides some indication of the extent of model overfitting.

dkaplan · March 6, 2022, 2:58pm

Hi Andrew,

Thanks. I understand the factor model, but I think my question is simpler. Is it each person’s entire p-vector of responses that is held out at once?

Thanks

David

andrjohns · March 6, 2022, 3:12pm

It depends on what you want to assess. The consideration is the same as with any multivariate regression model. See this thread for more discussion: How to calculate log_lik in generated quantities of a multivariate regression model - #10 by andrjohns

Mauricio_Garnier-Villarre · March 6, 2022, 9:50pm

David

You can specify different things to be left out, but the most common use is to leave out an entire row, i.e., one person’s data. So you leave out all of the responses from each person in turn.

In blavaan we use the LOOIC, which approximates this leave-one-subject-out approach.
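The LOOIC is the LOO elpd on the deviance scale, LOOIC = -2 · elpd_loo. The sketch below computes both from an S x N matrix of per-person log-likelihood values (S draws, N persons). It uses plain importance sampling with weights 1/p(y_i | θ_s), without the Pareto smoothing that the loo R package applies, so it is only an illustration: raw importance weights can have very high variance. The function names are hypothetical.

```python
import numpy as np


def logsumexp(a, axis=0):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))


def is_loo(log_lik):
    """Raw (unsmoothed) importance-sampling LOO.

    log_lik: S x N array, log p(y_i | theta_s) for S posterior draws and N held-out units.
    Returns the pointwise elpd_loo values and LOOIC = -2 * sum(elpd_loo).
    """
    S = log_lik.shape[0]
    # p(y_i | y_-i) is approximated by 1 / mean_s(1 / p(y_i | theta_s)), on the log scale
    elpd_i = -(logsumexp(-log_lik, axis=0) - np.log(S))
    return elpd_i, -2.0 * np.sum(elpd_i)
```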

take care

Chen_Chen · April 25, 2022, 12:27am

Hi Andrew, if my goal is to use Bayesian stacking to deal with the multimodality problem, which LOO-CV do you think is more appropriate in this factor analysis: leave one element out or leave one column out? Many thanks!

andrjohns · April 25, 2022, 4:32am

The interaction between LOOCV and Bayesian Stacking isn’t an area that I’m familiar with, so I might defer this question to @avehtari

avehtari · April 25, 2022, 7:19am

The same considerations apply as @andrjohns mentioned.

Chen_Chen · April 25, 2022, 8:44am

Thanks, @avehtari and @andrjohns. I understand this depends on whether I want to predict one observation or all observations per individual, and interestingly, both predictions make sense for my data. Here are the two pipelines:
The first is to use leave-one-element-out + PSIS-LOO + Bayesian stacking.
The second is to use leave-one-column-out, but PSIS-LOO might not be a good approximation because more data points are left out at once, so would it be better to refit the model N times using N-fold CV?
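For the leave-one-element-out pipeline in the marginalised model, one option (an assumption here, not something stated in this thread) is to use as the pointwise term the conditional normal density p(y_ij | y_i,-j, θ). The elements within a person are correlated under MVN(\nu, \Sigma), so the pointwise terms should condition on that person's other responses rather than be univariate marginals. A minimal NumPy sketch with illustrative function names, using the standard conditional-normal formulas:

```python
import numpy as np


def conditional_logpdf(y, nu, Sigma, j):
    """log p(y_j | y_-j) when y ~ MVN(nu, Sigma): the pointwise term for leaving out element j."""
    idx = np.arange(len(y)) != j                 # mask selecting the other elements
    S_jo = Sigma[j, idx]                         # covariance between y_j and y_-j
    S_oo = Sigma[np.ix_(idx, idx)]               # covariance of y_-j
    w = np.linalg.solve(S_oo, S_jo)              # regression weights of y_j on y_-j
    mean = nu[j] + w @ (y[idx] - nu[idx])        # conditional mean
    var = Sigma[j, j] - S_jo @ w                 # conditional variance
    return -0.5 * (np.log(2.0 * np.pi * var) + (y[j] - mean) ** 2 / var)


def log_lik_per_element(Y, nu, Sigma):
    """n x p array of conditional log-densities, one per element y_ij (requires p >= 2)."""
    n, p = Y.shape
    return np.array([[conditional_logpdf(Y[i], nu, Sigma, j) for j in range(p)]
                     for i in range(n)])
```

Here persons are rows, so the "column" in the question corresponds to a row of Y. If \eta_i is sampled rather than marginalised out and \Theta is diagonal, the elements are conditionally independent given \eta_i, and the pointwise term reduces to a univariate normal density.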

For the first one, I want to make sure this is correct for the multivariate case, because I didn’t see multivariate examples in the Bayesian stacking paper.
For the second one, if I don’t want to refit the model N times, how good is the PSIS-LOO approximation? I work on very large datasets and can’t afford to refit the model multiple times.

Thank you!

avehtari · April 25, 2022, 2:54pm

Yes,

If I understood your model correctly, it is correct. It’s also possible that the difference in stacking weights is negligible between leave-one-out and leave-one-column-out.
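Stacking chooses model weights w on the simplex that maximise Σ_i log Σ_k w_k exp(elpd_loo,ik), where elpd_loo,ik is the pointwise LOO log predictive density of unit i under model k. The units can be single elements or whole persons, so computing both sets of weights and comparing them is cheap. A minimal NumPy sketch using a simple EM-style multiplicative update; the optimiser is an illustrative choice, not necessarily what loo_model_weights(method = "stacking") in the loo R package uses.

```python
import numpy as np


def stacking_weights(lpd, n_iter=2000, tol=1e-10):
    """Maximise sum_i log sum_k w_k exp(lpd[i, k]) over the simplex.

    lpd: N x K array of pointwise LOO log predictive densities (N units, K models).
    """
    n, K = lpd.shape
    # Rescaling each row by a constant shifts the objective by a constant, so the argmax is unchanged
    dens = np.exp(lpd - lpd.max(axis=1, keepdims=True))
    w = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        mix = dens @ w                                  # mixture density per unit (up to row scale)
        w_new = w * np.mean(dens / mix[:, None], axis=0)  # EM update; keeps sum(w) == 1
        if np.max(np.abs(w_new - w)) < tol:
            return w_new
        w = w_new
    return w
```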

The good thing is that PSIS-LOO has a built-in diagnostic (the Pareto k estimates), so you will know whether it’s usable or not (see, e.g., Sections 4 and 5 in the Roaches cross-validation demo).
