Is it possible to add a CV fold-dependent data-preprocessing step in loo::kfold() or a corresponding ‘brms’ method?
I want to estimate the predictive performance of a model (fitted with ‘brms’) using loo::kfold() (or the method for brmsfit objects) as leave-group-out cross-validation. The key point is that the training data must be preprocessed separately for each cross-validation fold, using only that fold's training data.
Specifically, the Bayesian model is a simple linear regression model whose predictor variables are the scores from a principal component analysis (PCA) or some other dimension reduction approach; such an approach has been suggested, e.g., in Piironen and Vehtari (2017). This means that, for each cross-validation fold, I ideally have to:
- fit the PCA on the respective training data,
- extract the scores for, say, the first 40 principal components, and
- use these as values for the predictor variables for the respective CV fold.
This is what I mean by a “CV fold-dependent data-preprocessing step” or “training data-dependent preprocessing step”.
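To make the three steps concrete, here is a minimal base R sketch of the preprocessing alone, without any model fitting. The helper name `pca_fold_data()`, the simulated data, and the choice of 3 components are assumptions made purely for illustration; only `prcomp()` and its `predict()` method are existing functions.

```r
# Hypothetical helper (not part of 'loo' or 'brms'): for each held-out group,
# fit the PCA on the training rows only and project the held-out rows onto it.
pca_fold_data <- function(X, y, group, n_comp) {
  folds <- unique(group)
  lapply(setNames(folds, folds), function(g) {
    test_idx <- group == g
    # Step 1: fit the PCA on the training data of this fold only
    # (centering and scaling are also learned from the training rows).
    pca <- prcomp(X[!test_idx, , drop = FALSE], center = TRUE, scale. = TRUE)
    # Step 2: extract the training scores of the first n_comp components.
    train_scores <- pca$x[, seq_len(n_comp), drop = FALSE]
    # Project the held-out rows using the training centering, scaling and rotation.
    test_scores <- predict(pca, newdata = X[test_idx, , drop = FALSE])
    test_scores <- test_scores[, seq_len(n_comp), drop = FALSE]
    # Step 3: use the scores as predictor values for this fold.
    list(
      train = data.frame(y = y[!test_idx], train_scores),
      test  = data.frame(y = y[test_idx], test_scores)
    )
  })
}

# Simulated example data (assumption): 60 observations, 10 variables, 6 groups.
set.seed(1)
n <- 60
p <- 10
X <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("x", seq_len(p))))
y <- rnorm(n)
group <- rep(1:6, each = 10)

fold_data <- pca_fold_data(X, y, group, n_comp = 3)
```

Each element of `fold_data` holds a training and a test data frame with columns `y`, `PC1`, `PC2`, `PC3`. In a manual loop, the training data frame of each fold could presumably be passed to the model fit and the test data frame used as `newdata` when computing the log-likelihood of the held-out group, but that bypasses the built-in k-fold machinery, which is what the question is about.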
As far as I can see, brms::kfold.brmsfit() does not support cross-validation where the training data are preprocessed depending on the CV fold. Is this correct?