Kfold predictions on an external test set

fusaroli · October 1, 2022, 5:55pm

For a course I’m teaching I want to showcase the changes in predictive performance from training to (cross-validated) test set, to external test set (a new dataset with slightly different context of collection).

I was wondering whether there is an easy way to take a model cross-validated with brms::kfold() and make predictions on the external test set.
I can successfully fit the model: m <- brm(…)
I can successfully run K-fold CV on the model: kf_m <- kfold(m, K = 5, folds = "stratified", group = "ID")
I can successfully make predictions: kfp <- kfold_predict(kf_m)
But when I try to predict on a new dataset, the code fails: kfp_test <- kfold_predict(kf_m, newdata = TestData)

In particular, I get:
Error in standata.brmsfit(.x1, resp = .x2, newdata = .x3, newdata = .x4, :
formal argument "newdata" matched by multiple actual arguments

I’m not sure whether I can access the 5 fold-specific models stored in the kfold object and run the predictions manually, but in general it’d be neat to be able to run external predictions from kfold_predict().
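
One possible way to run the predictions manually, as a minimal sketch rather than an official brms feature: when kfold() is called with save_fits = TRUE (which kfold_predict() also requires), the returned object carries a `fits` component whose "fit" column holds the refitted brmsfit objects, one per fold. The helper name `predict_external` below is hypothetical, and `allow_new_levels = TRUE` in the usage comment is an assumption that only matters if TestData contains IDs not seen during fitting.

```r
# Sketch: predict an external dataset from each fold-specific fit stored by
# brms::kfold(..., save_fits = TRUE). `predict_external` is a hypothetical
# helper, not part of brms.
predict_external <- function(kf, newdata,
                             predict_fun = brms::posterior_predict, ...) {
  if (is.null(kf$fits)) {
    stop("No stored fits found; rerun kfold() with save_fits = TRUE.")
  }
  # kf$fits is a list-matrix with one row per fold; the "fit" column holds
  # the models refitted on the training part of each fold.
  fold_fits <- kf$fits[, "fit"]
  # Returns one prediction object per fold (for posterior_predict, a
  # draws-by-observations matrix).
  lapply(fold_fits, function(fit) predict_fun(fit, newdata = newdata, ...))
}

# Usage, assuming `m` and `TestData` from the post above exist:
# kf_m <- kfold(m, K = 5, folds = "stratified", group = "ID", save_fits = TRUE)
# preds <- predict_external(kf_m, TestData, allow_new_levels = TRUE)
# fold_means <- sapply(preds, colMeans)  # observations x folds
```

Calling posterior_predict() on each stored fit directly avoids passing newdata through kfold_predict(), which is the call that triggers the "matched by multiple actual arguments" error.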

avehtari · October 3, 2022, 4:52pm

But why do you want to make predictions on the test set with the K-fold-CV posteriors? K-fold CV itself estimates how well the full-data posterior predicts new data, and what you ask for is something different. In a non-Bayesian context it can make more sense, as conditioning on different data sets can stabilise the inference, but in Bayesian inference that is handled by integrating over the posterior. I guess this is why no one has thought it would be useful to make kfold_predict(kf_m, newdata = TestData) work.

fusaroli · October 3, 2022, 9:02pm

Thanks for the answer! Yes, I expect no gain there, and I realize this is a niche use case, but it’s useful to me because:

I could do cv model selection, fit the full chosen model and predict a new dataset, but I’m trying to make the more general point that the more robust cv performance assessment (compared to the full model) is still contingent on the dataset being fully representative of the population at stake.
It makes it much easier to teach ML pipelines where we can replace the stan model with e.g. a random forest, without really changing the pipeline.
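
A pipeline of the kind described above could look roughly like the following base R sketch. Everything named here (`cv_with_external`, `fit_fun`, `predict_fun`, the RMSE metric, and the use of a random split of mtcars as a stand-in for a genuinely external dataset) is an illustrative assumption, and the folds are plain random folds rather than folds grouped by ID.

```r
# Sketch of a model-agnostic K-fold pipeline. The learner enters only through
# `fit_fun` and `predict_fun`, so swapping models leaves the rest unchanged.
rmse <- function(y, yhat) sqrt(mean((y - yhat)^2))

cv_with_external <- function(train, external, outcome, fit_fun, predict_fun,
                             K = 5) {
  # Randomly assign each training row to one of K folds.
  fold_id <- sample(rep(seq_len(K), length.out = nrow(train)))
  per_fold <- lapply(seq_len(K), function(k) {
    train_k <- train[fold_id != k, , drop = FALSE]
    held_out <- train[fold_id == k, , drop = FALSE]
    fit <- fit_fun(train_k)
    data.frame(
      fold = k,
      rmse_train = rmse(train_k[[outcome]], predict_fun(fit, train_k)),
      rmse_cv = rmse(held_out[[outcome]], predict_fun(fit, held_out)),
      rmse_external = rmse(external[[outcome]], predict_fun(fit, external))
    )
  })
  do.call(rbind, per_fold)
}

# Example with a linear model on a stand-in dataset.
set.seed(1)
idx <- sample(nrow(mtcars), 22)
res <- cv_with_external(
  train = mtcars[idx, ], external = mtcars[-idx, ], outcome = "mpg",
  fit_fun = function(d) lm(mpg ~ wt + hp, data = d),
  predict_fun = function(fit, d) predict(fit, newdata = d)
)
colMeans(res[, -1])  # average training, cross-validated, and external RMSE
```

Replacing the lm() call with a random forest, or with a brm() fit plus a function returning posterior mean predictions, would only change the two function arguments.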
