Efficient K-fold CV on a Stan Model

PhDemetri · February 20, 2019, 3:39am

I’ve written a Stan model and would like to run K-fold cross-validation on it to assess its predictive performance for my application.

I have a method for generating the training and test data for each fold. The metrics I am interested in are all computed in the generated quantities block. My plan is to fit all 36 models (one per fold) and take the mean of the metric of interest from each fit.
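The final summarizing step of that plan can be sketched as below. This is a minimal illustration, assuming each fold's fit yields a 1-D array of posterior draws of the metric from generated quantities; the function name `summarize_cv` is hypothetical.

```python
import numpy as np

def summarize_cv(fold_draws):
    """Summarize a cross-validated metric.

    fold_draws: list of 1-D arrays, one per fold, each holding the posterior
    draws of the metric computed in generated quantities for that fold's fit.
    Returns the per-fold posterior means, their average across folds, and a
    naive standard error of that average.
    """
    fold_means = np.array([np.mean(d) for d in fold_draws])
    k = len(fold_means)
    cv_estimate = fold_means.mean()
    # Naive SE treating folds as independent; folds share most of their
    # training data, so this tends to understate the true uncertainty.
    se = fold_means.std(ddof=1) / np.sqrt(k)
    return fold_means, cv_estimate, se
```

Reporting the spread across folds alongside the average makes it easier to see whether a few patients dominate the result.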

The job is quite large, perhaps too large for my desktop, so I have been thinking about sending it to AWS. Before I do that (and before I spend my precious PhD stipend), I would like to know a good, or at least workable, way to parallelize the computations. The AWS instances I am considering have anywhere from 16 to 32 vCPUs.
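One workable pattern is to give each chain one vCPU and run several fold fits at once: with 4 chains per fit, a 32-vCPU instance can run 8 folds concurrently and a 16-vCPU instance 4. The sketch below assumes a hypothetical user-written `fit_fold(fold_id)` that fits one fold and returns its summary. A thread pool is used on the assumption that the heavy work happens outside Python, as with CmdStanPy, which runs CmdStan as a separate process (its `sample` method's `chains` and `parallel_chains` arguments set the per-fit chain count).

```python
from concurrent.futures import ThreadPoolExecutor

def folds_in_parallel(n_vcpus, chains_per_fit):
    """How many fold fits can run at once if each chain gets one vCPU."""
    return max(1, n_vcpus // chains_per_fit)

def run_all_folds(fit_fold, fold_ids, n_vcpus, chains_per_fit):
    """Run fit_fold(fold_id) for every fold, a few folds at a time.

    fit_fold is a hypothetical user-supplied function that fits the Stan
    model on one fold's training data and returns a summary such as the
    fold's mean metric. Threads here mostly wait on external CmdStan
    processes, so the GIL is not the bottleneck.
    """
    workers = folds_in_parallel(n_vcpus, chains_per_fit)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map returns results in the same order as fold_ids
        return list(pool.map(fit_fold, fold_ids))
```

The same budgeting (folds running at once × chains per fit ≤ vCPUs) applies if the fits are driven from R instead.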

I’m not sure whether this is enough information to answer the question. I can post the model or more context if that helps; please let me know if you need anything else.

Thanks for your time.

bgoodri · February 20, 2019, 4:58am

Before you spend money on that much AWS time, is PSIS-LOO-CV adequate?

PhDemetri · February 20, 2019, 1:36pm

My understanding is that PSIS-LOO-CV and the loo package aren’t appropriate here.

My data are drug concentrations in patients’ blood over time. The CV is designed to leave one patient out in each fold, which means leaving out several observations at a time rather than a single observation.
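A leave-one-patient-out split of this kind can be sketched as follows. It assumes a hypothetical array `patient_ids` giving the patient for each observation row; scikit-learn's `LeaveOneGroupOut` produces the same kind of split if that library is already in use.

```python
import numpy as np

def leave_one_patient_out(patient_ids):
    """Yield (patient, train_idx, test_idx) for leave-one-patient-out CV.

    patient_ids: 1-D array giving the patient of each observation (row).
    All observations of the held-out patient go to the test set together,
    so no patient contributes rows to both training and test data.
    """
    patient_ids = np.asarray(patient_ids)
    for p in np.unique(patient_ids):
        test = patient_ids == p
        yield p, np.flatnonzero(~test), np.flatnonzero(test)
```

The number of folds equals the number of distinct patients. The indices are 0-based, so patient indices may need remapping to 1-based values before they are passed to Stan as data.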

bgoodri · February 20, 2019, 5:27pm

Yeah, if you have patient-specific parameters and imagine leaving out one patient, then what the loo package currently does is not applicable. But there was a StanCon presentation on how to do leave-one-group-out cross-validation without re-estimating the models (although it requires integrating the patient-specific parameters out of the likelihood).
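To illustrate why the integration matters, here is a sketch of plain (not Pareto-smoothed) importance-sampling leave-one-group-out, assuming you already have an S × G matrix whose entry [s, g] is log p(y_g | θ⁽ˢ⁾) for all of patient g's observations at posterior draw s, with that patient's own parameters integrated out. It estimates p(y_g | y₋g) as the harmonic mean of p(y_g | θ⁽ˢ⁾) over draws. Raw importance weights like these can be very unstable; passing the same matrix to the loo package's `loo()` in place of the usual pointwise matrix should give the Pareto-smoothed version, with Pareto k diagnostics flagging unreliable groups.

```python
import numpy as np

def is_logo_elpd(group_loglik):
    """Raw importance-sampling estimate of leave-one-group-out elpd.

    group_loglik: array of shape (S, G); entry [s, g] is log p(y_g | theta_s),
    the log-likelihood of all of group g's observations at posterior draw s,
    with that group's own parameters already integrated out.
    Returns per-group estimates of log p(y_g | y_{-g}) and their sum.
    """
    ll = np.asarray(group_loglik, dtype=float)
    S = ll.shape[0]
    neg = -ll
    m = neg.max(axis=0)
    # log of (1/S) * sum_s exp(-ll[s, g]), computed stably per group
    log_mean_inv = m + np.log(np.exp(neg - m).sum(axis=0)) - np.log(S)
    elpd_g = -log_mean_inv
    return elpd_g, elpd_g.sum()
```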
