Efficient K-fold CV on a Stan Model

I’ve written a Stan model and would like to run K-fold cross-validation on it in order to assess its predictive performance for my application.

I have a method to generate the training and test data. The metrics I am interested in are all produced in the generated quantities block. My plan is to fit all 36 models (one per fold) and grab the mean of the metric I am interested in from each model fit.

The job is quite large, perhaps too large for my desktop, so I have been thinking about sending it to AWS. Before I do that (and before I spend my precious PhD stipend), I would like to know a good (or at least possible, if not best) way to parallelize the computations. The instances I am looking to use on AWS will have anywhere from 16 to 32 vCPUs.
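One possible arrangement is to run each fold’s fit as its own process, capped so the total chain count roughly matches your vCPU count. A minimal sketch using only Python’s standard library is below; `fit_fold` is a placeholder for whatever interface you use to run Stan (the commented CmdStanPy calls, the file name `model.stan`, and the variable name `test_metric` are all assumptions about your setup, not your actual code):

```python
from concurrent.futures import ProcessPoolExecutor
import os

K = 36  # one fold per held-out patient

def fit_fold(k):
    """Placeholder: fit the model on fold k's training data and return
    the mean of the generated-quantities metric on the held-out data.
    Swap in your real sampler call (e.g. CmdStanPy/CmdStanR) here."""
    # model = CmdStanModel(stan_file="model.stan")           # assumed setup
    # fit = model.sample(data=make_fold_data(k), chains=4)
    # return fit.stan_variable("test_metric").mean()
    return float(k)  # dummy value so the sketch runs end to end

if __name__ == "__main__":
    # With 4 chains per fit, 32 vCPUs support roughly 8 concurrent fits.
    n_workers = max(1, (os.cpu_count() or 1) // 4)
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        metric_means = list(pool.map(fit_fold, range(K)))
    print(metric_means)
```

The same fan-out also works at the shell level (e.g. one CmdStan invocation per fold under GNU parallel); the key design point is that folds are independent, so the only coordination needed is collecting each fold’s summary at the end.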

I’m not sure if this is sufficient information for any of you to answer this question. I can post the model, or more context if that helps. Please let me know if you need more information.

Thanks for your time.

Before you spend money on that much AWS compute, is PSIS-LOO-CV adequate?

It is my understanding that PSIS-LOO-CV and the loo package aren’t appropriate here.

My data are concentrations of a drug in patients’ blood over time. The CV is designed to leave one patient out in each fold, which means leaving out several observations at a time rather than a single observation.
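Building the folds by patient rather than by row is just a group-by on the patient ID. A minimal sketch (the toy data and field names here are made up for illustration):

```python
from collections import defaultdict

# Toy long-format data: one row per (patient, time) observation.
rows = [
    {"patient": "A", "time": 1.0, "conc": 2.3},
    {"patient": "A", "time": 2.0, "conc": 1.8},
    {"patient": "B", "time": 1.0, "conc": 3.1},
    {"patient": "B", "time": 2.0, "conc": 2.6},
    {"patient": "C", "time": 1.0, "conc": 1.9},
]

def patient_folds(rows):
    """Yield (patient, train, test) splits where each fold holds out
    *all* observations from one patient, not a single row."""
    by_patient = defaultdict(list)
    for r in rows:
        by_patient[r["patient"]].append(r)
    for held_out in by_patient:
        test = by_patient[held_out]
        train = [r for p, rs in by_patient.items()
                 if p != held_out for r in rs]
        yield held_out, train, test

for patient, train, test in patient_folds(rows):
    print(patient, len(train), len(test))
```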

Yeah, if you have patient-specific parameters and imagine leaving out one patient, then what the loo package currently does is not applicable. But there was a StanCon presentation on how to do leave-one-group-out cross-validation without re-estimating the model (although it requires integrating the patient-specific parameters out of the likelihood).
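For reference, the integration mentioned above is the marginal likelihood for each patient: writing $\alpha_i$ for patient $i$’s patient-specific parameters and $\theta$ for the shared parameters (my notation, not from the presentation),

$$
p(y_i \mid \theta) = \int p(y_i \mid \alpha_i, \theta)\, p(\alpha_i \mid \theta)\, d\alpha_i .
$$

With this marginal likelihood, each patient contributes a single pointwise log-likelihood term, which is what makes importance-sampling-based approximations usable at the patient level.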