I’ve written a Stan model and would like to run K-fold cross-validation on it to assess its predictive performance for my application.
I have a method to generate the training and test data. The metrics I am interested in are all computed in the generated quantities block. My plan is to fit all 36 models and take the mean of the metric of interest from each fit.
The job is quite large, perhaps too large for my desktop, so I have been thinking about sending it to AWS. Before I do that (and before I spend my precious PhD stipend), I would like to know a good (or at least possible, if not best) way to parallelize the computations. The instances I am looking to use on AWS will have anywhere from 16 to 32 vCPUs.
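For concreteness, here is a minimal sketch of one way the work could be split across cores. It is written in Python purely for illustration. It assumes each fit runs 4 chains (an assumption, not something fixed by the model) and that each chain should get its own vCPU, so a 16-vCPU instance would run 4 fits at a time. `fit_fold` is a hypothetical user-supplied function that fits the model on one training set and returns the mean of the metric; `placeholder_fit` is only a stand-in so the sketch runs without Stan.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def concurrent_fits(n_cpus, chains_per_fit):
    """Number of fits that can run at once if each chain gets its own core."""
    return max(1, n_cpus // chains_per_fit)


def run_folds(fit_fold, fold_ids, n_cpus, chains_per_fit):
    """Call fit_fold(fold_id) for every fold, a limited number at a time.

    fit_fold is a hypothetical function: it should fit the model for one
    fold and return the mean of the metric of interest for that fit.
    """
    workers = concurrent_fits(n_cpus, chains_per_fit)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as fold_ids.
        return list(pool.map(fit_fold, fold_ids))


def placeholder_fit(fold_id):
    # Stand-in so the sketch runs without Stan; replace with a real fit.
    return float(fold_id)


if __name__ == "__main__":
    # 36 fits and 16 vCPUs come from the post; 4 chains per fit is assumed.
    metrics = run_folds(placeholder_fit, range(36), n_cpus=16, chains_per_fit=4)
    print(len(metrics), mean(metrics))
```

A thread pool is used here on the assumption that each fit launches the sampler as external processes, so the Python threads mostly wait. If the fitting work happens inside the Python process itself, `ProcessPoolExecutor` from the same module would probably be the better choice.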
I’m not sure if this is sufficient information for any of you to answer this question. I can post the model, or more context if that helps. Please let me know if you need more information.
Thanks for your time.