Calculating LOO-CV for a multinormal regression model

Hello again,

I have managed to run the mm_loo properly, but as you suspected it didn’t improve the situation much. I had NaNs because of a couple of divergent transitions, which probably coincided with the high Pareto-k values; after rerunning the model without divergences the mm_loo worked fine – just wanted to mention this.

Now I wish to run k-fold CV as you suggested, and I’m searching for vignettes or examples of how to implement it in rstan (rather than rstanarm). There is the example in your loo paper, but it’s a bit brief, so I couldn’t get a clear picture of the whole pipeline. I was mostly unsure how to devise something that suits my hierarchical model, which contains subject-level parameters (I want to keep the covariance matrices at the subject level, as I’m mostly interested in individual differences in these parameters). Suppose that the “_t” and “_h” suffixes denote the training and holdout data, as in the example in the paper; is it correct to produce the log-lik as follows (this is a simplification of the model I previously sent)?


```stan
data {
  int<lower=1> Ntotal_t;       // number of trials – training
  int<lower=1> s_t[Ntotal_t];  // index of subject at a given question – training
  int<lower=1> Nsubj_t;        // number of participants – training
  int y1_t[Ntotal_t];          // predictor 1 – training
  int y2_t[Ntotal_t];          // predictor 2 – training
  int y3_t[Ntotal_t];          // predictor 3 – training
  // ... same declarations with the "_h" suffix for the holdout data ...
}
```

...

```stan
generated quantities {
  vector[Ntotal_t] log_lik_t;
  vector[Ntotal_h] log_lik_h;
  // quad[s] is the subject-level Cholesky factor of the covariance matrix
  for (n in 1:Ntotal_t) {
    log_lik_t[n] = multi_normal_cholesky_lpdf([y1_t[n], y2_t[n], y3_t[n]]' |
                     [mu1[s_t[n]], mu2[s_t[n]], mu3[s_t[n]]]', quad[s_t[n]]);
  }
  for (n in 1:Ntotal_h) {
    log_lik_h[n] = multi_normal_cholesky_lpdf([y1_h[n], y2_h[n], y3_h[n]]' |
                     [mu1[s_h[n]], mu2[s_h[n]], mu3[s_h[n]]]', quad[s_h[n]]);
  }
}
```

I also found this example – https://datascienceplus.com/k-fold-cross-validation-in-stan/ – which seems to offer a somewhat different approach, but I saw you had some reservations about it here: Compare models with k-fold cv. I was wondering if some parts of it are valid, in particular the way the log_lik is produced for the different folds.

Is there any existing vignette on this issue covering the whole pipeline, from fold creation, through running the model n_fold times, up to model comparison?
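In case it helps to show what I have in mind, here is a rough sketch of the R-side loop I’m imagining (the file name, variable names such as `s`, `y1`–`y3`, and the choice of stratified folds are just my guesses at what would be appropriate for my model – please correct me if this isn’t the intended pipeline):

```r
library(rstan)
library(loo)

K <- 10
# Stratify the trial-level folds by subject, so every subject keeps some
# training trials in every fold (otherwise the subject-level covariance
# matrices for a fully held-out subject would be estimated from no data)
fold <- kfold_split_stratified(K = K, x = s)  # s = subject index per trial

log_lik_heldout <- matrix(NA, nrow = n_iter, ncol = Ntotal)
for (k in 1:K) {
  train <- which(fold != k)
  hold  <- which(fold == k)
  standata <- list(
    Ntotal_t = length(train), s_t = s[train], Nsubj_t = Nsubj,
    y1_t = y1[train], y2_t = y2[train], y3_t = y3[train],
    Ntotal_h = length(hold), s_h = s[hold],
    y1_h = y1[hold], y2_h = y2[hold], y3_h = y3[hold]
  )
  fit_k <- stan(file = "model.stan", data = standata)
  # keep only the held-out columns from this fold's fit
  log_lik_heldout[, hold] <- extract(fit_k, "log_lik_h")$log_lik_h
}
elpd_kfold <- elpd(log_lik_heldout)  # loo::elpd on the held-out log-lik matrix
```

Is assembling the held-out columns across folds into one matrix and passing it to `loo::elpd()` the right way to get something comparable across models?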

In addition, regarding the number of folds: I have 50 subjects, but my model takes quite a long time to run (~2.5–3 hr per fit). What would you recommend as a reasonable number of folds?

Many thanks again,

Ofir