Hi there! I would like to implement model stacking using exact leave-one-out cross-validation for a problem at hand. However, I have a pharmaceutical/medical background and am lacking some of the statistical notation and terminology. So far I’ve read Vehtari et al., 2016 and Yao et al., 2018, and watched some of Aki Vehtari’s (@avehtari) videos (Aki Vehtari - Videos) on this topic. They are really helpful; however, I would like to double-check whether my proposed workflow is correct. For the sake of my understanding I plan to implement it from scratch. This is my proposed workflow, based on the two papers mentioned above:

We consider K linear candidate models M_k which we want to combine via Bayesian model stacking for a given dataset with N observations. For the sake of simplicity we assume each model has only one parameter \theta, the slope.

- For each M_k we write a representation in Stan.
- We split our data into k folds for cross-validation (CV). If k << N we can call it *exact k-fold CV*, and if k = N we can call it *exact leave-one-out (LOO) CV*. Let’s assume we go for *exact LOO CV*.
- For each fold we have N-1 datapoints y_{-i} on which each model M_k is trained, resulting in S posterior draws per model M_k (\theta_{k,s}).
- Additionally, we have one held-back datapoint y_i per fold. For each draw \theta_{k,s} we can evaluate model M_k and calculate the likelihood of the held-back y_i, giving p(y_i|\theta_{k,s}). This likelihood is calculated from \theta_{k,s} and the specified error model.
- Per model M_k we can then calculate the overall likelihood for y_i across the S posterior draws:

p(y_i | y_{-i}, M_k) \approx \frac{1}{S} \sum^S_{s=1} p(y_i | \theta_{k,s})

- When choosing the logarithmic scoring rule, we can define the optimisation problem to find the optimal set of weights w, constrained to the simplex S^K_1 = \{w \in [0, 1]^K : \sum^K_{k=1} w_k = 1\}, as:

\max_{w \in S^K_1} \frac{1}{N} \sum^N_{i=1} \log \sum^K_{k=1} w_k \, p(y_i | y_{-i}, M_k)

- We can use these optimised w to linearly weight the posterior predictive draws per model M_k to get the final predictive distribution for a potential future datapoint.
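To make the weight optimisation step concrete, here is how I would implement it from scratch, given the per-model LOO densities p(y_i | y_{-i}, M_k) collected in an N \times K matrix. This is just my own numpy sketch (the function name is mine, and real implementations like the loo package use a proper convex solver instead); it uses a simple EM-style fixed-point update for mixture weights, which I believe converges here because the objective is concave in w on the simplex — please correct me if that is wrong:

```python
import numpy as np

def stacking_weights(loo_dens, n_iter=2000, tol=1e-12):
    """Maximise (1/N) * sum_i log(sum_k w_k * p_ik) over the simplex.

    loo_dens: (N, K) matrix of LOO predictive densities p(y_i | y_{-i}, M_k).
    Uses an EM-style fixed-point update for mixture proportions; each
    update keeps the weights on the simplex and increases the objective.
    """
    n, k = loo_dens.shape
    w = np.full(k, 1.0 / k)                 # start from uniform weights
    for _ in range(n_iter):
        mix = loo_dens @ w                  # sum_k w_k * p_ik for each observation
        resp = loo_dens * w / mix[:, None]  # "responsibility" of model k for y_i
        w_new = resp.mean(axis=0)           # EM update for mixture proportions
        if np.max(np.abs(w_new - w)) < tol:
            return w_new
        w = w_new
    return w
```

For example, with two models where model 1 assigns clearly higher LOO density to most observations, the returned w should put most of the weight on model 1.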

My questions:

- Is this workflow in general correct?
- I am a bit confused that we often see in code snippets a *log_lik[i]* definition in the *generated quantities { }* block of Stan. To my understanding the *generated quantities { }* block is evaluated after each draw, and for the maximisation problem (step 6) we need to apply the weights to the likelihoods on the normal scale before summing over the K models, and only then take the log. If I generate *log_lik[i]* per draw, I would need to bring it back to the normal scale, potentially with the log-sum-exp trick to avoid underflow. Also, the loo package seems to take an S \times N log-likelihood matrix as an argument for its calculations. Is this correct, or do I have a mistake in my thinking? Not unlikely that I am mixing something up.
- If we use the *generated quantities { }* block to get the (log-)likelihood, we get this for each draw. But in the end we are rather interested in the LOO predictive density for y_i, which involves both the variability coming from the S draws and the error model. Can we just calculate this predictive density by taking the mean of the single likelihoods over the S draws, as mentioned in step 5? I guess there I am lacking the statistical background.
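To show concretely what I mean by bringing *log_lik[i]* back to the normal scale: if my understanding of step 5 is right, one column of the S \times N log-likelihood matrix would be turned into the LOO predictive density for y_i like this (numpy sketch, function name is mine):

```python
import numpy as np

def loo_log_density(log_lik_draws):
    """log p(y_i | y_{-i}) = log( (1/S) * sum_s exp(log_lik_s) ),
    i.e. the log of the mean likelihood over the S draws,
    computed stably via the log-sum-exp trick."""
    m = np.max(log_lik_draws)                        # shift to avoid underflow
    return m + np.log(np.mean(np.exp(log_lik_draws - m)))
```

Exponentiating this per model would then give the p(y_i | y_{-i}, M_k) values that the weights from step 6 are applied to on the normal scale.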

Thanks a lot in advance!