Calculating LOO-CV for a multinormal regression model

No. lpd is defined in equation (2) of [1507.04544] Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC:

lpd <- matrixStats::colLogSumExps(ll) - log(nrow(ll))

where ll is log_lik from the full-data posterior. Then you can compute pointwise p_loo by subtracting elpd_kcv from lpd. If you then plot observation index vs. pointwise p_loo, we’ll learn more. I’m sorry my answers are brief, as I’m heavily overloaded, but I still want to help.
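A minimal sketch of the whole computation (ll_full and ll_heldout are placeholder names for the S x n log_lik matrices, S posterior draws by n observations, from the full-data fit and from the held-out folds):

# Placeholder names: ll_full and ll_heldout are S x n log_lik matrices
# from the full-data posterior and the held-out folds, respectively.
lpd <- matrixStats::colLogSumExps(ll_full) - log(nrow(ll_full))
elpd_kcv <- matrixStats::colLogSumExps(ll_heldout) - log(nrow(ll_heldout))
p_loo_pointwise <- lpd - elpd_kcv
plot(seq_along(p_loo_pointwise), p_loo_pointwise,
     xlab = "observation index", ylab = "pointwise p_loo")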


Hi,

So I hope I got it right: I ran the equation on both the log-lik from the full-data posterior (this gives lpd) and on the log-lik from the held-out-data posterior (this gives elpd_kcv). Subtracting these two yields a vector of length n (computed from the 2,000 posterior draws), which is the pointwise p_loo, correct? As you suggested, I plotted it against the observation indices. I’m attaching two types of visualizations; it’s still very dense, but hopefully it provides you with sufficient information. If not, please let me know what would be a more suitable visualization.

Thanks again for all your tremendous help,

Ofir

Yes. And the plot looks sensible. For models with a conditionally univariate observation model, pointwise p_loo is usually < 1 unless there is really bad model misspecification. With a multinormal observation model it can be larger, but I haven’t had experience with that and haven’t had time to go through the math to figure out the asymptotic behavior in the weak-prior, well-specified-model case.

So it seems that when you remove one observation the posterior changes too much for PSIS to work. But since you got K-fold-CV to work, you are fine. The model is very flexible. You can also use K-fold-CV with other utility or loss functions. What do you think would be a useful measure of accuracy in your case?

What do you think would be a useful measure of accuracy in your case?

I thought that this is somehow indicated by the se_elpd, and that the next and final step would be to compare this between models (at least that was the procedure described in the tutorial I followed).
Do I actually need to generate data within each fold by using the holdout data and the estimated model parameters, via the multi_normal_cholesky_rng function, and then maybe compare the standard deviation of this simulated data vs. that of the true data? And if this is the case, should I do it for each model before or after the model comparison (is it possible for one model to outperform another based on the se_elpd comparison, yet be less accurate)? Is there a simpler solution or common practice I’m missing here?

Thanks,

Ofir

and then maybe compare the standard deviation of this simulated data vs. that of the true data?

  • This is a mistake; I meant something like averaging the squared error between the predicted heldout data and the true heldout data, as sketched below.
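For instance, something like this (a sketch; y_heldout and y_rep are placeholder names for the held-out observations and the posterior predictive draws generated in each fold, e.g. via multi_normal_cholesky_rng):

# Placeholders: y_heldout is an n x d matrix of held-out observations,
# y_rep an S x n x d array of posterior predictive draws for them.
y_pred_mean <- apply(y_rep, c(2, 3), mean)       # posterior predictive mean per observation
rmse <- sqrt(mean((y_pred_mean - y_heldout)^2))  # root mean squared error on held-out data
rmse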

se_elpd estimates the accuracy of elpd_loo, but what I meant by my question was: what do you think is a useful measure of accuracy for your model? That is, if you make a prediction with your model, how would you like to compare it to the actual observation? What kind of comparison would be useful, meaningful, and interpretable, taking into account your modeling problem?

Maybe you have a typo there, as you would not compare SEs, which measure the accuracy of the estimate.

It depends on what utility or loss function you want to use.

That goes in the direction of posterior predictive checking, which is useful, too. See, e.g., Visualization in Bayesian workflow.
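For example, with the bayesplot package (y and yrep are placeholders for the observed vector and an S x n matrix of posterior predictive draws for one response dimension):

# Placeholders: y is the observed vector, yrep an S x n matrix of
# posterior predictive draws for the same observations.
library(bayesplot)
ppc_dens_overlay(y, yrep[1:100, ])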

Again that se_elpd in the wrong context?

Sure, if you think that is easy for you to interpret and it’s meaningful in your modeling problem.

Hi,
I indeed had a typo there with the se_elpd.
I finished comparing the 3 models. I did this by creating a list for each model, which contains the pointwise elpd estimate and se. I constructed the structures so I could run loo_compare on these lists. Running loo_compare(model_1, model_2, model_3) worked fine - is this the right way to compare? Because if I added (criterion = "kfold") I got an error saying "All inputs should have class 'loo'"; I don’t understand this, since I did define the classes 'loo' and 'kfold'.
So these are my results:

[image: loo_compare results table]

Are these differences considered “significant”? I thought it a bit strange that model 2 has a much smaller se. Anyway, I’m now running it again with different holdout indices, just to be sure.

Many thanks again,

Ofir

What did you mean to write?

How did you estimate these? The thread is long and you have reported different results, so I’m uncertain what computations you have used for these. You can post the code lines you are using so it’s easier to check what you’ve done.

And what are these model_x objects?

Added where?

Are these based on K-fold-CV? And is the table from loo_compare?

It’s not much smaller, and different models may have different variation in predictive accuracy, which affects the accuracy of the expectation.

Hi,

What did you mean to write?

I meant to write elpd estimate.

How did you estimate these? The thread is long and you have reported different results, so I’m uncertain what computations you have used for these. You can post the code lines you are using so it’s easier to check what you’ve done.

I ran k-fold CV with 10 folds for each of my three models in rstan. Then I extracted the log-lik and computed the pointwise elpd for the k-fold, based on the functions in this link - K-fold cross-validation in Stan | DataScience+. So this was the function for extracting the elpd estimate and its se:

kfold <- function(log_lik_heldout) {
  library(matrixStats)
  logColMeansExp <- function(x) {
    # should be more stable than log(colMeans(exp(x)))
    S <- nrow(x)
    colLogSumExps(x) - log(S)
  }
  # See equation (20) of @VehtariEtAl2016
  pointwise <- matrix(logColMeansExp(log_lik_heldout), ncol = 1)
  colnames(pointwise) <- "elpd"
  # See equation (21) of @VehtariEtAl2016
  elpd_kfold <- sum(pointwise)
  se_elpd_kfold <- sqrt(ncol(log_lik_heldout) * var(pointwise))
  out <- list(
    pointwise = pointwise,
    elpd_kfold = elpd_kfold,
    se_elpd_kfold = se_elpd_kfold)
  # structure(out, class = "loo")
  return(out)
}
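For reference, hypothetical usage on the S x n held-out log_lik matrix assembled from the 10 folds (log_lik_heldout_1 is a placeholder name):

# log_lik_heldout_1: placeholder for the S x n held-out log_lik matrix of model 1
kcv_1 <- kfold(log_lik_heldout_1)
kcv_1$elpd_kfold
kcv_1$se_elpd_kfold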

And what are these model_x objects?

Based on what you wrote in this thread - Compare models with k-fold cv - #2 by avehtari - for each model I created a structure consisting of the pointwise elpd estimate and its se derived from the k-fold. The structure was defined according to the guidelines in the loo package manual regarding kfold (p. 12, under “kfold generic”), so it would be compatible with other loo functions - hope I got it right.
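Something along these lines (a sketch rather than my exact code; the element and class names follow the “kfold generic” section of the loo manual and should be checked against the installed loo version):

# Sketch only: wrap the kfold() output so loo_compare() accepts it.
as_kfold_loo <- function(kf) {
  pointwise <- kf$pointwise
  colnames(pointwise) <- "elpd_kfold"
  estimates <- matrix(c(kf$elpd_kfold, kf$se_elpd_kfold), nrow = 1,
                      dimnames = list("elpd_kfold", c("Estimate", "SE")))
  structure(list(estimates = estimates, pointwise = pointwise),
            class = c("kfold", "loo"))
}
model_1 <- as_kfold_loo(kcv_1)  # kcv_1, kcv_2, kcv_3 from the kfold() calls above
model_2 <- as_kfold_loo(kcv_2)
model_3 <- as_kfold_loo(kcv_3)
loo::loo_compare(model_1, model_2, model_3)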

Added where?

In the loo_compare function; I thought I’m supposed to define this if I’m performing it on k-fold.

Are these based on K-fold-CV? And is the table from loo_compare?

Yes, based on the k-fold CV, and this is the table yielded by loo_compare.

Thanks,

Ofir

loo_compare doesn’t have that kind of argument Model comparison — loo_compare • loo, so I’m confused.

Good. The values seem reasonable given the information you have provided.

loo_compare doesn’t have that kind of argument https://mc-stan.org/loo/reference/loo_compare.html , so I’m confused.

I saw it here - Information criteria and cross-validation — loo.stanreg • rstanarm - maybe it’s only relevant for fits derived from rstanarm?

Good. The values seem reasonable given the information you have provided.

Great! and thank you so much for all your help along this process.

Best regards,

Ofir


It’s relevant when used for stanreg objects from rstanarm, for which we can make automated kfold, but we can’t make automated kfold for stanfit objects. I’ll make a note to clarify that function reference.
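For comparison, the automated path for stanreg objects looks roughly like this (hypothetical fits, just to illustrate the contrast with the manual stanfit workflow above):

# Hypothetical rstanarm fits; kfold() refits each model K times automatically.
library(rstanarm)
fit1 <- stan_glm(mpg ~ wt, data = mtcars, refresh = 0)
fit2 <- stan_glm(mpg ~ wt + hp, data = mtcars, refresh = 0)
k1 <- kfold(fit1, K = 10)
k2 <- kfold(fit2, K = 10)
loo_compare(k1, k2)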