Cross Validation and Predictive Accuracy?

Hi all -

I have two multilevel nonlinear Beta regression models. Both have the same four parameters (k, s_1, h, s_2):

Model 1: $RSV = \frac{1}{(1 + k \cdot Delay^{s_1})(1 + h \cdot OAs^{s_2})}$
Model 2: $RSV = \frac{1}{(1 + k \cdot Delay)^{s_1}(1 + h \cdot OAs)^{s_2}}$
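
In code, the difference is just where the exponent sits. A quick sketch of the two mean functions (not the actual model code; the parameter values in the example are arbitrary placeholders):

```python
import numpy as np

def rsv_model1(delay, oas, k, s1, h, s2):
    # Exponents applied to Delay and OAs themselves before adding 1.
    return 1.0 / ((1.0 + k * delay**s1) * (1.0 + h * oas**s2))

def rsv_model2(delay, oas, k, s1, h, s2):
    # Exponents applied to the whole (1 + k*Delay) and (1 + h*OAs) terms.
    return 1.0 / ((1.0 + k * delay)**s1 * (1.0 + h * oas)**s2)

# Both means stay in (0, 1] for non-negative Delay/OAs, as a Beta mean must.
delay = np.linspace(0, 365, 5)
print(rsv_model1(delay, 1.0, k=0.02, s1=0.9, h=0.5, s2=1.1))
print(rsv_model2(delay, 1.0, k=0.02, s1=0.9, h=0.5, s2=1.1))
```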

I’ve tested them with three different datasets, and Model 1 is consistently better than Model 2 based on cross-validation. For example,

       elpd_diff   se_diff
model1       0.0       0.0
model2     -69.9      11.4

I then tried to understand why Model 1 is better than Model 2 by examining the pointwise ELPD difference for each data point.

I couldn't find any clear pattern explaining why Model 1 does better. I then simulated each participant's data from each of the two models and compared the simulations with the group mean data. Surprisingly, when I plot them, the two models fit the observed data almost identically: the two simulated curves nearly overlap.
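
Roughly, the simulation looked like this (a sketch with placeholder point estimates; the real version used each participant's posterior draws):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)

n_sims, n_obs = 1000, 25
delay = np.linspace(1, 365, n_obs)
oas = np.full(n_obs, 1.0)
k, s1, h, s2, phi = 0.02, 0.9, 0.5, 1.1, 30.0  # placeholder values

means = {
    "Model 1": 1 / ((1 + k * delay**s1) * (1 + h * oas**s2)),
    "Model 2": 1 / ((1 + k * delay)**s1 * (1 + h * oas)**s2),
}
for label, mu in means.items():
    # Beta-distributed observations with mean mu and precision phi.
    sims = rng.beta(mu * phi, (1 - mu) * phi, size=(n_sims, n_obs))
    plt.plot(delay, sims.mean(axis=0), label=label)

plt.xlabel("Delay")
plt.ylabel("Mean simulated RSV")
plt.legend()
plt.show()
```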

I’ve been scratching my head over why this might be the case. Is there a way to identify, or visualize, why one model is better than the other?

Any thoughts will be much, much appreciated. Many thanks.

Mat


You might get better answers if you’re able to share the model specifications and the plots that are confusing you.

If you like, you could store the log-likelihoods from cross-validation and see whether there are particular data points (or sets of points) that one model fits better than the other.
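
If you happen to be in Python/ArviZ, a minimal sketch would look like this (the example datasets below are just stand-ins for your two fits; in R's loo package the pointwise values are in loo_object$pointwise):

```python
import arviz as az
import matplotlib.pyplot as plt

# Stand-ins for your two fitted models (any InferenceData with a
# log_likelihood group works the same way).
idata1 = az.load_arviz_data("non_centered_eight")
idata2 = az.load_arviz_data("centered_eight")

loo1 = az.loo(idata1, pointwise=True)
loo2 = az.loo(idata2, pointwise=True)

# Pointwise ELPD difference: positive values are points model 1 fits better.
diff = (loo1.loo_i - loo2.loo_i).values

plt.plot(diff, "o", markersize=4)
plt.axhline(0, color="grey", linewidth=1)
plt.xlabel("Observation index")
plt.ylabel("elpd_i difference (model 1 - model 2)")
plt.show()
```

az.plot_elpd({"model 1": idata1, "model 2": idata2}) draws a similar pointwise comparison directly.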


Thank you so much for the advice! I will add that info now, along with the ELPD for each data point. Thanks again.

It seems you have quite a lot of observations, so even small differences in pointwise ELPDs can accumulate. You could look at the difference in the models' mean predictions to see whether one of them consistently gives slightly bigger or smaller predictions at certain Delay or OAs values.
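
Something like this sketch (synthetic stand-ins; replace mu1/mu2 with each model's posterior mean predictions per observation, e.g. from posterior_epred in brms):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Synthetic stand-ins: predictor values and each model's posterior mean
# prediction per observation. Replace these with your fitted values.
n_obs = 125
delay = rng.uniform(1, 365, n_obs)
oas = rng.uniform(0, 10, n_obs)
mu1 = 1 / ((1 + 0.02 * delay**0.9) * (1 + 0.5 * oas**1.1))
mu2 = 1 / ((1 + 0.02 * delay)**0.9 * (1 + 0.5 * oas)**1.1)

diff = mu1 - mu2  # positive: model 1 predicts a higher mean RSV

fig, axes = plt.subplots(1, 2, figsize=(8, 3), sharey=True)
for ax, x, name in [(axes[0], delay, "Delay"), (axes[1], oas, "OAs")]:
    ax.scatter(x, diff, s=8)
    ax.axhline(0, color="grey", linewidth=1)
    ax.set_xlabel(name)
axes[0].set_ylabel("Difference in mean prediction")
plt.tight_layout()
plt.show()
```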

Got it; thank you so much! Each of the three datasets has 50-100 participants, and each participant has 25 observations. And thanks for pointing me in this direction; I do notice differences in mean prediction at certain values of the independent variables, though the differences seem minor.

Thanks so much again!
