Cross Validation and Predictive Accuracy?

Hi all -

I have two multilevel nonlinear Beta regression models. Both have the same four parameters (k, s_1, h, s_2):

Model 1: $RSV = \frac{1}{(1 + k \cdot Delay^{s_1})(1 + h \cdot OAs^{s_2})}$
Model 2: $RSV = \frac{1}{(1 + k \cdot Delay)^{s_1}(1 + h \cdot OAs)^{s_2}}$
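
In code, the difference is just where the exponent sits. A quick sketch of the two mean functions (not the actual model code; the parameter values in the example are arbitrary placeholders):

```python
import numpy as np

def rsv_model1(delay, oas, k, s1, h, s2):
    # Exponents applied to Delay and OAs themselves before adding 1.
    return 1.0 / ((1.0 + k * delay**s1) * (1.0 + h * oas**s2))

def rsv_model2(delay, oas, k, s1, h, s2):
    # Exponents applied to the whole (1 + k*Delay) and (1 + h*OAs) terms.
    return 1.0 / ((1.0 + k * delay)**s1 * (1.0 + h * oas)**s2)

# Both means stay in (0, 1] for non-negative Delay/OAs, as a Beta mean must.
delay = np.linspace(0, 365, 5)
print(rsv_model1(delay, 1.0, k=0.02, s1=0.9, h=0.5, s2=1.1))
print(rsv_model2(delay, 1.0, k=0.02, s1=0.9, h=0.5, s2=1.1))
```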

I’ve tested them with three different datasets, and Model 1 is consistently better than Model 2 based on cross-validation. For example,

       elpd_diff   se_diff
model1       0.0       0.0
model2     -69.9      11.4

I then tried to understand why Model 1 is better than Model 2 by examining the pointwise ELPD difference for each data point.

I couldn't find any clear pattern explaining why Model 1 does better. I then simulated each participant's data from each of the two models and compared the simulations with the group mean data. Surprisingly, when I plot them, the two models fit the observed data almost identically: the two simulated curves nearly overlap.
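
Roughly, the simulation looked like this (a sketch with placeholder point estimates; the real version used each participant's posterior draws):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)

n_sims, n_obs = 1000, 25
delay = np.linspace(1, 365, n_obs)
oas = np.full(n_obs, 1.0)
k, s1, h, s2, phi = 0.02, 0.9, 0.5, 1.1, 30.0  # placeholder values

means = {
    "Model 1": 1 / ((1 + k * delay**s1) * (1 + h * oas**s2)),
    "Model 2": 1 / ((1 + k * delay)**s1 * (1 + h * oas)**s2),
}
for label, mu in means.items():
    # Beta-distributed observations with mean mu and precision phi.
    sims = rng.beta(mu * phi, (1 - mu) * phi, size=(n_sims, n_obs))
    plt.plot(delay, sims.mean(axis=0), label=label)

plt.xlabel("Delay")
plt.ylabel("Mean simulated RSV")
plt.legend()
plt.show()
```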

I’ve been scratching my head over why this might be the case. Is there a way to identify, or visualize, why one model is better than the other?

Any thoughts will be much, much appreciated. Many thanks.

Mat


You might get better answers if you’re able to share the model specifications and the plots that are confusing you.

If you like, you could store the log-likelihoods from cross-validation and see whether there are particular data points (or sets of points) that one model fits better than the other.
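
If you happen to be in Python/ArviZ, a minimal sketch would look like this (the example datasets below are just stand-ins for your two fits; in R's loo package the pointwise values are in loo_object$pointwise):

```python
import arviz as az
import matplotlib.pyplot as plt

# Stand-ins for your two fitted models (any InferenceData with a
# log_likelihood group works the same way).
idata1 = az.load_arviz_data("non_centered_eight")
idata2 = az.load_arviz_data("centered_eight")

loo1 = az.loo(idata1, pointwise=True)
loo2 = az.loo(idata2, pointwise=True)

# Pointwise ELPD difference: positive values are points model 1 fits better.
diff = (loo1.loo_i - loo2.loo_i).values

plt.plot(diff, "o", markersize=4)
plt.axhline(0, color="grey", linewidth=1)
plt.xlabel("Observation index")
plt.ylabel("elpd_i difference (model 1 - model 2)")
plt.show()
```

az.plot_elpd({"model 1": idata1, "model 2": idata2}) draws a similar pointwise comparison directly.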


Thank you so much for the advice! I will add that info now, along with the ELPD for each data point. Thanks again.

It seems you have quite a lot of observations, so even small differences in pointwise ELPDs can accumulate. You could look at the difference in the models' mean predictions to see whether one of them consistently gives slightly bigger or smaller predictions at certain Delay or OAs values.
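
Something like this sketch (synthetic stand-ins; replace mu1/mu2 with each model's posterior mean predictions per observation, e.g. from posterior_epred in brms):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Synthetic stand-ins: predictor values and each model's posterior mean
# prediction per observation. Replace these with your fitted values.
n_obs = 125
delay = rng.uniform(1, 365, n_obs)
oas = rng.uniform(0, 10, n_obs)
mu1 = 1 / ((1 + 0.02 * delay**0.9) * (1 + 0.5 * oas**1.1))
mu2 = 1 / ((1 + 0.02 * delay)**0.9 * (1 + 0.5 * oas)**1.1)

diff = mu1 - mu2  # positive: model 1 predicts a higher mean RSV

fig, axes = plt.subplots(1, 2, figsize=(8, 3), sharey=True)
for ax, x, name in [(axes[0], delay, "Delay"), (axes[1], oas, "OAs")]:
    ax.scatter(x, diff, s=8)
    ax.axhline(0, color="grey", linewidth=1)
    ax.set_xlabel(name)
axes[0].set_ylabel("Difference in mean prediction")
plt.tight_layout()
plt.show()
```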

Got it; thank you so much! Each of the three datasets has 50-100 participants, and each participant has 25 observations. And thanks for pointing me in this direction; I do notice differences in mean prediction at certain values of the independent variables, though the differences seem minor.

Thanks so much again!
