Kfold returning NaN

Operating System: Windows 10
Interface Version: rstanarm 2.15.3

Hi everyone,

I am comparing two models using 10-fold cross-validation, with each specified as:

stan_lm(form_b, prior = R2(0.5), data = models_rf) 
stan_glm(form_b, family = gaussian, prior_intercept = normal(0, 2, autoscale = FALSE), data = models_rf )

I get sensible coefficients for both models and no errors from Stan. The diagnostics all look great; however, when I run kfold with K = 10 I get:

[[1]]

 10-fold cross-validation

           Estimate SE
elpd_kfold      NaN NA

[[2]]

10-fold cross-validation

           Estimate  SE
elpd_kfold    -44.1 3.1

Any ideas what I’m missing? If K = n then they ALL return NaN, and if I use loo I have the same issue regardless of the k_threshold.

Were there any additional messages when you ran kfold on the object produced by stan_lm?

Just re-ran to be sure, and no, I don’t get any errors thrown. There are lots of NaNs in the pointwise table, though:

   datapoint elpd_kfold
1          5 -0.8791526
2         18 -0.9307467
3         19 -0.8786047
4         22 -0.9934098
5          7 -0.9708819
6         12 -0.8187243
7         29 -0.9139367
8          6 -0.9963069
9         28 -0.9585191
10        31 -0.8197892
11        32 -0.9698006
12        11        NaN
13        14        NaN
14        33        NaN
15         9 -1.0214862
16        15 -0.9062823
17        16 -1.1734448
18        35 -0.9183763
19        10        NaN
20        21        NaN
21        36        NaN
22         8 -1.1338556
23        17 -0.8248504
24        20 -1.0941937
25        34 -1.7291152
26         1 -1.1508899
27         3 -0.8661240
28         4 -0.8970020
29        23 -1.7145888
30        25 -0.9875456
31        26 -0.8689930
32        27 -2.1405578
33         2        NaN
34        13        NaN
35        24        NaN
36        30        NaN

@timdisher Is your data confidential or would you be able to provide code +
data that reproduces this? That would make it much easier to figure out
what’s going on in this case.

Does dropping some of the datapoints result in perfect collinearity?

I was worried you might ask that. Unfortunately, yes, this data is confidential, and I’m not confident our REB would approve sending it along even though it is de-identified. I can tell you that I have a 36 x 13 data frame and that the formula used in the model is:

scale(pcl21t1) ~ scale(GA) + scale(Sex) + scale(SNAP) + Random + scale(KMC_std) + scale(owies) + scale(Mom_Age) + scale(SES) + scale(mi9t1) + scale(LOS)

Could this be an issue related to scaling variables (particularly the dependent variable)? The original author’s analysis was on the standardized-beta scale, so I am trying to give them results in the same units. I just don’t understand why the same model on the same outcome fit with stan_glmer doesn’t give errors.

Also, can you try kfold with the non-default argument save_fits = TRUE? Then, figure out what the problem is from the fits element of the resulting list.

When you say “does dropping some of the datapoints result in perfect collinearity”, did you want me to just drop a random subset of observations, refit the model, and return the covariance matrix?

Is there something specific I’m looking for here? Sorry to be such an amateur.

stan_lm
 family:  gaussian [identity]
 formula: scale(C) ~ scale(GA) + scale(Sex) + scale(SNAP) + Random + scale(KMC_std) + 
	   scale(owies) + scale(LOS) + scale(Mom_Age) + scale(SES) + 
	   scale(mi9t1)
------

Estimates:
               Median MAD_SD
(Intercept)     0.1    0.3  
scale(GA)      -0.2    0.2  
scale(Sex)      0.3    0.1  
scale(SNAP)     0.1    0.1  
RandomKC       -0.1    0.3  
scale(KMC_std)  0.1    0.2  
scale(owies)    0.1    0.3  
scale(LOS)     -0.3    0.3  
scale(Mom_Age) -0.1    0.1  
scale(SES)     -0.1    0.2  
scale(mi9t1)    0.4    0.2  
sigma           0.8    0.1  
log-fit_ratio   0.0    0.1  
R2              0.3    0.1  

Sample avg. posterior predictive 
distribution of y (X = xbar):
         Median MAD_SD
mean_PPD 0.0    0.2   

------
For info on the priors used see help('prior_summary.stanreg').
[[1]]
stan_lm
 family:  gaussian [identity]
 formula: scale(C) ~ scale(GA) + scale(Sex) + scale(SNAP) + Random + scale(KMC_std) + 
	   scale(owies) + scale(LOS) + scale(Mom_Age) + scale(SES) + 
	   scale(mi9t1)
------

Estimates:
               Median MAD_SD
(Intercept)     0.0    0.2  
scale(GA)      -0.1    0.2  
scale(Sex)      0.3    0.2  
scale(SNAP)     0.1    0.1  
RandomKC        0.1    0.3  
scale(KMC_std)  0.1    0.1  
scale(owies)    0.3    0.2  
scale(LOS)     -0.3    0.3  
scale(Mom_Age) -0.1    0.2  
scale(SES)     -0.3    0.1  
scale(mi9t1)    0.3    0.2  
sigma           0.8    0.1  
log-fit_ratio   0.0    0.1  
R2              0.3    0.1  

Sample avg. posterior predictive 
distribution of y (X = xbar):
         Median MAD_SD
mean_PPD 0.0    0.2   

------
For info on the priors used see help('prior_summary.stanreg').
[[1]]
stan_lm
 family:  gaussian [identity]
 formula: scale(C) ~ scale(GA) + scale(Sex) + scale(SNAP) + Random + scale(KMC_std) + 
	   scale(owies) + scale(LOS) + scale(Mom_Age) + scale(SES) + 
	   scale(mi9t1)
------

Estimates:
               Median MAD_SD
(Intercept)     0.0    0.2  
scale(GA)       0.0    0.2  
scale(Sex)      0.3    0.2  
scale(SNAP)     0.1    0.2  
RandomKC        0.0    0.3  
scale(KMC_std)  0.2    0.2  
scale(owies)    0.2    0.3  
scale(LOS)     -0.2    0.3  
scale(Mom_Age) -0.1    0.1  
scale(SES)     -0.3    0.1  
scale(mi9t1)    0.3    0.1  
sigma           0.8    0.1  
log-fit_ratio   0.0    0.1  
R2              0.3    0.1  

Sample avg. posterior predictive 
distribution of y (X = xbar):
         Median MAD_SD
mean_PPD 0.0    0.2   

------
For info on the priors used see help('prior_summary.stanreg').
[[1]]
stan_lm
 family:  gaussian [identity]
 formula: scale(C) ~ scale(GA) + scale(Sex) + scale(SNAP) + Random + scale(KMC_std) + 
	   scale(owies) + scale(LOS) + scale(Mom_Age) + scale(SES) + 
	   scale(mi9t1)
------

Estimates:
               Median MAD_SD
(Intercept)     0.0    0.2  
scale(GA)      -0.2    0.2  
scale(Sex)      0.2    0.2  
scale(SNAP)     0.2    0.2  
RandomKC       -0.1    0.3  
scale(KMC_std)  0.0    0.2  
scale(owies)    0.3    0.2  
scale(LOS)     -0.4    0.3  
scale(Mom_Age) -0.2    0.2  
scale(SES)     -0.2    0.2  
scale(mi9t1)    0.2    0.1  
sigma           0.8    0.1  
log-fit_ratio   0.0    0.1  
R2              0.3    0.1  

Sample avg. posterior predictive 
distribution of y (X = xbar):
         Median MAD_SD
mean_PPD 0.0    0.2   

------
For info on the priors used see help('prior_summary.stanreg').
[[1]]
stan_lm
 family:  gaussian [identity]
 formula: scale(C) ~ scale(GA) + scale(Sex) + scale(SNAP) + Random + scale(KMC_std) + 
	   scale(owies) + scale(LOS) + scale(Mom_Age) + scale(SES) + 
	   scale(mi9t1)
------

Estimates:
               Median MAD_SD
(Intercept)     0.0    0.3  
scale(GA)       0.0    0.3  
scale(Sex)      0.2    0.2  
scale(SNAP)     0.1    0.1  
RandomKC        0.0    0.3  
scale(KMC_std)  0.1    0.1  
scale(owies)    0.2    0.2  
scale(LOS)     -0.2    0.3  
scale(Mom_Age) -0.1    0.2  
scale(SES)     -0.3    0.2  
scale(mi9t1)    0.3    0.1  
sigma           0.9    0.1  
log-fit_ratio   0.0    0.1  
R2              0.3    0.1  

Sample avg. posterior predictive 
distribution of y (X = xbar):
         Median MAD_SD
mean_PPD 0.0    0.2   

------
For info on the priors used see help('prior_summary.stanreg').
[[1]]
stan_lm
 family:  gaussian [identity]
 formula: scale(C) ~ scale(GA) + scale(Sex) + scale(SNAP) + Random + scale(KMC_std) + 
	   scale(owies) + scale(LOS) + scale(Mom_Age) + scale(SES) + 
	   scale(mi9t1)
------

Estimates:
               Median MAD_SD
(Intercept)    -0.2    0.2  
scale(GA)      -0.2    0.2  
scale(Sex)      0.3    0.1  
scale(SNAP)     0.2    0.1  
RandomKC        0.2    0.3  
scale(KMC_std)  0.1    0.1  
scale(owies)    0.3    0.2  
scale(LOS)     -0.4    0.3  
scale(Mom_Age) -0.1    0.1  
scale(SES)     -0.3    0.1  
scale(mi9t1)    0.4    0.1  
sigma           0.8    0.1  
log-fit_ratio   0.0    0.1  
R2              0.4    0.1  

Sample avg. posterior predictive 
distribution of y (X = xbar):
         Median MAD_SD
mean_PPD 0.0    0.2   

------
For info on the priors used see help('prior_summary.stanreg').
[[1]]
stan_lm
 family:  gaussian [identity]
 formula: scale(C) ~ scale(GA) + scale(Sex) + scale(SNAP) + Random + scale(KMC_std) + 
	   scale(owies) + scale(LOS) + scale(Mom_Age) + scale(SES) + 
	   scale(mi9t1)
------

Estimates:
               Median MAD_SD
(Intercept)     0.0    0.2  
scale(GA)      -0.2    0.2  
scale(Sex)      0.3    0.2  
scale(SNAP)     0.2    0.2  
RandomKC        0.0    0.3  
scale(KMC_std)  0.2    0.1  
scale(owies)    0.4    0.3  
scale(LOS)     -0.4    0.4  
scale(Mom_Age) -0.1    0.2  
scale(SES)     -0.3    0.2  
scale(mi9t1)    0.3    0.1  
sigma           0.8    0.1  
log-fit_ratio   0.0    0.1  
R2              0.3    0.1  

Sample avg. posterior predictive 
distribution of y (X = xbar):
         Median MAD_SD
mean_PPD 0.0    0.2   

------
For info on the priors used see help('prior_summary.stanreg').
[[1]]
stan_lm
 family:  gaussian [identity]
 formula: scale(C) ~ scale(GA) + scale(Sex) + scale(SNAP) + Random + scale(KMC_std) + 
	   scale(owies) + scale(LOS) + scale(Mom_Age) + scale(SES) + 
	   scale(mi9t1)
------

Estimates:
               Median MAD_SD
(Intercept)     0.0    0.2  
scale(GA)      -0.1    0.2  
scale(Sex)      0.3    0.1  
scale(SNAP)     0.2    0.1  
RandomKC       -0.1    0.3  
scale(KMC_std)  0.3    0.1  
scale(owies)    0.1    0.2  
scale(LOS)     -0.1    0.3  
scale(Mom_Age) -0.1    0.1  
scale(SES)     -0.3    0.1  
scale(mi9t1)    0.4    0.1  
sigma           0.7    0.1  
log-fit_ratio   0.0    0.1  
R2              0.5    0.1  

Sample avg. posterior predictive 
distribution of y (X = xbar):
         Median MAD_SD
mean_PPD 0.0    0.2   

------
For info on the priors used see help('prior_summary.stanreg').
[[1]]
stan_lm
 family:  gaussian [identity]
 formula: scale(C) ~ scale(GA) + scale(Sex) + scale(SNAP) + Random + scale(KMC_std) + 
	   scale(owies) + scale(LOS) + scale(Mom_Age) + scale(SES) + 
	   scale(mi9t1)
------

Estimates:
               Median MAD_SD
(Intercept)    -0.2    0.3  
scale(GA)      -0.1    0.2  
scale(Sex)      0.3    0.2  
scale(SNAP)     0.1    0.2  
RandomKC        0.3    0.3  
scale(KMC_std)  0.1    0.1  
scale(owies)    0.3    0.2  
scale(LOS)     -0.3    0.2  
scale(Mom_Age) -0.2    0.1  
scale(SES)     -0.2    0.1  
scale(mi9t1)    0.4    0.1  
sigma           0.8    0.1  
log-fit_ratio   0.0    0.1  
R2              0.4    0.1  

Sample avg. posterior predictive 
distribution of y (X = xbar):
         Median MAD_SD
mean_PPD 0.0    0.2   

------
For info on the priors used see help('prior_summary.stanreg').
[[1]]
stan_lm
 family:  gaussian [identity]
 formula: scale(C) ~ scale(GA) + scale(Sex) + scale(SNAP) + Random + scale(KMC_std) + 
	   scale(owies) + scale(LOS) + scale(Mom_Age) + scale(SES) + 
	   scale(mi9t1)
------

Estimates:
               Median MAD_SD
(Intercept)    -0.1    0.2  
scale(GA)      -0.1    0.2  
scale(Sex)      0.4    0.2  
scale(SNAP)     0.1    0.1  
RandomKC        0.2    0.3  
scale(KMC_std)  0.1    0.1  
scale(owies)    0.4    0.2  
scale(LOS)     -0.4    0.3  
scale(Mom_Age)  0.0    0.2  
scale(SES)     -0.3    0.2  
scale(mi9t1)    0.3    0.1  
sigma           0.8    0.1  
log-fit_ratio   0.0    0.1  
R2              0.4    0.1  

Sample avg. posterior predictive 
distribution of y (X = xbar):
         Median MAD_SD
mean_PPD 0.0    0.2

Perhaps part of the problem is all the scale calls. When you drop some observations for kfold, the mean and standard deviation change, so scale produces different numbers in each fold than when all of the data are included, which violates the assumptions of kfold. But that doesn’t explain the NaNs.
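To make this concrete, here is a small base-R sketch (using simulated data, not the confidential data from this thread) showing that scale() recomputed inside a fold assigns a different standardized value to the same observation than scaling the full data once:

```r
set.seed(123)

# Hypothetical predictor standing in for, e.g., GA; not the real data.
x <- rnorm(36, mean = 30, sd = 4)

# Standardize using the full data, as in the original model fit.
full_scaled <- as.numeric(scale(x))

# Drop one fold (say, observations 1-4) and standardize the remainder,
# as happens when the model is refit inside kfold.
train <- x[-(1:4)]
fold_scaled <- as.numeric(scale(train))

# The same raw value gets a different standardized value in the fold,
# because the mean and sd are recomputed from the training subset.
full_scaled[5]  # scaled with the full-data mean/sd
fold_scaled[1]  # same observation, scaled with the fold's mean/sd
```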

Works if I unscale everything.

The question is: Does it make sense to report my scaled models but do kfold on unscaled versions?

Models with a standardized outcome don’t make sense in a Bayesian context, and standardized predictors tend to confuse rstanarm. The best thing is to estimate the model with the raw variables and then convert the posterior draws to the standardized scale afterward.
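For example, using simulated data and ordinary lm() as a stand-in for the point estimates: multiplying a raw-scale coefficient by sd(x)/sd(y) recovers the standardized coefficient, and the same post-processing can be applied draw-by-draw to the posterior of a model fit on the raw variables:

```r
set.seed(1)

# Simulated raw-scale data (hypothetical, not the data from this thread).
x <- rnorm(36, mean = 30, sd = 4)
y <- 2 + 0.5 * x + rnorm(36)

# Coefficient from the raw-scale model, converted after the fact...
b_raw <- coef(lm(y ~ x))["x"]
b_converted <- b_raw * sd(x) / sd(y)

# ...matches the coefficient from the pre-standardized model.
b_std <- coef(lm(scale(y) ~ scale(x)))[2]

all.equal(unname(b_converted), unname(b_std))  # TRUE
```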

Thanks again. I guess I should have seen that coming ;)

I would not say it “violates the assumptions of kfold”. In this case kfold is taking the scaling into account too, which can be exactly what you want and is also a commonly used approach. Most often the kfold results are not sensitive to whether the scaling is done inside or outside of kfold (unless n is small and you use maximum likelihood or really tight priors). You could say that scaling corresponds to using data-dependent priors, which could be considered to violate the principles of Bayesian inference. But then again, if the priors are weakly informative, scaling often doesn’t affect the results (though it might affect computational performance, and it very likely affects the convenience of interpretation).