Operating System: Windows 10
Interface Version: rstanarm 2.15.3
Hi everyone,
I am comparing two models with 10-fold cross-validation, each specified as:

stan_lm(form_b, prior = R2(0.5), data = models_rf)
stan_glm(form_b, family = gaussian, prior_intercept = normal(0, 2, autoscale = FALSE), data = models_rf)

I get sensible coefficients for both models and no errors from Stan. The diagnostics all look great; however, when I run kfold with K = 10 I get:
[[1]]
10-fold cross-validation

           Estimate  SE
elpd_kfold      NaN  NA

[[2]]
10-fold cross-validation

           Estimate  SE
elpd_kfold    -44.1 3.1
Any ideas what I’m missing? If I set K = n they ALL return NaN, and with loo I have the same issue regardless of the k_threshold.
bgoodri
September 19, 2017, 1:29pm
Were there any additional messages when you ran kfold on the object produced by stan_lm?
Just re-ran to be sure, and no, I don’t get any errors thrown. There are lots of NaNs in the pointwise table, though:
datapoint elpd_kfold
1 5 -0.8791526
2 18 -0.9307467
3 19 -0.8786047
4 22 -0.9934098
5 7 -0.9708819
6 12 -0.8187243
7 29 -0.9139367
8 6 -0.9963069
9 28 -0.9585191
10 31 -0.8197892
11 32 -0.9698006
12 11 NaN
13 14 NaN
14 33 NaN
15 9 -1.0214862
16 15 -0.9062823
17 16 -1.1734448
18 35 -0.9183763
19 10 NaN
20 21 NaN
21 36 NaN
22 8 -1.1338556
23 17 -0.8248504
24 20 -1.0941937
25 34 -1.7291152
26 1 -1.1508899
27 3 -0.8661240
28 4 -0.8970020
29 23 -1.7145888
30 25 -0.9875456
31 26 -0.8689930
32 27 -2.1405578
33 2 NaN
34 13 NaN
35 24 NaN
36 30 NaN
jonah
September 19, 2017, 2:43pm
@timdisher Is your data confidential or would you be able to provide code +
data that reproduces this? That would make it much easier to figure out
what’s going on in this case.
bgoodri
September 19, 2017, 2:53pm
Does dropping some of the datapoints result in perfect collinearity?
I was worried you might ask that. Unfortunately, yes, this data is confidential, and I’m not confident our REB would approve sending it along even though it is de-identified. I can tell you that I have a 36 x 13 data frame and that the formula used in the model is:
scale(pcl21t1) ~ scale(GA) + scale(Sex) + scale(SNAP) + Random + scale(KMC_std) + scale(owies) + scale(Mom_Age) + scale(SES) + scale(mi9t1) + scale(LOS)
Could this be an issue related to scaling variables (particularly the dependent variable)? The original author’s analysis was on the standardized-beta scale, so I am trying to give them results in the same units. I just don’t understand why the same model on the same outcome fit with stan_glm doesn’t give errors.
bgoodri
September 19, 2017, 3:09pm
Also, can you try kfold with the non-default argument save_fits = TRUE? Then figure out what the problem is from the fits element of the resulting list.
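A minimal sketch of that suggestion; here `fit` stands in for the stan_lm object from the original post:

```r
library(rstanarm)

# Refit per fold and keep each fold's model for inspection
k10 <- kfold(fit, K = 10, save_fits = TRUE)

# The fits element pairs each refit model with the indices of the
# observations that were held out in that fold, so the folds that
# produced NaN can be examined directly:
k10$fits[1, ]
```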
When you say "does dropping some of the datapoints result in perfect collinearity", did you want me to just drop a random subset of observations, refit the model, and return the covariance matrix? Is there something specific I’m looking for here? Sorry to be such an amateur.
stan_lm
family: gaussian [identity]
formula: scale(C) ~ scale(GA) + scale(Sex) + scale(SNAP) + Random + scale(KMC_std) +
scale(owies) + scale(LOS) + scale(Mom_Age) + scale(SES) +
scale(mi9t1)
------
Estimates:
Median MAD_SD
(Intercept) 0.1 0.3
scale(GA) -0.2 0.2
scale(Sex) 0.3 0.1
scale(SNAP) 0.1 0.1
RandomKC -0.1 0.3
scale(KMC_std) 0.1 0.2
scale(owies) 0.1 0.3
scale(LOS) -0.3 0.3
scale(Mom_Age) -0.1 0.1
scale(SES) -0.1 0.2
scale(mi9t1) 0.4 0.2
sigma 0.8 0.1
log-fit_ratio 0.0 0.1
R2 0.3 0.1
Sample avg. posterior predictive
distribution of y (X = xbar):
Median MAD_SD
mean_PPD 0.0 0.2
------
For info on the priors used see help('prior_summary.stanreg').
[[1]]
stan_lm
family: gaussian [identity]
formula: scale(C) ~ scale(GA) + scale(Sex) + scale(SNAP) + Random + scale(KMC_std) +
scale(owies) + scale(LOS) + scale(Mom_Age) + scale(SES) +
scale(mi9t1)
------
Estimates:
Median MAD_SD
(Intercept) 0.0 0.2
scale(GA) -0.1 0.2
scale(Sex) 0.3 0.2
scale(SNAP) 0.1 0.1
RandomKC 0.1 0.3
scale(KMC_std) 0.1 0.1
scale(owies) 0.3 0.2
scale(LOS) -0.3 0.3
scale(Mom_Age) -0.1 0.2
scale(SES) -0.3 0.1
scale(mi9t1) 0.3 0.2
sigma 0.8 0.1
log-fit_ratio 0.0 0.1
R2 0.3 0.1
Sample avg. posterior predictive
distribution of y (X = xbar):
Median MAD_SD
mean_PPD 0.0 0.2
------
For info on the priors used see help('prior_summary.stanreg').
[[1]]
stan_lm
family: gaussian [identity]
formula: scale(C) ~ scale(GA) + scale(Sex) + scale(SNAP) + Random + scale(KMC_std) +
scale(owies) + scale(LOS) + scale(Mom_Age) + scale(SES) +
scale(mi9t1)
------
Estimates:
Median MAD_SD
(Intercept) 0.0 0.2
scale(GA) 0.0 0.2
scale(Sex) 0.3 0.2
scale(SNAP) 0.1 0.2
RandomKC 0.0 0.3
scale(KMC_std) 0.2 0.2
scale(owies) 0.2 0.3
scale(LOS) -0.2 0.3
scale(Mom_Age) -0.1 0.1
scale(SES) -0.3 0.1
scale(mi9t1) 0.3 0.1
sigma 0.8 0.1
log-fit_ratio 0.0 0.1
R2 0.3 0.1
Sample avg. posterior predictive
distribution of y (X = xbar):
Median MAD_SD
mean_PPD 0.0 0.2
------
For info on the priors used see help('prior_summary.stanreg').
[[1]]
stan_lm
family: gaussian [identity]
formula: scale(C) ~ scale(GA) + scale(Sex) + scale(SNAP) + Random + scale(KMC_std) +
scale(owies) + scale(LOS) + scale(Mom_Age) + scale(SES) +
scale(mi9t1)
------
Estimates:
Median MAD_SD
(Intercept) 0.0 0.2
scale(GA) -0.2 0.2
scale(Sex) 0.2 0.2
scale(SNAP) 0.2 0.2
RandomKC -0.1 0.3
scale(KMC_std) 0.0 0.2
scale(owies) 0.3 0.2
scale(LOS) -0.4 0.3
scale(Mom_Age) -0.2 0.2
scale(SES) -0.2 0.2
scale(mi9t1) 0.2 0.1
sigma 0.8 0.1
log-fit_ratio 0.0 0.1
R2 0.3 0.1
Sample avg. posterior predictive
distribution of y (X = xbar):
Median MAD_SD
mean_PPD 0.0 0.2
------
For info on the priors used see help('prior_summary.stanreg').
[[1]]
stan_lm
family: gaussian [identity]
formula: scale(C) ~ scale(GA) + scale(Sex) + scale(SNAP) + Random + scale(KMC_std) +
scale(owies) + scale(LOS) + scale(Mom_Age) + scale(SES) +
scale(mi9t1)
------
Estimates:
Median MAD_SD
(Intercept) 0.0 0.3
scale(GA) 0.0 0.3
scale(Sex) 0.2 0.2
scale(SNAP) 0.1 0.1
RandomKC 0.0 0.3
scale(KMC_std) 0.1 0.1
scale(owies) 0.2 0.2
scale(LOS) -0.2 0.3
scale(Mom_Age) -0.1 0.2
scale(SES) -0.3 0.2
scale(mi9t1) 0.3 0.1
sigma 0.9 0.1
log-fit_ratio 0.0 0.1
R2 0.3 0.1
Sample avg. posterior predictive
distribution of y (X = xbar):
Median MAD_SD
mean_PPD 0.0 0.2
------
For info on the priors used see help('prior_summary.stanreg').
[[1]]
stan_lm
family: gaussian [identity]
formula: scale(C) ~ scale(GA) + scale(Sex) + scale(SNAP) + Random + scale(KMC_std) +
scale(owies) + scale(LOS) + scale(Mom_Age) + scale(SES) +
scale(mi9t1)
------
Estimates:
Median MAD_SD
(Intercept) -0.2 0.2
scale(GA) -0.2 0.2
scale(Sex) 0.3 0.1
scale(SNAP) 0.2 0.1
RandomKC 0.2 0.3
scale(KMC_std) 0.1 0.1
scale(owies) 0.3 0.2
scale(LOS) -0.4 0.3
scale(Mom_Age) -0.1 0.1
scale(SES) -0.3 0.1
scale(mi9t1) 0.4 0.1
sigma 0.8 0.1
log-fit_ratio 0.0 0.1
R2 0.4 0.1
Sample avg. posterior predictive
distribution of y (X = xbar):
Median MAD_SD
mean_PPD 0.0 0.2
------
For info on the priors used see help('prior_summary.stanreg').
[[1]]
stan_lm
family: gaussian [identity]
formula: scale(C) ~ scale(GA) + scale(Sex) + scale(SNAP) + Random + scale(KMC_std) +
scale(owies) + scale(LOS) + scale(Mom_Age) + scale(SES) +
scale(mi9t1)
------
Estimates:
Median MAD_SD
(Intercept) 0.0 0.2
scale(GA) -0.2 0.2
scale(Sex) 0.3 0.2
scale(SNAP) 0.2 0.2
RandomKC 0.0 0.3
scale(KMC_std) 0.2 0.1
scale(owies) 0.4 0.3
scale(LOS) -0.4 0.4
scale(Mom_Age) -0.1 0.2
scale(SES) -0.3 0.2
scale(mi9t1) 0.3 0.1
sigma 0.8 0.1
log-fit_ratio 0.0 0.1
R2 0.3 0.1
Sample avg. posterior predictive
distribution of y (X = xbar):
Median MAD_SD
mean_PPD 0.0 0.2
------
For info on the priors used see help('prior_summary.stanreg').
[[1]]
stan_lm
family: gaussian [identity]
formula: scale(C) ~ scale(GA) + scale(Sex) + scale(SNAP) + Random + scale(KMC_std) +
scale(owies) + scale(LOS) + scale(Mom_Age) + scale(SES) +
scale(mi9t1)
------
Estimates:
Median MAD_SD
(Intercept) 0.0 0.2
scale(GA) -0.1 0.2
scale(Sex) 0.3 0.1
scale(SNAP) 0.2 0.1
RandomKC -0.1 0.3
scale(KMC_std) 0.3 0.1
scale(owies) 0.1 0.2
scale(LOS) -0.1 0.3
scale(Mom_Age) -0.1 0.1
scale(SES) -0.3 0.1
scale(mi9t1) 0.4 0.1
sigma 0.7 0.1
log-fit_ratio 0.0 0.1
R2 0.5 0.1
Sample avg. posterior predictive
distribution of y (X = xbar):
Median MAD_SD
mean_PPD 0.0 0.2
------
For info on the priors used see help('prior_summary.stanreg').
[[1]]
stan_lm
family: gaussian [identity]
formula: scale(C) ~ scale(GA) + scale(Sex) + scale(SNAP) + Random + scale(KMC_std) +
scale(owies) + scale(LOS) + scale(Mom_Age) + scale(SES) +
scale(mi9t1)
------
Estimates:
Median MAD_SD
(Intercept) -0.2 0.3
scale(GA) -0.1 0.2
scale(Sex) 0.3 0.2
scale(SNAP) 0.1 0.2
RandomKC 0.3 0.3
scale(KMC_std) 0.1 0.1
scale(owies) 0.3 0.2
scale(LOS) -0.3 0.2
scale(Mom_Age) -0.2 0.1
scale(SES) -0.2 0.1
scale(mi9t1) 0.4 0.1
sigma 0.8 0.1
log-fit_ratio 0.0 0.1
R2 0.4 0.1
Sample avg. posterior predictive
distribution of y (X = xbar):
Median MAD_SD
mean_PPD 0.0 0.2
------
For info on the priors used see help('prior_summary.stanreg').
[[1]]
stan_lm
family: gaussian [identity]
formula: scale(C) ~ scale(GA) + scale(Sex) + scale(SNAP) + Random + scale(KMC_std) +
scale(owies) + scale(LOS) + scale(Mom_Age) + scale(SES) +
scale(mi9t1)
------
Estimates:
Median MAD_SD
(Intercept) -0.1 0.2
scale(GA) -0.1 0.2
scale(Sex) 0.4 0.2
scale(SNAP) 0.1 0.1
RandomKC 0.2 0.3
scale(KMC_std) 0.1 0.1
scale(owies) 0.4 0.2
scale(LOS) -0.4 0.3
scale(Mom_Age) 0.0 0.2
scale(SES) -0.3 0.2
scale(mi9t1) 0.3 0.1
sigma 0.8 0.1
log-fit_ratio 0.0 0.1
R2 0.4 0.1
Sample avg. posterior predictive
distribution of y (X = xbar):
Median MAD_SD
mean_PPD 0.0 0.2
bgoodri
September 19, 2017, 3:30pm
Perhaps part of the problem is all the scale calls. When you drop some observations for kfold, the mean and standard deviation change, so scale produces different numbers in each fold than when all of the data are included, which violates the assumptions of kfold. But that doesn’t explain the NaNs.
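The per-fold behavior of scale can be seen without fitting any model; a quick base-R sketch with simulated data:

```r
set.seed(1)
y <- rnorm(36)

# scale() standardizes using the mean/SD of whatever vector it receives,
# so the same observation gets a different standardized value once a
# fold of data has been dropped:
scale(y)[5]          # observation 5, standardized against all 36 points
scale(y[-(1:4)])[1]  # observation 5 again, standardized against the remaining 32
```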
It works if I unscale everything. The question is: does it make sense to report my scaled models but do kfold on unscaled versions?
bgoodri
September 19, 2017, 3:39pm
Models with a standardized outcome don’t make sense in a Bayesian context, and standardized predictors tend to confuse rstanarm. The best thing is to estimate models with the raw variables and then …
The lmer results are suspect because the correlations are 1 or NaN.
The stan_lmer ones might be okay, but you get a very small intercept (I think) due to standardizing your variables, which isn’t recommended. Just do it with unstandardized variables, and if you have to interpret things in terms of standard deviations, do posterior_predict once with the actual data and once with a one-standard-deviation increase for all observations on the variable of interest. The distribution of the difference…
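A sketch of that posterior_predict comparison, assuming `fit` is a model estimated on the raw (unstandardized) variables and using KMC_std from the original data frame as the predictor of interest:

```r
# Hypothetical sketch: shift the predictor of interest up by one SD
# for every observation, then compare predictive distributions
d_hi <- models_rf
d_hi$KMC_std <- d_hi$KMC_std + sd(d_hi$KMC_std)

yrep_base <- posterior_predict(fit, newdata = models_rf)
yrep_hi   <- posterior_predict(fit, newdata = d_hi)

# Distribution of the difference: the predicted effect of a one-SD
# increase in KMC_std, expressed in the outcome's raw units
summary(c(yrep_hi - yrep_base))
```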
Thanks again. I guess I should have seen that coming ;)
bgoodri:
When you drop some observations for kfold, the mean and standard deviation change and so scale produces a different number for each fold than when all of the data are included, which violates the assumptions of kfold.
I would not say it "violates the assumptions of kfold". In this case the kfold is taking the scaling into account too, which can be exactly what you want and is also a commonly used approach. Most often the kfold results are not sensitive to whether the scaling is done inside or outside of kfold (unless n is small and you use maximum likelihood or really tight priors). You could say that scaling corresponds to using data-dependent priors, which could be considered to violate the principles of Bayesian inference, but then again, if the priors are weakly informative, scaling often doesn’t affect the results (though it may affect computational performance, and it very likely affects the convenience of interpretation).
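The two placements of the scaling discussed in this thread can be sketched as follows (hypothetical variable names; `d` stands in for the full data frame):

```r
library(rstanarm)

# (1) Scaling inside the formula: scale() is re-evaluated on each fold's
# training data during kfold, so the transformation is cross-validated too.
fit_in <- stan_lm(scale(y) ~ scale(x1) + scale(x2), data = d, prior = R2(0.5))

# (2) Scaling once, up front: the transformation is fixed before kfold
# ever splits the data (a data-dependent choice, as noted above).
num <- sapply(d, is.numeric)
d_std <- d
d_std[num] <- scale(d[num])
fit_out <- stan_lm(y ~ x1 + x2, data = d_std, prior = R2(0.5))
```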