Operating System: Windows 10
Interface Version: rstanarm 2.15.3
Hi everyone,
I am comparing two models with 10-fold cross-validation, each specified as:

stan_lm(form_b, prior = R2(0.5), data = models_rf)
stan_glm(form_b, family = gaussian, prior_intercept = normal(0, 2, autoscale = FALSE), data = models_rf)

I get sensible coefficients for both models and no errors from Stan. The diagnostics all look great; however, when I run kfold with K = 10 I get:
[[1]]
10-fold cross-validation

           Estimate  SE
elpd_kfold      NaN  NA

[[2]]
10-fold cross-validation

           Estimate  SE
elpd_kfold    -44.1 3.1
Any ideas what I’m missing? If I set K = n they ALL return NaN, and with loo I have the same issue regardless of the k_threshold.
bgoodri
September 19, 2017, 1:29pm
Were there any additional messages when you ran kfold on the object produced by stan_lm?
Just re-ran to be sure, and no, I don’t get any errors thrown. There are lots of NaNs in the pointwise table, though:
datapoint elpd_kfold
1 5 -0.8791526
2 18 -0.9307467
3 19 -0.8786047
4 22 -0.9934098
5 7 -0.9708819
6 12 -0.8187243
7 29 -0.9139367
8 6 -0.9963069
9 28 -0.9585191
10 31 -0.8197892
11 32 -0.9698006
12 11 NaN
13 14 NaN
14 33 NaN
15 9 -1.0214862
16 15 -0.9062823
17 16 -1.1734448
18 35 -0.9183763
19 10 NaN
20 21 NaN
21 36 NaN
22 8 -1.1338556
23 17 -0.8248504
24 20 -1.0941937
25 34 -1.7291152
26 1 -1.1508899
27 3 -0.8661240
28 4 -0.8970020
29 23 -1.7145888
30 25 -0.9875456
31 26 -0.8689930
32 27 -2.1405578
33 2 NaN
34 13 NaN
35 24 NaN
36 30 NaN
jonah
September 19, 2017, 2:43pm
@timdisher Is your data confidential or would you be able to provide code +
data that reproduces this? That would make it much easier to figure out
what’s going on in this case.
bgoodri
September 19, 2017, 2:53pm
Does dropping some of the datapoints result in perfect collinearity?
I was worried you might ask that. Unfortunately, yes, this data is confidential, and I’m not confident our REB would approve sending it along even though it is de-identified. I can tell you that I have a 36 x 13 data frame and that the formula used in the model is:
scale(pcl21t1) ~ scale(GA) + scale(Sex) + scale(SNAP) + Random + scale(KMC_std) + scale(owies) + scale(Mom_Age) + scale(SES) + scale(mi9t1) + scale(LOS)
Could this be an issue related to scaling variables (particularly the dependent variable)? The original author’s analysis was on the standardized-beta scale, so I am trying to give them results in the same units. I just don’t understand why the same model on the same outcome fit with stan_glm doesn’t give errors.
bgoodri
September 19, 2017, 3:09pm
Also, can you try kfold with the non-default argument save_fits = TRUE? Then figure out what the problem is from the fits element of the resulting list.
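A minimal sketch of that suggestion; here `fit` stands in for the stan_lm object from the original post:

```r
library(rstanarm)

# Refit per fold and keep each fold's model for inspection
k10 <- kfold(fit, K = 10, save_fits = TRUE)

# The fits element pairs each refit model with the indices of the
# observations that were held out in that fold, so the folds that
# produced NaN can be examined directly:
k10$fits[1, ]
```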
When you say "does dropping some of the datapoints result in perfect collinearity", did you want me to just drop a random subset of observations, refit the model, and return the covariance matrix? Is there something specific I’m looking for here? Sorry to be such an amateur.
stan_lm
family: gaussian [identity]
formula: scale(C) ~ scale(GA) + scale(Sex) + scale(SNAP) + Random + scale(KMC_std) +
scale(owies) + scale(LOS) + scale(Mom_Age) + scale(SES) +
scale(mi9t1)
------
Estimates:
Median MAD_SD
(Intercept) 0.1 0.3
scale(GA) -0.2 0.2
scale(Sex) 0.3 0.1
scale(SNAP) 0.1 0.1
RandomKC -0.1 0.3
scale(KMC_std) 0.1 0.2
scale(owies) 0.1 0.3
scale(LOS) -0.3 0.3
scale(Mom_Age) -0.1 0.1
scale(SES) -0.1 0.2
scale(mi9t1) 0.4 0.2
sigma 0.8 0.1
log-fit_ratio 0.0 0.1
R2 0.3 0.1
Sample avg. posterior predictive
distribution of y (X = xbar):
Median MAD_SD
mean_PPD 0.0 0.2
------
For info on the priors used see help('prior_summary.stanreg').
[[1]]
stan_lm
family: gaussian [identity]
formula: scale(C) ~ scale(GA) + scale(Sex) + scale(SNAP) + Random + scale(KMC_std) +
scale(owies) + scale(LOS) + scale(Mom_Age) + scale(SES) +
scale(mi9t1)
------
Estimates:
Median MAD_SD
(Intercept) 0.0 0.2
scale(GA) -0.1 0.2
scale(Sex) 0.3 0.2
scale(SNAP) 0.1 0.1
RandomKC 0.1 0.3
scale(KMC_std) 0.1 0.1
scale(owies) 0.3 0.2
scale(LOS) -0.3 0.3
scale(Mom_Age) -0.1 0.2
scale(SES) -0.3 0.1
scale(mi9t1) 0.3 0.2
sigma 0.8 0.1
log-fit_ratio 0.0 0.1
R2 0.3 0.1
Sample avg. posterior predictive
distribution of y (X = xbar):
Median MAD_SD
mean_PPD 0.0 0.2
------
For info on the priors used see help('prior_summary.stanreg').
[[1]]
stan_lm
family: gaussian [identity]
formula: scale(C) ~ scale(GA) + scale(Sex) + scale(SNAP) + Random + scale(KMC_std) +
scale(owies) + scale(LOS) + scale(Mom_Age) + scale(SES) +
scale(mi9t1)
------
Estimates:
Median MAD_SD
(Intercept) 0.0 0.2
scale(GA) 0.0 0.2
scale(Sex) 0.3 0.2
scale(SNAP) 0.1 0.2
RandomKC 0.0 0.3
scale(KMC_std) 0.2 0.2
scale(owies) 0.2 0.3
scale(LOS) -0.2 0.3
scale(Mom_Age) -0.1 0.1
scale(SES) -0.3 0.1
scale(mi9t1) 0.3 0.1
sigma 0.8 0.1
log-fit_ratio 0.0 0.1
R2 0.3 0.1
Sample avg. posterior predictive
distribution of y (X = xbar):
Median MAD_SD
mean_PPD 0.0 0.2
------
For info on the priors used see help('prior_summary.stanreg').
[[1]]
stan_lm
family: gaussian [identity]
formula: scale(C) ~ scale(GA) + scale(Sex) + scale(SNAP) + Random + scale(KMC_std) +
scale(owies) + scale(LOS) + scale(Mom_Age) + scale(SES) +
scale(mi9t1)
------
Estimates:
Median MAD_SD
(Intercept) 0.0 0.2
scale(GA) -0.2 0.2
scale(Sex) 0.2 0.2
scale(SNAP) 0.2 0.2
RandomKC -0.1 0.3
scale(KMC_std) 0.0 0.2
scale(owies) 0.3 0.2
scale(LOS) -0.4 0.3
scale(Mom_Age) -0.2 0.2
scale(SES) -0.2 0.2
scale(mi9t1) 0.2 0.1
sigma 0.8 0.1
log-fit_ratio 0.0 0.1
R2 0.3 0.1
Sample avg. posterior predictive
distribution of y (X = xbar):
Median MAD_SD
mean_PPD 0.0 0.2
------
For info on the priors used see help('prior_summary.stanreg').
[[1]]
stan_lm
family: gaussian [identity]
formula: scale(C) ~ scale(GA) + scale(Sex) + scale(SNAP) + Random + scale(KMC_std) +
scale(owies) + scale(LOS) + scale(Mom_Age) + scale(SES) +
scale(mi9t1)
------
Estimates:
Median MAD_SD
(Intercept) 0.0 0.3
scale(GA) 0.0 0.3
scale(Sex) 0.2 0.2
scale(SNAP) 0.1 0.1
RandomKC 0.0 0.3
scale(KMC_std) 0.1 0.1
scale(owies) 0.2 0.2
scale(LOS) -0.2 0.3
scale(Mom_Age) -0.1 0.2
scale(SES) -0.3 0.2
scale(mi9t1) 0.3 0.1
sigma 0.9 0.1
log-fit_ratio 0.0 0.1
R2 0.3 0.1
Sample avg. posterior predictive
distribution of y (X = xbar):
Median MAD_SD
mean_PPD 0.0 0.2
------
For info on the priors used see help('prior_summary.stanreg').
[[1]]
stan_lm
family: gaussian [identity]
formula: scale(C) ~ scale(GA) + scale(Sex) + scale(SNAP) + Random + scale(KMC_std) +
scale(owies) + scale(LOS) + scale(Mom_Age) + scale(SES) +
scale(mi9t1)
------
Estimates:
Median MAD_SD
(Intercept) -0.2 0.2
scale(GA) -0.2 0.2
scale(Sex) 0.3 0.1
scale(SNAP) 0.2 0.1
RandomKC 0.2 0.3
scale(KMC_std) 0.1 0.1
scale(owies) 0.3 0.2
scale(LOS) -0.4 0.3
scale(Mom_Age) -0.1 0.1
scale(SES) -0.3 0.1
scale(mi9t1) 0.4 0.1
sigma 0.8 0.1
log-fit_ratio 0.0 0.1
R2 0.4 0.1
Sample avg. posterior predictive
distribution of y (X = xbar):
Median MAD_SD
mean_PPD 0.0 0.2
------
For info on the priors used see help('prior_summary.stanreg').
[[1]]
stan_lm
family: gaussian [identity]
formula: scale(C) ~ scale(GA) + scale(Sex) + scale(SNAP) + Random + scale(KMC_std) +
scale(owies) + scale(LOS) + scale(Mom_Age) + scale(SES) +
scale(mi9t1)
------
Estimates:
Median MAD_SD
(Intercept) 0.0 0.2
scale(GA) -0.2 0.2
scale(Sex) 0.3 0.2
scale(SNAP) 0.2 0.2
RandomKC 0.0 0.3
scale(KMC_std) 0.2 0.1
scale(owies) 0.4 0.3
scale(LOS) -0.4 0.4
scale(Mom_Age) -0.1 0.2
scale(SES) -0.3 0.2
scale(mi9t1) 0.3 0.1
sigma 0.8 0.1
log-fit_ratio 0.0 0.1
R2 0.3 0.1
Sample avg. posterior predictive
distribution of y (X = xbar):
Median MAD_SD
mean_PPD 0.0 0.2
------
For info on the priors used see help('prior_summary.stanreg').
[[1]]
stan_lm
family: gaussian [identity]
formula: scale(C) ~ scale(GA) + scale(Sex) + scale(SNAP) + Random + scale(KMC_std) +
scale(owies) + scale(LOS) + scale(Mom_Age) + scale(SES) +
scale(mi9t1)
------
Estimates:
Median MAD_SD
(Intercept) 0.0 0.2
scale(GA) -0.1 0.2
scale(Sex) 0.3 0.1
scale(SNAP) 0.2 0.1
RandomKC -0.1 0.3
scale(KMC_std) 0.3 0.1
scale(owies) 0.1 0.2
scale(LOS) -0.1 0.3
scale(Mom_Age) -0.1 0.1
scale(SES) -0.3 0.1
scale(mi9t1) 0.4 0.1
sigma 0.7 0.1
log-fit_ratio 0.0 0.1
R2 0.5 0.1
Sample avg. posterior predictive
distribution of y (X = xbar):
Median MAD_SD
mean_PPD 0.0 0.2
------
For info on the priors used see help('prior_summary.stanreg').
[[1]]
stan_lm
family: gaussian [identity]
formula: scale(C) ~ scale(GA) + scale(Sex) + scale(SNAP) + Random + scale(KMC_std) +
scale(owies) + scale(LOS) + scale(Mom_Age) + scale(SES) +
scale(mi9t1)
------
Estimates:
Median MAD_SD
(Intercept) -0.2 0.3
scale(GA) -0.1 0.2
scale(Sex) 0.3 0.2
scale(SNAP) 0.1 0.2
RandomKC 0.3 0.3
scale(KMC_std) 0.1 0.1
scale(owies) 0.3 0.2
scale(LOS) -0.3 0.2
scale(Mom_Age) -0.2 0.1
scale(SES) -0.2 0.1
scale(mi9t1) 0.4 0.1
sigma 0.8 0.1
log-fit_ratio 0.0 0.1
R2 0.4 0.1
Sample avg. posterior predictive
distribution of y (X = xbar):
Median MAD_SD
mean_PPD 0.0 0.2
------
For info on the priors used see help('prior_summary.stanreg').
[[1]]
stan_lm
family: gaussian [identity]
formula: scale(C) ~ scale(GA) + scale(Sex) + scale(SNAP) + Random + scale(KMC_std) +
scale(owies) + scale(LOS) + scale(Mom_Age) + scale(SES) +
scale(mi9t1)
------
Estimates:
Median MAD_SD
(Intercept) -0.1 0.2
scale(GA) -0.1 0.2
scale(Sex) 0.4 0.2
scale(SNAP) 0.1 0.1
RandomKC 0.2 0.3
scale(KMC_std) 0.1 0.1
scale(owies) 0.4 0.2
scale(LOS) -0.4 0.3
scale(Mom_Age) 0.0 0.2
scale(SES) -0.3 0.2
scale(mi9t1) 0.3 0.1
sigma 0.8 0.1
log-fit_ratio 0.0 0.1
R2 0.4 0.1
Sample avg. posterior predictive
distribution of y (X = xbar):
Median MAD_SD
mean_PPD 0.0 0.2
bgoodri
September 19, 2017, 3:30pm
Perhaps part of the problem is all the scale calls. When you drop some observations for kfold, the mean and standard deviation change, so scale produces different numbers in each fold than when all of the data are included, which violates the assumptions of kfold. But that doesn’t explain the NaNs.
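The per-fold behavior of scale can be seen without fitting any model; a quick base-R sketch with simulated data:

```r
set.seed(1)
y <- rnorm(36)

# scale() standardizes using the mean/SD of whatever vector it receives,
# so the same observation gets a different standardized value once a
# fold of data has been dropped:
scale(y)[5]          # observation 5, standardized against all 36 points
scale(y[-(1:4)])[1]  # observation 5 again, standardized against the remaining 32
```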
It works if I unscale everything. The question is: does it make sense to report my scaled models but do kfold on unscaled versions?
bgoodri
September 19, 2017, 3:39pm
Models with a standardized outcome don’t make sense in a Bayesian context, and standardized predictors tend to confuse rstanarm. The best thing is to estimate models with the raw variables and then …
The lmer results are suspect because the correlations are 1 or NaN.
The stan_lmer ones might be okay, but you get a very small intercept (I think) due to standardizing your variables, which isn’t recommended. Just do it with unstandardized variables, and if you have to interpret things in terms of standard deviations, do posterior_predict once with the actual data and once with a one-standard-deviation increase for all observations on the variable of interest. The distribution of the difference…
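A sketch of that posterior_predict comparison, assuming `fit` is a model estimated on the raw (unstandardized) variables and using KMC_std from the original data frame as the predictor of interest:

```r
# Hypothetical sketch: shift the predictor of interest up by one SD
# for every observation, then compare predictive distributions
d_hi <- models_rf
d_hi$KMC_std <- d_hi$KMC_std + sd(d_hi$KMC_std)

yrep_base <- posterior_predict(fit, newdata = models_rf)
yrep_hi   <- posterior_predict(fit, newdata = d_hi)

# Distribution of the difference: the predicted effect of a one-SD
# increase in KMC_std, expressed in the outcome's raw units
summary(c(yrep_hi - yrep_base))
```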
Thanks again. I guess I should have seen that coming ;)
bgoodri:
When you drop some observations for kfold, the mean and standard deviation change and so scale produces a different number for each fold than when all of the data are included, which violates the assumptions of kfold.
I would not say it "violates the assumptions of kfold". In this case the kfold is taking the scaling into account too, which can be exactly what you want and is also a commonly used approach. Most often the kfold results are not sensitive to whether the scaling is done inside or outside of kfold (unless n is small and you use maximum likelihood or really tight priors). You could say that scaling corresponds to using data-dependent priors, which could be considered to violate the principles of Bayesian inference, but then again, if the priors are weakly informative, scaling often doesn’t affect the results (though it may affect computational performance, and it very likely affects the convenience of interpretation).
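The two placements of the scaling discussed in this thread can be sketched as follows (hypothetical variable names; `d` stands in for the full data frame):

```r
library(rstanarm)

# (1) Scaling inside the formula: scale() is re-evaluated on each fold's
# training data during kfold, so the transformation is cross-validated too.
fit_in <- stan_lm(scale(y) ~ scale(x1) + scale(x2), data = d, prior = R2(0.5))

# (2) Scaling once, up front: the transformation is fixed before kfold
# ever splits the data (a data-dependent choice, as noted above).
num <- sapply(d, is.numeric)
d_std <- d
d_std[num] <- scale(d[num])
fit_out <- stan_lm(y ~ x1 + x2, data = d_std, prior = R2(0.5))
```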