Hello,
I am having trouble deciding on which model to choose, here is what I did:
library("rstanarm")
library("loo")
library("bayesplot")
fit1 <-
stan_glm(
formula = PHENO~C1+C2+C3+C4+C5+C6+C7+C8+C9+C10+SEX+AGE+REN_INSF+mean_HBA1C+DDuration,
data = mf,
gaussian(link = "identity"),
prior = normal(0, 2.5, autoscale = TRUE),
prior_intercept = normal(0, 5, autoscale = TRUE),
seed = 12345
)
loo1 <- loo(fit1, save_psis = TRUE)
print(loo1)
pdf("PSIS.pdf")
plot(loo1)
dev.off()
fit2 <-
stan_glm(
formula = PHENO~C10+REN_INSF+mean_HBA1C+DDuration,
data = mf,
gaussian(link = "identity"),
prior = normal(0, 2.5, autoscale = TRUE),
prior_intercept = normal(0, 5, autoscale = TRUE),
seed = 12345
)
loo2 <- loo(fit2, save_psis = TRUE)
pdf("PSIS2.pdf")
plot(loo2,label_points = TRUE)
dev.off()
yrep <- posterior_predict(fit1)
pdf("LOO.pdf")
ppc_loo_pit_overlay(
y = mf$PHENO,
yrep = yrep,
lw = weights(loo1$psis_object)
)
dev.off()
yrep <- posterior_predict(fit2)
pdf("LOO2.pdf")
ppc_loo_pit_overlay(
y = mf$PHENO,
yrep = yrep,
lw = weights(loo1$psis_object)
)
dev.off()
loo_compare(loo1, loo2)
elpd_diff se_diff
fit2 0.0 0.0
fit1 -7.3 2.5
> print(loo1)
Computed from 4000 by 1253 log-likelihood matrix
Estimate SE
elpd_loo -2246.3 89.4
p_loo 38.8 5.8
looic 4492.6 178.9
------
Monte Carlo SE of elpd_loo is 0.1.
Pareto k diagnostic values:
Count Pct. Min. n_eff
(-Inf, 0.5] (good) 1251 99.8% 229
(0.5, 0.7] (ok) 2 0.2% 307
(0.7, 1] (bad) 0 0.0% <NA>
(1, Inf) (very bad) 0 0.0% <NA>
All Pareto k estimates are ok (k < 0.7).
See help('pareto-k-diagnostic') for details.
> print(loo2)
Computed from 4000 by 1253 log-likelihood matrix
Estimate SE
elpd_loo -2239.1 90.0
p_loo 29.7 4.5
looic 4478.1 180.1
------
Monte Carlo SE of elpd_loo is 0.1.
All Pareto k estimates are good (k < 0.5).
See help('pareto-k-diagnostic') for details.
Relevant plots are in attach. It seems that judging by the Pareto k estimates model loo2 is better.
But when I look at the plots LOO1.pdf and LOO2.pdf none of them looks like it is fitting…
Can you please tell me what to do next?
Variables from the 2nd model (PHENO~C10+REN_INSF+mean_HBA1C+DDuration) come from the selection done with:
library(olsrr)
model <- lm(PHENO~C1+C2+C3+C4+C5+C6+C7+C8+C9+C10+SEX+AGE+REN_INSF+mean_HBA1C+DDuration, data = mf)
k <- ols_step_best_subset(model)[LOO2.pdf|attachment](upload://cwKk0zauUFfWvkTXzIUOxduASx6.pdf) (473.2 KB) [LOO.pdf|attachment](upload://8GnS4dCj1FBOiVzeASHWZPiXlgX.pdf) (473.2 KB) [PSIS.pdf|attachment](upload://935kcdKXxZwv8bFX7TIxkQQcKO.pdf) (31.4 KB) [PSIS2.pdf|attachment](upload://zUeUX6QqsT4l1XaQglF8g7csG9v.pdf) (31.3 KB)
Please advise,
AnaLOO.pdf (473.2 KB)LOO2.pdf (473.2 KB)PSIS.pdf (31.4 KB)PSIS2.pdf (31.3 KB)