I am trying to compare several Bayesian models with loo, as suggested in the rstanarm vignettes. For one of the models, I get the following warning:
loo3 <- loo(fitACRB_3)
Warning message:
Found 4 observation(s) with a pareto_k > 0.7. We recommend calling 'loo' again with argument 'k_threshold = 0.7' in order to calculate the ELPD without the assumption that these observations are negligible. This will refit the model 4 times to compute the ELPDs for the problematic observations directly.
As advised by the warning message, I specified k_threshold, but I get the following error:
tt <- loo(fitACRB_3,k_threshold = 0.7)
4 problematic observation(s) found.
Model will be refit 4 times.
Fitting model 1 out of 4 (leaving out observation 520)
Error in rep(TRUE, nrow(d) - length(omitted)) : invalid 'times' argument
Does anyone know what is going wrong? I assume it has to do with my data, as I am not receiving this error with the examples specified in the vignette. I can mail the data, if needed.
Was the problem associated with this error ever resolved? I am getting the same error after running a model of the form:
m1 <- stan_lmer(elast_sim ~ (1|studyname), data = dfs,
prior = normal(0, 1, autoscale = FALSE),
prior_aux = student_t(3, 0, 1, autoscale = FALSE),
adapt_delta = .99)
and then
l1 <- loo(m1, k_threshold=0.7)
2 problematic observation(s) found.
Model will be refit 2 times.
Fitting model 1 out of 2 (leaving out observation 134)
Error in rep(TRUE, nrow(d) - length(omitted)) : invalid ‘times’ argument
Any idea what might be generating this error?
thanks
Beats me. If you specify options(error = recover) before calling loo, then it should let you jump into the frame that calls the reloo function. Can you tell us what it then says for nrow(d) and length(omitted)?
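For reference, the expression in the error message is plain base R: rep() needs a single non-negative count for its times argument, so those two numbers decide whether it fails. A minimal sketch with made-up values (the helper keep_mask is hypothetical and only mirrors the expression shown in the error; it is not rstanarm's or loo's actual code):

```r
# Hypothetical helper mirroring the expression from the error message.
keep_mask <- function(d, omitted) {
  rep(TRUE, nrow(d) - length(omitted))
}

# Made-up data: 5 rows, one observation left out -> 4 TRUE values.
ok <- keep_mask(data.frame(y = 1:5), omitted = 3L)

# Made-up data: 0 rows, one observation left out -> the count is -1,
# so rep() signals an error, which is captured here as a string.
bad <- tryCatch(
  keep_mask(data.frame(y = numeric(0)), omitted = 3L),
  error = function(e) conditionMessage(e)
)

print(length(ok))
print(bad)
```

So if nrow(d) turns out to be smaller than length(omitted) in the failing frame, the data the refit sees is the thing to look at.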
I missed this last time. Can you provide a reproducible example? If you can’t send the data you used, simulate something and set k_threshold low enough to get at least one refit.
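Something along these lines might serve as a starting point for a simulated example. It is only a sketch: the column names, group sizes, outlier positions, and the threshold of 0.3 are all made up, and the fit is guarded so it only runs in an interactive session with rstanarm installed.

```r
set.seed(123)

# Simulated grouped data; all names and sizes here are made up.
n_groups <- 8
n_per_group <- 10
sim <- data.frame(
  studyid = factor(rep(seq_len(n_groups), each = n_per_group))
)
group_effect <- rnorm(n_groups, mean = 0, sd = 0.5)
sim$y <- group_effect[as.integer(sim$studyid)] + rnorm(nrow(sim), sd = 1)

# A couple of outliers make large Pareto k values (and refits) more likely.
sim$y[c(5, 42)] <- sim$y[c(5, 42)] + 6

# Extra columns the model does not use, mimicking a wide data frame.
sim$unused_num <- runif(nrow(sim))
sim$unused_chr <- sample(letters, nrow(sim), replace = TRUE)

# Sampling is slow, so only fit interactively and if rstanarm is available.
if (interactive() && requireNamespace("rstanarm", quietly = TRUE)) {
  library(rstanarm)
  fit <- stan_lmer(y ~ (1 | studyid), data = sim,
                   chains = 2, iter = 1000, refresh = 0)
  # 0.3 is an arbitrary low threshold; lower it if nothing gets refit.
  print(loo(fit, k_threshold = 0.3))
}
```

A low threshold can trigger many refits, so keeping the simulated data small keeps the run time manageable.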
Thanks. So I think there is something going on with the dataframe structure.
When I estimate the simple model on the full data frame, which has lots of unused columns, I get the error shown in my previous post. However, when I subset the data frame to just the two columns used in the stan_lmer call, loo works fine.
Here is an example.
library(tidyverse)
library(rstanarm)
id <- "1TIkvD-DbVo4WRnTWzExXA9Xzk9FlT91Q"
dat <- read_csv(sprintf("https://docs.google.com/uc?id=%s&export=download", id))
m1 <- stan_lmer(y ~ (1|studyid), data = dat,
prior = normal(0, 1, autoscale = FALSE),
prior_aux = student_t(3, 0, 1, autoscale = FALSE),
adapt_delta = .99)
loo1 <- loo(m1, k_threshold=0.7)
# 2 problematic observation(s) found.
# Model will be refit 2 times.
#
# Fitting model 1 out of 2 (leaving out observation 134)
# Error in rep(TRUE, nrow(d) - length(omitted)) : invalid 'times' argument
# nrow(d) = 0 and length(omitted) = 1
# Now subset the data.
dat2 <- dat %>% select(y, studyid)
m2 <- stan_lmer(y ~ (1|studyid), data = dat2,
prior = normal(0, 1, autoscale = FALSE),
prior_aux = student_t(3, 0, 1, autoscale = FALSE),
adapt_delta = .99)
loo2 <- loo(m2, k_threshold=0.7)
# 4 problematic observation(s) found.
# Model will be refit 4 times.
#
# Fitting model 1 out of 4 (leaving out observation 54)
# Fitting model 2 out of 4 (leaving out observation 55)
# Fitting model 3 out of 4 (leaving out observation 134)
# Fitting model 4 out of 4 (leaving out observation 137)
loo2
# Computed from 4000 by 256 log-likelihood matrix
#
#          Estimate   SE
# elpd_loo     21.7 33.8
# p_loo        24.3  7.9
# looic       -43.3 67.6
# ------
# Monte Carlo SE of elpd_loo is 0.4.
#
# Pareto k diagnostic values:
#                          Count Pct.    Min. n_eff
# (-Inf, 0.5]   (good)     250   99.2%   1770
#  (0.5, 0.7]   (ok)         2    0.8%   1717
#    (0.7, 1]   (bad)        0    0.0%   <NA>
#    (1, Inf)   (very bad)   0    0.0%   <NA>
#
# All Pareto k estimates are ok (k < 0.7).
# See help('pareto-k-diagnostic') for details.
It works, but I am not sure why I had to simplify the data frame.
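If the workaround is needed again, the column subset can be derived from the formula rather than typed by hand. A minimal base R sketch on a made-up data frame (dat_wide, model_formula, and dat_small are illustrative names, not objects from the example above):

```r
# Made-up wide data frame standing in for the real data.
dat_wide <- data.frame(
  y = c(0.2, -0.1, 0.4, 0.3),
  studyid = c("a", "a", "b", "b"),
  notes = c("w", "x", "y", "z"),
  weight = c(1, 2, 3, 4)
)

model_formula <- y ~ (1 | studyid)

# all.vars() lists every variable name that appears in the formula.
used <- all.vars(model_formula)

# Keep only those columns; drop = FALSE guards the single-column case.
dat_small <- dat_wide[, used, drop = FALSE]

print(names(dat_small))
```

This only automates the subsetting step; it does not explain why the extra columns triggered the error.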
I was using rstanarm 2.17.3 and loo 2.0.0.
I have now updated to rstanarm 2.18.2 and loo 2.1.0 and no longer get the error. Thanks, and sorry: updating to the latest versions should have been one of my first steps.
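For anyone landing here with the same error, a quick way to check installed versions against the ones reported above as working. This is a small sketch; is_at_least is a hypothetical helper, and the version numbers are simply those mentioned in this thread.

```r
# Versions reported in this thread as no longer showing the error.
fixed_in <- c(rstanarm = "2.18.2", loo = "2.1.0")

# Hypothetical helper: TRUE if `pkg` is installed at `minimum` or newer,
# FALSE if it is older, NA if the package is not installed.
is_at_least <- function(pkg, minimum) {
  if (!nzchar(system.file(package = pkg))) return(NA)
  utils::packageVersion(pkg) >= package_version(minimum)
}

status <- vapply(names(fixed_in),
                 function(p) is_at_least(p, fixed_in[[p]]),
                 logical(1))
print(status)
```

A FALSE for either package suggests updating before digging further.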