Loo error when specifying 'k_threshold = 0.7'

Jurgen · October 2, 2018, 3:00pm

Dear all,

I am trying to compare several Bayesian models with loo, as suggested in the rstanarm vignettes. For one of the models, I get the following warning:

loo3 <- loo(fitACRB_3)
Warning message:
Found 4 observation(s) with a pareto_k > 0.7. We recommend calling 'loo' again with argument 'k_threshold = 0.7' in order to calculate the ELPD without the assumption that these observations are negligible. This will refit the model 4 times to compute the ELPDs for the problematic observations directly.

As advised by the warning message, I specified k_threshold, but I get the following error:

tt <- loo(fitACRB_3,k_threshold = 0.7)
4 problematic observation(s) found.
Model will be refit 4 times.

Fitting model 1 out of 4 (leaving out observation 520)
Error in rep(TRUE, nrow(d) - length(omitted)) : invalid 'times' argument

Does anyone know what is going wrong? I assume it has to do with my data, as I am not receiving this error with the examples specified in the vignette. I can mail the data, if needed.

Kind Regards,
Jürgen

bgoodri · October 2, 2018, 4:05pm

Sounds like you didn’t pass a data.frame to the data argument in the original step where you get the posterior distribution.

Jurgen · October 3, 2018, 7:49am

The data is in a data.frame:

class(dataFINAL2)
[1] "data.frame"
dim(dataFINAL2)
[1] 2457   22
fitAcrB <- stan_glm(indicator ~ as.factor(experiment)*as.factor(transMembrane) + MHP + RT + Inten + pI + hydrophob + helicoProp, family="binomial",dataFINAL2)

Michael_Papenfus · March 12, 2019, 11:33pm

Was the problem associated with this error ever resolved? I am getting the same error after running a model of the form:

m1 <- stan_lmer(elast_sim ~ (1|studyname), data = dfs,
                prior = normal(0, 1, autoscale = FALSE),
                prior_aux = student_t(3, 0, 1, autoscale = FALSE),
                adapt_delta = .99)

and then

l1 <- loo(m1, k_threshold=0.7)

2 problematic observation(s) found.
Model will be refit 2 times.
Fitting model 1 out of 2 (leaving out observation 134)
Error in rep(TRUE, nrow(d) - length(omitted)) : invalid ‘times’ argument

Any idea what might be generating this error?
thanks

bgoodri · March 13, 2019, 4:06am

Beats me. If you specify options(error = recover) before calling loo, then it should let you jump into the frame that calls the reloo function. Can you tell us what it then says for nrow(d) and length(omitted)?

avehtari · March 13, 2019, 3:10pm

I missed this last time. Can you provide a reproducible example? If you can’t send the data you used, simulate something and set k_threshold low enough to get at least one refit.

Michael_Papenfus · March 13, 2019, 9:00pm

Thanks. I think there is something going on with the data frame structure.

When I estimate the simple model on the full data frame, which has lots of unused columns, I get the error shown in my previous post. However, when I subset the data frame to just the two columns used in the stan_lmer call, loo works fine.

Here is an example.

library(tidyverse)
library(rstanarm)

id <- "1TIkvD-DbVo4WRnTWzExXA9Xzk9FlT91Q"
dat <- read_csv(sprintf("https://docs.google.com/uc?id=%s&export=download", id))

m1 <- stan_lmer(y ~ (1|studyid), data = dat,
                prior = normal(0, 1, autoscale = FALSE),
                prior_aux = student_t(3, 0, 1, autoscale = FALSE),
                adapt_delta = .99)

loo1 <- loo(m1, k_threshold=0.7)
# 2 problematic observation(s) found.
# Model will be refit 2 times.
# 
# Fitting model 1 out of 2 (leaving out observation 134)
# Error in rep(TRUE, nrow(d) - length(omitted)) : invalid 'times' argument

# In the failing frame: nrow(d) is 0 and length(omitted) is 1


# Now subset the data.

dat2 <- dat %>% select(y, studyid)
m2 <- stan_lmer(y ~ (1|studyid), data = dat2,
                prior = normal(0, 1, autoscale = FALSE),
                prior_aux = student_t(3, 0, 1, autoscale = FALSE),
                adapt_delta = .99)
loo2 <- loo(m2, k_threshold=0.7)
4 problematic observation(s) found.
Model will be refit 4 times.

Fitting model 1 out of 4 (leaving out observation 54)

Fitting model 2 out of 4 (leaving out observation 55)

Fitting model 3 out of 4 (leaving out observation 134)

Fitting model 4 out of 4 (leaving out observation 137)
> loo2

Computed from 4000 by 256 log-likelihood matrix

         Estimate   SE
elpd_loo     21.7 33.8
p_loo        24.3  7.9
looic       -43.3 67.6
------
Monte Carlo SE of elpd_loo is 0.4.

Pareto k diagnostic values:
                         Count Pct.    Min. n_eff
(-Inf, 0.5]   (good)     250   99.2%   1770      
 (0.5, 0.7]   (ok)         2    0.8%   1717      
   (0.7, 1]   (bad)        0    0.0%   <NA>      
   (1, Inf)   (very bad)   0    0.0%   <NA>      

All Pareto k estimates are ok (k < 0.7).
See help('pareto-k-diagnostic') for details.

It works, but I am not sure why I had to simplify the data frame.
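
For what it's worth, the values reported above (nrow(d) = 0 and length(omitted) = 1) are enough to explain the wording of the error message, though not why d ends up empty. The sketch below uses only base R; the names n_rows_d and n_omitted are stand-ins I made up for the objects inside the refitting code, which are not reproduced here.

```r
# Values reported above for the failing frame. The real objects live inside
# rstanarm's refitting code, so plain numbers stand in for them here.
n_rows_d  <- 0L  # stand-in for nrow(d)
n_omitted <- 1L  # stand-in for length(omitted)

times <- n_rows_d - n_omitted  # 0 - 1 = -1

# rep() needs a non-negative 'times', so a negative value makes it fail.
msg <- tryCatch(
  {
    rep(TRUE, times)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

So the message is only a symptom: the data frame seen by the refit has no rows, and the subtraction goes negative.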

avehtari · March 14, 2019, 12:44pm

Works for me with rstanarm_2.18.2. Which rstanarm and loo versions are you using?

Aki

Michael_Papenfus · March 14, 2019, 5:43pm

I was using rstanarm 2.17.3 and loo 2.0.0.
I have now updated to rstanarm 2.18.2 and loo 2.1.0 and no longer get the error. Thanks, and sorry: updating to the latest versions should have been one of my first steps.
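
For anyone landing here with the same error, a quick way to compare versions is sketched below. It uses only base R, and the version strings are just the ones quoted in this thread; the vector names reported and working are my own. To check a real installation, replace the entries of reported with as.character(packageVersion("rstanarm")) and as.character(packageVersion("loo")).

```r
# Versions quoted in this thread (failing setup vs. the setup that worked).
reported <- c(rstanarm = "2.17.3", loo = "2.0.0")
working  <- c(rstanarm = "2.18.2", loo = "2.1.0")

# compareVersion() returns -1 when its first argument is the older version.
older <- mapply(
  function(a, b) utils::compareVersion(a, b) < 0,
  reported, working
)
print(older)  # TRUE marks a package older than the version that worked
```

This only shows that both packages were behind the versions that worked here. It does not say which of the two updates removed the error.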

avehtari · March 14, 2019, 5:47pm

No problem, great that it works now!
