Loo error when specifying 'k_threshold = 0.7'

loo

#1

Dear all,

I am trying to compare several Bayesian models with loo, as suggested in the rstanarm vignettes. For one of the models, I get the following warning:

loo3 <- loo(fitACRB_3)
Warning message:
Found 4 observation(s) with a pareto_k > 0.7. We recommend calling 'loo' again with argument 'k_threshold = 0.7' in order to calculate the ELPD without the assumption that these observations are negligible. This will refit the model 4 times to compute the ELPDs for the problematic observations directly.

As advised by the warning message, I specified k_threshold, but I get the following error:

tt <- loo(fitACRB_3,k_threshold = 0.7)
4 problematic observation(s) found.
Model will be refit 4 times.

Fitting model 1 out of 4 (leaving out observation 520)
Error in rep(TRUE, nrow(d) - length(omitted)) : invalid 'times' argument

Does anyone know what is going wrong? I assume it has to do with my data, as I am not receiving this error with the examples specified in the vignette. I can mail the data, if needed.

Kind Regards,
Jürgen


#2

Sounds like you didn’t pass a data.frame to the data argument in the original step where you get the posterior distribution.


#3

The data is in a data.frame:

class(dataFINAL2)
[1] "data.frame"
dim(dataFINAL2)
[1] 2457   22
fitAcrB <- stan_glm(indicator ~ as.factor(experiment) * as.factor(transMembrane) + MHP + RT + Inten + pI + hydrophob + helicoProp,
                    family = "binomial", data = dataFINAL2)

#4

Was the problem associated with this error ever resolved? I am getting the same error after running a model of the form:

m1 <- stan_lmer(elast_sim ~ (1|studyname), data = dfs,
prior = normal(0, 1, autoscale = FALSE),
prior_aux = student_t(3, 0, 1, autoscale = FALSE),
adapt_delta = .99)
and then

l1 <- loo(m1, k_threshold=0.7)

2 problematic observation(s) found.
Model will be refit 2 times.
Fitting model 1 out of 2 (leaving out observation 134)
Error in rep(TRUE, nrow(d) - length(omitted)) : invalid ‘times’ argument

Any idea what might be generating this error?
Thanks.


#5

Beats me. If you specify options(error = recover) before calling loo, then it should let you jump into the frame that calls the reloo function. Can you tell us what it then says for nrow(d) and length(omitted)?
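
A rough sketch of that debugging step on a toy function, since the real call needs a fitted model. `toy_reloo` is a hypothetical stand-in that only mimics the line quoted in the error message; it is not rstanarm's code, and the frame you would pick in the real session will have a different name.

```r
# Hypothetical stand-in for the failing step; NOT rstanarm's actual code.
toy_reloo <- function(d, omitted) {
  # Same expression as in the reported error message.
  rep(TRUE, nrow(d) - length(omitted))
}

# With this option set, an error in an interactive session lists the active
# call frames and lets you enter one of them by number.
if (interactive()) options(error = recover)

# Run the failing call by hand, one line at a time:
#   toy_reloo(data.frame(), omitted = 1L)
# At the "Selection:" prompt pick the toy_reloo frame, then evaluate
#   nrow(d)
#   length(omitted)
# Type c to return to the frame menu and 0 to exit.

# Restore the default error behaviour when done:
#   options(error = NULL)
```

In the real session the call to run by hand would be the `loo(..., k_threshold = 0.7)` line from the post above.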


#6

I missed this last time. Can you provide a reproducible example? If you can’t send the data you used, simulate something and set k_threshold low enough to get at least one refit.
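
One possible way to build such an example is sketched below. The names `simulate_data` and `force_refit`, the injected outlier, and the rule for picking the threshold are all assumptions made for illustration; only `stan_glm`, `loo`, `k_threshold` and `pareto_k_values` come from rstanarm and loo.

```r
# Simulate a small regression data set with one extreme response value,
# which tends to give that row a large Pareto k.
simulate_data <- function(n = 30, seed = 1) {
  set.seed(seed)
  x <- rnorm(n)
  y <- 1 + 2 * x + rnorm(n)
  y[n] <- y[n] + 15  # injected outlier (illustrative choice)
  data.frame(y = y, x = x)
}

# Fit the model, look at the Pareto k values, then call loo() again with a
# threshold just below the largest k so that at least that row is refit.
# Requires rstanarm and loo; nothing here runs until the function is called.
force_refit <- function(sim) {
  fit <- rstanarm::stan_glm(y ~ x, data = sim, refresh = 0)
  k <- loo::pareto_k_values(suppressWarnings(rstanarm::loo(fit)))
  # Floor at 0; if every k is at or below 0, no refit is triggered.
  thr <- max(0, max(k) - 0.01)
  rstanarm::loo(fit, k_threshold = thr)
}

# Usage (slow, runs MCMC several times):
#   force_refit(simulate_data())
```

Keeping the data set small limits the cost of each refit, which matters because every flagged observation triggers a full re-estimation.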


#7

Thanks. I think there is something going on with the data frame structure.

When I estimate the simple model on the full data frame, which has many unused columns, I get the error shown in my previous post. However, when I subset the data frame to just the two columns used in the stan_lmer call, loo works fine.

Here is an example.

library(tidyverse)
library(rstanarm)

id <- "1TIkvD-DbVo4WRnTWzExXA9Xzk9FlT91Q"
dat <- read_csv(sprintf("https://docs.google.com/uc?id=%s&export=download", id))

m1 <- stan_lmer(y ~ (1|studyid), data = dat,
                prior = normal(0, 1, autoscale = FALSE),
                prior_aux = student_t(3, 0, 1, autoscale = FALSE),
                adapt_delta = .99)

loo1 <- loo(m1, k_threshold=0.7)
# 2 problematic observation(s) found.
# Model will be refit 2 times.
# 
# Fitting model 1 out of 2 (leaving out observation 134)
# Error in rep(TRUE, nrow(d) - length(omitted)) : invalid 'times' argument

In the frame where the error occurs, nrow(d) is 0 and length(omitted) is 1.
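
Those two values are enough to explain the message: the `times` argument becomes 0 - 1 = -1, which `rep()` rejects. A minimal base R illustration, using only the numbers reported here (the variable names are made up for the example):

```r
n_rows    <- 0L  # reported value of nrow(d)
n_omitted <- 1L  # reported value of length(omitted)

times <- n_rows - n_omitted  # -1

# rep() refuses a negative 'times'; capture the message instead of stopping.
msg <- tryCatch(rep(TRUE, times), error = function(e) conditionMessage(e))
print(msg)  # "invalid 'times' argument"

# With a non-empty data frame the same expression is fine, e.g. 256 rows:
length(rep(TRUE, 256L - n_omitted))  # 255
```

So the open question is why `d` ends up with zero rows, not the `rep()` call itself.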


# Now subset the data.

dat2 <- dat %>% select(y, studyid)
m2 <- stan_lmer(y ~ (1|studyid), data = dat2,
                prior = normal(0, 1, autoscale = FALSE),
                prior_aux = student_t(3, 0, 1, autoscale = FALSE),
                adapt_delta = .99)
loo2 <- loo(m2, k_threshold=0.7)
4 problematic observation(s) found.
Model will be refit 4 times.

Fitting model 1 out of 4 (leaving out observation 54)

Fitting model 2 out of 4 (leaving out observation 55)

Fitting model 3 out of 4 (leaving out observation 134)

Fitting model 4 out of 4 (leaving out observation 137)
> loo2

Computed from 4000 by 256 log-likelihood matrix

         Estimate   SE
elpd_loo     21.7 33.8
p_loo        24.3  7.9
looic       -43.3 67.6
------
Monte Carlo SE of elpd_loo is 0.4.

Pareto k diagnostic values:
                         Count Pct.    Min. n_eff
(-Inf, 0.5]   (good)     250   99.2%   1770      
 (0.5, 0.7]   (ok)         2    0.8%   1717      
   (0.7, 1]   (bad)        0    0.0%   <NA>      
   (1, Inf)   (very bad)   0    0.0%   <NA>      

All Pareto k estimates are ok (k < 0.7).
See help('pareto-k-diagnostic') for details.

It works, but I am not sure why I had to simplify the data frame.
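
For anyone who needs the same workaround, the subsetting step can be derived from the formula instead of typing the column names by hand. This is only a sketch on a toy data frame; `dat_toy` and its `unused` column are invented for illustration.

```r
model_formula <- y ~ (1 | studyid)

# all.vars() returns the variable names that appear in the formula.
needed <- all.vars(model_formula)  # "y" "studyid"

# Toy data with one column the model never uses.
dat_toy <- data.frame(
  y       = c(0.1, 0.4, 0.3),
  studyid = c("a", "a", "b"),
  unused  = 1:3
)

# Keep only the needed columns. as.data.frame() also turns a tibble (what
# read_csv returns) into a plain data.frame; the thread does not establish
# whether that matters for this error.
dat_small <- as.data.frame(dat_toy[, needed, drop = FALSE])
names(dat_small)
```

The result can then be passed as `data` in place of the full data frame.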


#8

Works for me with rstanarm_2.18.2. Which rstanarm and loo versions are you using?

Aki


#9

I was using rstanarm 2.17.3 and loo 2.0.0.
After updating to rstanarm 2.18.2 and loo 2.1.0, I no longer get the error. Thanks, and sorry: updating to the latest versions should have been one of my first steps.
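
A quick way to check the installed versions before posting, sketched with base R only. The version numbers in the comparison are the ones mentioned in this thread.

```r
# Print the installed version of each package, if it is installed.
for (pkg in c("rstanarm", "loo")) {
  if (nzchar(system.file(package = pkg))) {
    cat(pkg, as.character(packageVersion(pkg)), "\n")
  } else {
    cat(pkg, "is not installed\n")
  }
}

# Version strings compare correctly as numeric_version objects.
is_older <- numeric_version("2.17.3") < numeric_version("2.18.2")
is_older  # TRUE
```

`sessionInfo()` gives the same information for all attached packages and is convenient to paste into a report.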


#10

No problem, great that it works now!