Loo throwing error when computing ELPDs for problematic observations

ABrown · May 15, 2018, 7:02pm

I’ve been working with rstanarm and loo for several months, just updated to the newest versions a couple weeks ago, and since then have started seeing this error message when running the loo function:

>   l_full5 <- loo(full5, k_threshold = 0.7)
1 problematic observation(s) found.
Model will be refit 1 times.

Fitting model 1 out of 1 (leaving out observation 2252)
Error in new_CppObject_xp(fields$.module, fields$.pointer, ...) : 
  Exception: model_bernoulli_namespace::model_bernoulli: N[k0__] is 0, but must be greater than or equal to 1  (in 'model_bernoulli' at line 12)

failed to create the sampler; sampling not done
Error in check_stanfit(stanfit) : 
  Invalid stanfit object produced please report bug

This only happens when a problematic observation(s) is/are found, and furthermore, seems to happen every time a problematic observation is found. This is happening for the more complex models I’m using.

Here are the data I’m using: dat_with_soil2.csv (865.8 KB),
and code that should reproduce the problem:

dat$BAc5 <- scale(dat$BAc5)
dat$BAh5 <- scale(dat$BAh5)
dat$n_con <- scale(dat$n_con)
dat$n_het <- scale(dat$n_het)
dat$gap <- scale(dat$gap)
dat$ph <- scale(dat$ph)
dat$tec <- scale(dat$tec)
dat$org <- scale(dat$org)

full5 <- stan_glm(surv ~ 
                    n_con + n_het +
                    BAc5*gap + BAh5*gap +
                    BAc5*ph + BAh5*ph + 
                    BAc5*tec + BAh5*tec +
                    BAc5*org + BAh5*org,
                  prior = student_t(df=7, location=0, autoscale=TRUE),
                  prior_intercept = student_t(df=7, location=0, autoscale=TRUE),
                  adapt_delta = 0.99,
                  init_r = 0.1,
                  family = binomial(link = "logit"),
                  data = dat)

l_full5 <- loo(full5, k_threshold=0.7)

I hugely appreciate any guidance you might have. Thank you so much for the work you do!

Operating System: Windows 10
rstanarm Version: 2.17.4
loo Version: 2.0.0
R Version: 3.3.1

bgoodri · May 15, 2018, 7:23pm

Is observation 2252 the only one where the outcome is zero?

ABrown · May 15, 2018, 9:13pm

Thanks for the quick reply!
I think I might misunderstand your question - by “outcome” do you mean the response variable (survival)? If so, then no, 2252 would not be the only zero; there should be 841 zeros and 3436 ones.

jonah · May 16, 2018, 3:19am

Hi @ABrown, I don’t think you misunderstood @bgoodri’s question. He asked that because of the particular error message you got. N[k0__] is 0, but must be greater than or equal to 1 is referring to a variable in the underlying Stan code that contains the number of 0s observed. We should catch this in the R code and throw an informative error before calling Stan so it doesn’t spit out all this jibberish.

In any case, this may be a bug in rstanarm. I’ll report back when I’ve had a chance to run your code.
Thanks for sharing the data and reproducible example!

jonah · May 16, 2018, 5:11pm

Ok I figured out the problem. It’s a bug in rstanarm that should be easy to fix. There’s a line in one of the internal functions in rstanarm that is causing na.omit to be called on the data before refitting the model. When observation 2252 is left out and then na.omit is called, it turns out that there are no more 0s in the data:

table(dat$surv)

   0    1 
 841 3436 


table(na.omit(dat)$surv)

   1 
3364

The NAs are in variables that aren’t actually included in your model formula, which shouldn’t be a problem. However, in the code used to refit the model when there is a high k value we aren’t treating those NAs differently from NAs in the variables actually used in the model (the observations are being dropped but they shouldn’t be). This should be easy to fix for the next release of rstanarm.

Until then you should be able to get around this problem by only including variables you use in the model formula when passing a data frame to the data argument of stan_glm(). Does that work for you?

jonah · May 16, 2018, 5:33pm

I just opened an issue for this so we get it permanently fixed:

I think that workaround I mentioned in my previous post should be a good short term solution.

ABrown · May 16, 2018, 6:20pm

Yes, that makes sense @jonah. Thank you so much for your timely help!!

Topic		Replies	Views
Loo with k_threshold error for stan_polr() rstanarm loo , rstanarm	23	1594	August 4, 2020
Loo error when specifying 'k_threshold = 0.7' Modeling loo	9	1152	March 14, 2019
LOO error when k_threshold = 0.7 fits model that drops sole observation of factor level rstanarm loo	8	1060	August 30, 2018
Loo moment matching not working General rstan , loo , r	9	1380	February 2, 2021
Errors when calculating PSIS-LOO CV in loo version 2.2.0 Modeling fitting-issues , loo	7	1240	January 2, 2020

Loo throwing error when computing ELPDs for problematic observations

Related topics