Errors when calculating PSIS-LOO CV in loo version 2.2.0

I am having inconsistent issues calculating LOO on Stan models (this used to run on older versions). To diagnose things, I am running through the LOO tutorial example http://mc-stan.org/loo/articles/loo2-with-rstan.html and getting the following issues (which are the same errors as I get on my own models). I am not sure if it is (1) a Stan or LOO install problem, (2) maybe an issue unique to my computer, or (3) whether LOO has changed how it extracts the log likelihood from the model.

Overall, I run through the script as shown in the tutorial. The model fits and I can extract log_lik values from the model output. But when I go to calculate LOO (or compare models or calculate relative efficiencies) from those extracted values I get the following errors:

log_lik_1 <- extract_log_lik(fit_1, merge_chains = FALSE)
r_eff <- relative_eff(exp(log_lik_1), cores = 2)
Error in checkForRemoteErrors(val) : 
  2 nodes produced errors; first error: could not find function "ess_rfun"

This error is inconsistent. For examples, sometimes it reports that it cannot find other functions like ā€œenough_tail_samples()ā€.

However, if I run LOO directly from the Stan model output termed ā€˜fit1’ then it works.

loo(fit_1)

Computed from 4000 by 3020 log-likelihood matrix

         Estimate   SE
elpd_loo  -1968.4 15.6
p_loo         3.2  0.1
looic      3936.9 31.2
------
Monte Carlo SE of elpd_loo is 0.0.

All Pareto k estimates are good (k < 0.5).
See help('pareto-k-diagnostic') for details.

The bigger issue is that this seems to change how I can use loo with other custom models or packages like ā€˜atsar’, etc. even if those models return the log_lik values.

My whole R script from the tutorial looks like:

library("rstan")

# Prepare data 
url <- "http://stat.columbia.edu/~gelman/arm/examples/arsenic/wells.dat"
wells <- read.table(url)
wells$dist100 <- with(wells, dist / 100)
X <- model.matrix(~ dist100 + arsenic, wells)
standata <- list(y = wells$switch, X = X, N = nrow(X), P = ncol(X))

# Fit model
fit_1 <- stan("Stan/logistic.stan", data = standata)
print(fit_1, pars = "beta")

library("loo")

# Extract pointwise log-likelihood
# using merge_chains=FALSE returns an array, which is easier to 
# use with relative_eff()
log_lik_1 <- extract_log_lik(fit_1, merge_chains = FALSE)

# as of loo v2.0.0 we can optionally provide relative effective sample sizes
# when calling loo, which allows for better estimates of the PSIS effective
# sample sizes and Monte Carlo error
r_eff <- relative_eff(exp(log_lik_1), cores = 2) 

# preferably use more than 2 cores (as many cores as possible)
# will use value of 'mc.cores' option if cores is not specified
loo_1 <- loo(log_lik_1, r_eff = r_eff, cores = 2)
print(loo_1)

The Stan model is the same as the tutorial:

data {
  int<lower=0> N;             // number of data points
  int<lower=0> P;             // number of predictors (including intercept)
  matrix[N,P] X;              // predictors (including 1s for intercept)
  int<lower=0,upper=1> y[N];  // binary outcome
}
parameters {
  vector[P] beta;
}
model {
  beta ~ normal(0, 1);
  y ~ bernoulli_logit(X * beta);
}
generated quantities {
  vector[N] log_lik;
  for (n in 1:N) {
    log_lik[n] = bernoulli_logit_lpmf(y[n] | X[n] * beta);
  }
}

My R session info

R version 3.6.2 (2019-12-12)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] LC_COLLATE=English_Canada.1252  LC_CTYPE=English_Canada.1252    LC_MONETARY=English_Canada.1252
[4] LC_NUMERIC=C                    LC_TIME=English_Canada.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] loo_2.2.0          rstan_2.19.2       ggplot2_3.2.1      StanHeaders_2.19.0

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.3         pillar_1.4.3       compiler_3.6.2     prettyunits_1.0.2  tools_3.6.2       
 [6] packrat_0.5.0      pkgbuild_1.0.6     checkmate_1.9.4    lifecycle_0.1.0    tibble_2.1.3      
[11] gtable_0.3.0       pkgconfig_2.0.3    rlang_0.4.2        cli_2.0.0          rstudioapi_0.10   
[16] parallel_3.6.2     gridExtra_2.3      withr_2.1.2        dplyr_0.8.3        stats4_3.6.2      
[21] grid_3.6.2         tidyselect_0.2.5   glue_1.3.1         inline_0.3.15      R6_2.4.1          
[26] processx_3.4.1     fansi_0.4.0        purrr_0.3.3        callr_3.4.0        magrittr_1.5      
[31] backports_1.1.5    scales_1.1.0       ps_1.3.0           codetools_0.2-16   matrixStats_0.55.0
[36] assertthat_0.2.1   colorspace_1.4-1   lazyeval_0.2.2     munsell_0.5.0      crayon_1.3.4 

Thanks for reporting this. This might be related to the refactoring in loo 2.2.0, but since the code you give works for me with packages

[1] loo_2.2.0          rstan_2.19.2       ggplot2_3.2.1      StanHeaders_2.19.0

I recommend first to try re-installing rstan and check the rstan version. My exact rstan version which is shown when running library(rstan) is rstan (Version 2.19.2, GitRev: 2e1f913d3ca3)

Thanks @avehtari. I re-installed rstan and have the same version as you do: rstan (Version 2.19.2, GitRev: 2e1f913d3ca3) . Re-running the LOO tutorial script still nets the same error on this line after extracting the log_lik values:

r_eff <- relative_eff(exp(log_lik_1), cores = 2) 
Error in checkForRemoteErrors(val) : 
  2 nodes produced errors; first error: could not find function "ess_rfun"

Interestingly, this line is working with cores=1. My mc.cores option is set to 8 following:

options(mc.cores = parallel::detectCores())

I wonder if LOO isn’t automatically detected on each of the cores?

This works for me with both cores=2 and cores=1.

Pinging @jonah and @mans_magnusson if they have any ideas how to continue from here.

Thanks @klwilson23 !

I tried the code as well and I couldn’t reproduce the error (on a mac). My hypothesis is that this is related to parallelism in Windows (since that is specific to Windows and only happens when cores > 1).

I cannot test Windows parallelism from my computer but I think we now have appveyor up in the loo repo, so we could maybe be able to test it there (or if @paul.buerkner have time for a quick test).

I guess the best thing would be to file it as an issue in the loo repo?

/MƄns

I checked a little more.

My hypothesis is that the problem arise at row 208-215:

They seem to pass the testsuite. I have a vague memory of @jonah having some problem with Windows when appveyor was included, but I do not remember the details.

/MƄns

Thanks @mans_magnusson. I was able to identify the problem a bit more. In short, I think it is more of a broken interaction for LOO calling parellel cores within an RStudio project directory and package loading.

To summarize, my errors only occurred when I was working from an RStudio project directory (for GitHub version control) and have a custom folder for my home package library. I think when running LOO from within the project directory there is a broken file path for Windows parallel cores unable to find the package location for LOO and its dependencies (despite that library path being specified in a default .Rprofile file).

Note that when I am not working within the project folder, the tutorial code works successfully with mc.cores=2. So its seems like a RStudio Project x Parallel interaction.

I’m not sure how best to allow the cores called from the project folder to inherit the correct file paths yet, but I think this is likely beyond LOO specific problems though (unless there’s a default pathway LOO loads dependencies?).

One way I’ve proceeded in the path with the doSNOW package is something like this:

worker.init <- function() {
  .libPaths("FILE PATH")
  library(PACKAGE NAME)
  NULL
}

cl<-makeCluster(parallel::detectCores())
registerDoSNOW(cl)
clusterCall(cl,worker.init)

which then loads packages on each of the cores. But I think LOO calls some internal functions for initiating cores and am not sure how to allow those cores to find the correct file path.

Thats great! Indeed, then it seem to be out of scope for the loo package. I guess the developers of the parallel package would be the right persons to file an issue to? Or the R-Studio people?