Errors when calculating PSIS-LOO CV in loo version 2.2.0

I am having inconsistent issues calculating LOO on Stan models (this used to run on older versions). To diagnose things, I am running through the LOO tutorial example http://mc-stan.org/loo/articles/loo2-with-rstan.html and getting the following issues (which are the same errors as I get on my own models). I am not sure if it is (1) a Stan or LOO install problem, (2) maybe an issue unique to my computer, or (3) whether LOO has changed how it extracts the log likelihood from the model.

Overall, I run through the script as shown in the tutorial. The model fits and I can extract log_lik values from the model output. But when I go to calculate LOO (or compare models or calculate relative efficiencies) from those extracted values I get the following errors:

log_lik_1 <- extract_log_lik(fit_1, merge_chains = FALSE)
r_eff <- relative_eff(exp(log_lik_1), cores = 2)
Error in checkForRemoteErrors(val) : 
  2 nodes produced errors; first error: could not find function "ess_rfun"

This error is inconsistent. For examples, sometimes it reports that it cannot find other functions like ā€œenough_tail_samples()ā€.

However, if I run LOO directly from the Stan model output termed ā€˜fit1ā€™ then it works.

loo(fit_1)

Computed from 4000 by 3020 log-likelihood matrix

         Estimate   SE
elpd_loo  -1968.4 15.6
p_loo         3.2  0.1
looic      3936.9 31.2
------
Monte Carlo SE of elpd_loo is 0.0.

All Pareto k estimates are good (k < 0.5).
See help('pareto-k-diagnostic') for details.

The bigger issue is that this seems to change how I can use loo with other custom models or packages like ā€˜atsarā€™, etc. even if those models return the log_lik values.

My whole R script from the tutorial looks like:

library("rstan")

# Prepare data 
url <- "http://stat.columbia.edu/~gelman/arm/examples/arsenic/wells.dat"
wells <- read.table(url)
wells$dist100 <- with(wells, dist / 100)
X <- model.matrix(~ dist100 + arsenic, wells)
standata <- list(y = wells$switch, X = X, N = nrow(X), P = ncol(X))

# Fit model
fit_1 <- stan("Stan/logistic.stan", data = standata)
print(fit_1, pars = "beta")

library("loo")

# Extract pointwise log-likelihood
# using merge_chains=FALSE returns an array, which is easier to 
# use with relative_eff()
log_lik_1 <- extract_log_lik(fit_1, merge_chains = FALSE)

# as of loo v2.0.0 we can optionally provide relative effective sample sizes
# when calling loo, which allows for better estimates of the PSIS effective
# sample sizes and Monte Carlo error
r_eff <- relative_eff(exp(log_lik_1), cores = 2) 

# preferably use more than 2 cores (as many cores as possible)
# will use value of 'mc.cores' option if cores is not specified
loo_1 <- loo(log_lik_1, r_eff = r_eff, cores = 2)
print(loo_1)

The Stan model is the same as the tutorial:

data {
  int<lower=0> N;             // number of data points
  int<lower=0> P;             // number of predictors (including intercept)
  matrix[N,P] X;              // predictors (including 1s for intercept)
  int<lower=0,upper=1> y[N];  // binary outcome
}
parameters {
  vector[P] beta;
}
model {
  beta ~ normal(0, 1);
  y ~ bernoulli_logit(X * beta);
}
generated quantities {
  vector[N] log_lik;
  for (n in 1:N) {
    log_lik[n] = bernoulli_logit_lpmf(y[n] | X[n] * beta);
  }
}

My R session info

R version 3.6.2 (2019-12-12)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] LC_COLLATE=English_Canada.1252  LC_CTYPE=English_Canada.1252    LC_MONETARY=English_Canada.1252
[4] LC_NUMERIC=C                    LC_TIME=English_Canada.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] loo_2.2.0          rstan_2.19.2       ggplot2_3.2.1      StanHeaders_2.19.0

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.3         pillar_1.4.3       compiler_3.6.2     prettyunits_1.0.2  tools_3.6.2       
 [6] packrat_0.5.0      pkgbuild_1.0.6     checkmate_1.9.4    lifecycle_0.1.0    tibble_2.1.3      
[11] gtable_0.3.0       pkgconfig_2.0.3    rlang_0.4.2        cli_2.0.0          rstudioapi_0.10   
[16] parallel_3.6.2     gridExtra_2.3      withr_2.1.2        dplyr_0.8.3        stats4_3.6.2      
[21] grid_3.6.2         tidyselect_0.2.5   glue_1.3.1         inline_0.3.15      R6_2.4.1          
[26] processx_3.4.1     fansi_0.4.0        purrr_0.3.3        callr_3.4.0        magrittr_1.5      
[31] backports_1.1.5    scales_1.1.0       ps_1.3.0           codetools_0.2-16   matrixStats_0.55.0
[36] assertthat_0.2.1   colorspace_1.4-1   lazyeval_0.2.2     munsell_0.5.0      crayon_1.3.4 

Thanks for reporting this. This might be related to the refactoring in loo 2.2.0, but since the code you give works for me with packages

[1] loo_2.2.0          rstan_2.19.2       ggplot2_3.2.1      StanHeaders_2.19.0

I recommend first to try re-installing rstan and check the rstan version. My exact rstan version which is shown when running library(rstan) is rstan (Version 2.19.2, GitRev: 2e1f913d3ca3)

Thanks @avehtari. I re-installed rstan and have the same version as you do: rstan (Version 2.19.2, GitRev: 2e1f913d3ca3) . Re-running the LOO tutorial script still nets the same error on this line after extracting the log_lik values:

r_eff <- relative_eff(exp(log_lik_1), cores = 2) 
Error in checkForRemoteErrors(val) : 
  2 nodes produced errors; first error: could not find function "ess_rfun"

Interestingly, this line is working with cores=1. My mc.cores option is set to 8 following:

options(mc.cores = parallel::detectCores())

I wonder if LOO isnā€™t automatically detected on each of the cores?

This works for me with both cores=2 and cores=1.

Pinging @jonah and @mans_magnusson if they have any ideas how to continue from here.

Thanks @klwilson23 !

I tried the code as well and I couldnā€™t reproduce the error (on a mac). My hypothesis is that this is related to parallelism in Windows (since that is specific to Windows and only happens when cores > 1).

I cannot test Windows parallelism from my computer but I think we now have appveyor up in the loo repo, so we could maybe be able to test it there (or if @paul.buerkner have time for a quick test).

I guess the best thing would be to file it as an issue in the loo repo?

/MĆ„ns

1 Like

I checked a little more.

My hypothesis is that the problem arise at row 208-215:

They seem to pass the testsuite. I have a vague memory of @jonah having some problem with Windows when appveyor was included, but I do not remember the details.

/MĆ„ns

1 Like

Thanks @mans_magnusson. I was able to identify the problem a bit more. In short, I think it is more of a broken interaction for LOO calling parellel cores within an RStudio project directory and package loading.

To summarize, my errors only occurred when I was working from an RStudio project directory (for GitHub version control) and have a custom folder for my home package library. I think when running LOO from within the project directory there is a broken file path for Windows parallel cores unable to find the package location for LOO and its dependencies (despite that library path being specified in a default .Rprofile file).

Note that when I am not working within the project folder, the tutorial code works successfully with mc.cores=2. So its seems like a RStudio Project x Parallel interaction.

Iā€™m not sure how best to allow the cores called from the project folder to inherit the correct file paths yet, but I think this is likely beyond LOO specific problems though (unless thereā€™s a default pathway LOO loads dependencies?).

One way Iā€™ve proceeded in the path with the doSNOW package is something like this:

worker.init <- function() {
  .libPaths("FILE PATH")
  library(PACKAGE NAME)
  NULL
}

cl<-makeCluster(parallel::detectCores())
registerDoSNOW(cl)
clusterCall(cl,worker.init)

which then loads packages on each of the cores. But I think LOO calls some internal functions for initiating cores and am not sure how to allow those cores to find the correct file path.

1 Like

Thats great! Indeed, then it seem to be out of scope for the loo package. I guess the developers of the parallel package would be the right persons to file an issue to? Or the R-Studio people?