Moment_match silently ignored when using multiple cores in loo_moment_match

Hello! This is my first post in the Stan forums, so I’m not sure if I’ve chosen the most appropriate tags to categorize the issue—apologies if not!

I’d like to share an issue I’ve encountered with the moment_match method in the loo package. If I set cores = 1, the method works as expected and adjusts problematic Pareto k values. But if I run the same call with multiple cores, moment_match is silently skipped and the result is identical to a regular loo() call—giving the false impression that moment matching was applied but failed to improve the fit.

Below is a reproducible example using the same code from the vignette Avoiding model refits in leave-one-out cross-validation with moment matching. I’m using cmdstanr here, but the same behavior happens with rstan. The Stan model used is identical to the one in the vignette, except that I renamed the offset variable to shift, as offset is a reserved name in Stan. The full model code is attached at the end of this post.

# Load packages
library(cmdstanr)
library(loo)

# Set seed for reproducible results
seed <- 9547
set.seed(seed)

# Prepare data
data(roaches, package = "rstanarm")
roaches$roach1 <- sqrt(roaches$roach1)
y <- roaches$y
x <- roaches[,c("roach1", "treatment", "senior")]
shift <- log(roaches[,"exposure2"])
n <- dim(x)[1]
k <- dim(x)[2]

# Stan data list
standata <- list(
  N = n, 
  K = k, 
  x = as.matrix(x), 
  y = y, 
  shift = shift, 
  beta_prior_scale = 2.5, 
  alpha_prior_scale = 5.0)

# Compile stan model with cmdstanr
stanmodel <- cmdstan_model(stan_file = "MM_test.stan", force_recompile = TRUE)

# Fit model
fit <- stanmodel$sample(
  data = standata, 
  seed = seed, 
  refresh = 500
)

As you can see in the following R chunk, loo() produces some high Pareto k diagnostics, and moment_match corrects this when run in single-threaded mode. But if I run the same call with multiple cores, it silently returns the uncorrected loo() result—giving the false impression that moment_match fails to improve things.

> # Raw loo
> (loo1 <- fit$loo())

Computed from 4000 by 262 log-likelihood matrix.

         Estimate     SE
elpd_loo  -5455.2  692.6
p_loo       250.5   53.0
looic     10910.5 1385.3
------
MCSE of elpd_loo is NA.
MCSE and ESS estimates assume MCMC draws (r_eff in [0.5, 1.0]).

Pareto k diagnostic values:
                         Count Pct.    Min. ESS
(-Inf, 0.7]   (good)     245   93.5%   58      
   (0.7, 1]   (bad)       10    3.8%   <NA>    
   (1, Inf)   (very bad)   7    2.7%   <NA>    
See help('pareto-k-diagnostic') for details.
Aviso:
Some Pareto k diagnostic values are too high. See help('pareto-k-diagnostic') for details.
 
> 
> # Moment matching
> fit$loo(moment_match = TRUE)

Computed from 4000 by 262 log-likelihood matrix.

         Estimate     SE
elpd_loo  -5477.4  699.8
p_loo       272.6   61.7
looic     10954.8 1399.6
------
MCSE of elpd_loo is 0.4.
MCSE and ESS estimates assume MCMC draws (r_eff in [0.5, 1.0]).

All Pareto k estimates are good (k < 0.7).
See help('pareto-k-diagnostic') for details.
> 
> # Parallelized moment matching
> fit$loo(moment_match = TRUE, cores = 6)

Computed from 4000 by 262 log-likelihood matrix.

         Estimate     SE
elpd_loo  -5455.2  692.6
p_loo       250.5   53.0
looic     10910.5 1385.3
------
MCSE of elpd_loo is NA.
MCSE and ESS estimates assume MCMC draws (r_eff in [0.5, 1.0]).

Pareto k diagnostic values:
                         Count Pct.    Min. ESS
(-Inf, 0.7]   (good)     245   93.5%   58      
   (0.7, 1]   (bad)       10    3.8%   <NA>    
   (1, Inf)   (very bad)   7    2.7%   <NA>    
See help('pareto-k-diagnostic') for details.
Aviso:
Some Pareto k diagnostic values are too high. See help('pareto-k-diagnostic') for details.

In addition to this, when using moment_match in parallel, I’m unable to pass external R objects (e.g., a data frame). As noted in the loo vignette on large datasets, sometimes it’s more efficient to compute the log-likelihood outside of Stan. If we choose this approach, we also need to supply custom likelihood functions for moment_match. Continuing with the same example, the following code shows how to do this using cmdstanr instead of rstan:

# Log‐likelihood for observation i
ll_fun_i <- function(x, i, data = standata, ...) {
  
  # Posterior draws
  draws <- x$draws(variables = c("beta", "intercept"), format = "draws_array")
  
  # Parameter arrays 
  betas_array     <- draws[,,grep("^beta", dimnames(draws)[[3]])]
  intercept_array <- draws[,,"intercept"]
  
  # Empty linear prediction array
  mu_list <- vector(mode = "list", length = dim(draws)[2])
  for(j in 1:dim(draws)[2]){
    
    # Prepare Beta and X matrices
    Beta <- betas_array[,j,,drop = TRUE]
    X    <- matrix(data$x[i, ], ncol = 1)
    
    # Compute linear prediction
    mu_list[[j]] <- Beta %*% X + c(intercept_array[,j,]) + data$shift[i]
  }
  
  # Generate a S x P matrix
  mu_mat <- do.call(cbind, mu_list)
  
  # Compute log-likelihood
  dpois(x = data$y[i], lambda = exp(mu_mat), log = TRUE)
}

# Log-Likelihoo based on unconstrain parameters
log_lik_i_upars <- function(x, upars, i, data = standata, ...) {
  
  # Constrain pars
  pars <- apply(upars, 1, x$constrain_variables)
  
  # Beta matrix and intercept vector
  Beta <- do.call(rbind, lapply(pars, function(x) x$beta))
  intercept <- sapply(pars, function(x) x$intercept)
  
  # Compute linear prediction
  mu_vec <- Beta %*% matrix(data$x[i, ], ncol = 1) + intercept + data$shift[i]
  
  # Compute log-likelihood
  c(dpois(x = data$y[i], lambda = exp(mu_vec), log = TRUE))
}

# Hand-made moment matching
loo_mm <- loo::loo_moment_match.default(
  x                 = fit,
  loo               = loo1,
  post_draws        = function(x, ...) { x$draws(format = "draws_matrix") },
  log_lik_i         = ll_fun_i,
  unconstrain_pars  = function(x, pars, ...) { x$unconstrain_draws(format = "draws_matrix") },
  log_prob_upars    = function(x, upars, ...) { apply(upars, 1, x$log_prob) },
  log_lik_i_upars   = log_lik_i_upars, cores = 4, data = standata)

With cores = 1, everything works: moment_match runs correctly, and my external data can be accessed. But increasing the number of cores breaks both behaviors: moment_match is not applied, and I get an error saying that my data object is not found. I’ve tried various workarounds (passing data explicitly, omitting it and relying on ..., etc.), but it seems the problem isn’t just how the arguments are passed.

> # Cores = 1 and without specifying data
> loo_mm <- loo::loo_moment_match.default(
+   x                 = fit,
+   loo               = loo1,
+   post_draws        = function(x, ...) { x$draws(format = "draws_matrix") },
+   log_lik_i         = ll_fun_i,
+   unconstrain_pars  = function(x, pars, ...) { x$unconstrain_draws(format = "draws_matrix") },
+   log_prob_upars    = function(x, upars, ...) { apply(upars, 1, x$log_prob) },
+   log_lik_i_upars   = log_lik_i_upars, cores = 1)
> loo_mm

Computed from 4000 by 262 log-likelihood matrix.

         Estimate     SE
elpd_loo  -5477.4  699.8
p_loo       272.6   61.7
looic     10954.8 1399.6
------
MCSE of elpd_loo is 0.4.
MCSE and ESS estimates assume MCMC draws (r_eff in [0.5, 1.0]).

All Pareto k estimates are good (k < 0.7).
See help('pareto-k-diagnostic') for details.
> # Cores > 1 and specifying data
> loo_mm_2 <- loo::loo_moment_match.default(
+   x                 = fit,
+   loo               = loo1,
+   post_draws        = function(x, ...) { x$draws(format = "draws_matrix") },
+   log_lik_i         = ll_fun_i,
+   unconstrain_pars  = function(x, pars, ...) { x$unconstrain_draws(format = "draws_matrix") },
+   log_prob_upars    = function(x, upars, ...) { apply(upars, 1, x$log_prob) },
+   log_lik_i_upars   = log_lik_i_upars, cores = 4, data = standata)
Error en checkForRemoteErrors(val): 
  4 nodes produced errors; first error: objeto 'standata' no encontrado

Finally, I’m including my full R session info below (I’m running this on Windows 11) in case it helps to diagnose the issue!

> sessionInfo()
R version 4.4.2 (2024-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 26100)

Matrix products: default


locale:
[1] LC_COLLATE=Spanish_Spain.utf8  LC_CTYPE=Spanish_Spain.utf8    LC_MONETARY=Spanish_Spain.utf8 LC_NUMERIC=C                  
[5] LC_TIME=Spanish_Spain.utf8    

time zone: Europe/Madrid
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] loo_2.8.0           cmdstanr_0.8.1.9000

loaded via a namespace (and not attached):
 [1] vctrs_0.6.5          cli_3.6.3            knitr_1.50           rlang_1.1.4          xfun_0.52            processx_3.8.6      
 [7] generics_0.1.3       tensorA_0.36.2.1     data.table_1.17.0    jsonlite_2.0.0       glue_1.8.0           backports_1.5.0     
[13] distributional_0.5.0 ps_1.9.1             evaluate_1.0.3       tibble_3.2.1         abind_1.4-8          lifecycle_1.0.4     
[19] compiler_4.4.2       posterior_1.6.1      Rcpp_1.0.14          pkgconfig_2.0.3      rstudioapi_0.17.1    R6_2.6.1            
[25] utf8_1.2.4           pillar_1.10.2        parallel_4.4.2       magrittr_2.0.3       checkmate_2.3.2      tools_4.4.2         
[31] withr_3.0.2          matrixStats_1.5.0   

And this is the Stan model code

data {
  int<lower=1> K;
  int<lower=1> N;
  matrix[N,K] x;
  array[N] int y;
  vector[N] shift;

  real beta_prior_scale;
  real alpha_prior_scale;
}
parameters {
  vector[K] beta;
  real intercept;
}
model {
  y ~ poisson(exp(x * beta + intercept + shift));
  beta ~ normal(0,beta_prior_scale);
  intercept ~ normal(0,alpha_prior_scale);
}
generated quantities {
  vector[N] log_lik;
  for (n in 1:N)
    log_lik[n] = poisson_lpmf(y[n] | exp(x[n] * beta + intercept + shift[n]));
}
2 Likes

@avehtari @jonah

1 Like

Thanks for the detailed report. We’ll investigate this

1 Like

Thanks! Let me know if you find anything. I’m currently implementing moment_match for my models, and being able to parallelize it would be a big advantage :)

Your examples work perfectly for me. One difference is that I used the latest CmdStanR v0.9.0, while you used v0.8.1.9000. Can you try with the latest version (e.g. from R-Universe)?

install.packages("cmdstanr", repos = c('https://stan-dev.r-universe.dev', getOption("repos")))

Hi Aki,

I’ve updated cmdstanr and the error persists, but I was also getting this error with rstan. I thought this was an issue with the loo package.

For simplicity, in case someone else wants to check if the same thing happens, I’m including below a code chunk using rstan that anyone can copy and paste into the R console. loo1 and loo_mm2 objects give me the same output.

Can you think of anything else I could try to identify the problem? I thought this would be easier 😅

library("rstan")
library("loo")
seed <- 9547
set.seed(seed)

stancode <- "
data {
  int<lower=1> K;
  int<lower=1> N;
  matrix[N,K] x;
  array[N] int y;
  vector[N] offset;

  real beta_prior_scale;
  real alpha_prior_scale;
}
parameters {
  vector[K] beta;
  real intercept;
}
model {
  y ~ poisson(exp(x * beta + intercept + offset));
  beta ~ normal(0,beta_prior_scale);
  intercept ~ normal(0,alpha_prior_scale);
}
generated quantities {
  vector[N] log_lik;
  for (n in 1:N)
    log_lik[n] = poisson_lpmf(y[n] | exp(x[n] * beta + intercept + offset[n]));
}
"

# Prepare data
data(roaches, package = "rstanarm")
roaches$roach1 <- sqrt(roaches$roach1)
y <- roaches$y
x <- roaches[,c("roach1", "treatment", "senior")]
offset <- log(roaches[,"exposure2"])
n <- dim(x)[1]
k <- dim(x)[2]

standata <- list(N = n, K = k, x = as.matrix(x), y = y, offset = offset, beta_prior_scale = 2.5, alpha_prior_scale = 5.0)

# Compile
stanmodel <- stan_model(model_code = stancode)

# Fit model
fit <- sampling(stanmodel, data = standata, seed = seed, refresh = 0)

(loo1 <- loo(fit))
(loo_mm1 <- loo(fit, moment_match = TRUE))
(loo_mm2 <- loo(fit, moment_match = TRUE, cores = 6))

A probable reason for silently failing (instead of stopping on error) is the use of try() in function loo_moment_match_i() defined in file loo_moment_matching.R. That was added as Stan sometimes causing an exception. Now it is likely that a problem in your parallel setup is causing an error, but it gets hidden by try(). If you are brave, you could clone the repo, remove try(), load the modified package with devtools::load_all(...), and you would then see the errors. If you prefer, I can make a github branch with try() removed, and you can then install using remotes::install_github(...) Unfortunately, this can be tested only by someone who does get the parallelization errors.

Hi Aki,

You were right. I don’t know if it was out of bravery or stubbornness, but I forked the loo GitHub repository, removed all the try() statements in loo_moment_match_i(), and the problem showed up!

> (loo_mm2 <- loo(fit, moment_match = TRUE, cores = 6))
Error en checkForRemoteErrors(val): 
  6 nodes produced errors; first error: the model object is not created or not valid
11. stop(count, " nodes produced errors; first error: ", firstmsg,
         domain = NA)
10. checkForRemoteErrors(val)
9. staticClusterApply(cl, fun, length(x), argfun)
8. clusterApply(cl = cl, x = splitList(X, nchunks), fun = lapply,
                FUN = fun, ...)
7. do.call(c, clusterApply(cl = cl, x = splitList(X, nchunks), fun = lapply, 
           FUN = fun, ...), quote = TRUE)
6. parallel::parLapply(cl = cl, X = I, fun = function(i) loo_moment_match_i_fun(i)) at loo_moment_matching.R#136
5. loo::loo_moment_match.default(x = x, loo = loo, post_draws = post_draws_stanfit,
   log_lik_i = log_lik_i_stanfit, unconstrain_pars = unconstrain_pars_stanfit,
   log_prob_upars = log_prob_upars_stanfit, log_lik_i_upars = log_lik_i_upars_stanfit,
   ...)
4. loo_moment_match.stanfit(x, loo = loo, chain_id = chain_id, 
                            k_threshold = k_threshold, cores = cores, 
                            parameter_name = pars, ...)
3. .local(x, ...)
2. loo(fit, moment_match = TRUE, cores = 6)
1. loo(fit, moment_match = TRUE, cores = 6)

If I understand correctly, the error emerges from every PSOCK worker when loo_moment_match_i() calls log_prob_upars() or log_lik_i_upars(). The fitted model isn’t being properly exported to the workers, but I don’t know why…

I’ll update R from version 4.4.2. to 4.4.3 later to see if the issue magically disappears. Do you have any other ideas about what might be causing this?

Thanks for testing this. I should investigate how we could ignore Stan caused interrupts, but still show other errors like this parallelization error.

Does parallel::parLapply() work for you with the simple examples in ?parallel::parLapply?

Yes, all the examples work perfectly on my computer and give me the same results as those in the examples section of the parallel package.

Update: after upgrading R from 4.4.2 to 4.5, along with the other necessary changes such as Rtools 45, the issue still persists. The LOO computation is parallelized without any problem on my computer, but not the moment_match (although this was already happening with my previous version of R).

If anyone has any ideas on how to solve this strange bug, I’m happy to run any necessary tests!