Loo moment match/reloo crashes on brms fits after combine_models, part 2

I recently posted about a problem that occurred when running chains separately, combining them with combine_models, and then running loo with moment_match. After the fix from that issue, I'm now running into a different problem when the chains are run on a separate computer, even with the recompile = TRUE option.

Based on the error message, it seems loo is grabbing the file path of the original model fit. That path points to a location on a different computer, so the crash happens when it cannot be opened. The example code below uses cmdstan as the backend, but the issue occurs whether or not the cmdstan backend is used. Since I understand it is a pain in the neck to find another computer to run the files, I've temporarily provided fits here. The folder contains four model fits, each representing one chain.

library(brms)

# Each model is fit in parallel, with which_chain
# being incremented (1 to 4) for each chain fit;
# savdir is the output directory on the machine running the fits
brm_fit <- brm(
  count ~ zAge + zBase * Trt + (1 | patient),
  data = epilepsy,
  family = poisson(),
  chains = 1,
  cores = 1,
  seed = 022624 + which_chain,
  save_pars = save_pars(all = TRUE),
  backend = "cmdstanr",
  threads = threading(4),
  # one file per chain, e.g. brm_fit_c_1.rds ... brm_fit_c_4.rds
  file = file.path(savdir, paste0("brm_fit_c_", which_chain))
)

The models are then combined, and loo with moment_match is run:

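library(brms)
library(loo)

# Read the four single-chain fits copied over from the cluster
brm_c1 <- readRDS("/brm_fit_c_1.rds")
brm_c2 <- readRDS("/brm_fit_c_2.rds")
brm_c3 <- readRDS("/brm_fit_c_3.rds")
brm_c4 <- readRDS("/brm_fit_c_4.rds")

# Merge the chains into a single brmsfit object
brm_fit <- combine_models(
  brm_c1,
  brm_c2,
  brm_c3,
  brm_c4
)

# Compute loo with moment matching; recompile = TRUE rebuilds the
# Stan model on this machine
brm_fit <- add_criterion(
  brm_fit,
  "loo",
  moment_match = TRUE,
  recompile = TRUE,
  cores = 24
)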

Here is the error message:

Recompiling the model with 'rstan'
Recompilation done
Automatically saving the model object in '/work/c/clayson/mmre_crash/brm_fit_c_1.rds'
Error in gzfile(file, mode) : cannot open the connection
In addition: Warning messages:
1: Some Pareto k diagnostic values are too high. See help('pareto-k-diagnostic') for details.
 
2: Found 1 observations with a pareto_k > 0.7 in model 'brm_fit'. It is recommended to set 'reloo = TRUE' in order to calculate the ELPD without the assumption that these observations are negligible. This will refit the model 1 times to compute the ELPDs for the problematic observations directly. 
3: In gzfile(file, mode) :
  cannot open compressed file '/work/c/clayson/mmre_crash/brm_fit_c_1.rds', probable reason 'No such file or directory'

The path /work/c/clayson/mmre_crash/ points to my user directory on a cluster, not to a location on my local machine. I think this is where the issue lies, but I could be wrong. Is there an argument I'm missing that needs changing?

Thanks for any help!
Peter

  • Operating System: OS 14.3.1
  • brms Version: 2.20.4

Ping @paul.buerkner. This seems to be specifically a brms model-file handling issue, but it can probably be fixed quickly.

Hmm. brms does not control (I think?) where the model files are searched for. It simply stores the meta attributes from the first (cmdstanr) model. I'm not sure what I could do better on the brms side. Ideas?
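A minimal sketch of that behaviour, assuming the brmsfit object records its save location in an element named `file` (as the "Automatically saving the model object in ..." message in the error suggests). The temporary directories `chain_a` and `chain_b` are hypothetical stand-ins for separate per-chain output locations. The sketch fits two one-chain models saved to different places, combines them, and prints which path the combined object carries.

```r
library(brms)

# Hypothetical per-chain output directories (stand-ins for the cluster paths)
dir_a <- file.path(tempdir(), "chain_a")
dir_b <- file.path(tempdir(), "chain_b")
dir.create(dir_a, showWarnings = FALSE)
dir.create(dir_b, showWarnings = FALSE)

# Same model as in the question, fitted once per chain with different seeds
fit_one_chain <- function(seed, dir) {
  brm(
    count ~ zAge + zBase * Trt + (1 | patient),
    data = epilepsy, family = poisson(),
    chains = 1, seed = seed,
    save_pars = save_pars(all = TRUE),
    file = file.path(dir, "brm_fit")
  )
}
fit_a <- fit_one_chain(1, dir_a)
fit_b <- fit_one_chain(2, dir_b)

combined <- combine_models(fit_a, fit_b)

# The combined object is built on the first model, so it keeps only the
# first model's save location; later auto-saves would target dir_a
print(combined$file)
print(fit_a$file)
```

If the first fit was saved on a cluster, `combined$file` is the cluster path, which matches the brm_fit_c_1.rds path in the error above.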

I re-verified that I get the issue without using cmdstan as well.

That pointed me in the right direction. It was just me being a fool :)

Because add_criterion was called with file = NULL (the default), it reused the original file path stored in the model object to save the model with the loo criterion. I didn't realize the model would be saved even without specifying file. If I specify the file argument, loo works fine.
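A minimal, self-contained sketch of that fix. It uses a temporary directory `cluster_run` as a hypothetical stand-in for the cluster and deletes it after fitting, so the path stored in the fit no longer exists, as on the local machine. It then passes an explicit, writable `file` to add_criterion. The commented-out alternative of clearing the stored path is an assumption based on the auto-save behaviour described above, not a documented recipe.

```r
library(brms)

# Hypothetical stand-in for the cluster: a temporary directory that is deleted
# after fitting, so the path recorded in the fit no longer exists
cluster_dir <- file.path(tempdir(), "cluster_run")
dir.create(cluster_dir, showWarnings = FALSE)

fit <- brm(
  count ~ zAge + zBase * Trt + (1 | patient),
  data = epilepsy, family = poisson(),
  chains = 1, seed = 1,
  save_pars = save_pars(all = TRUE),
  file = file.path(cluster_dir, "brm_fit_c_1")
)
print(fit$file)  # path recorded when the fit was saved

unlink(cluster_dir, recursive = TRUE)  # the original location is now gone

# Calling add_criterion(fit, "loo") here would try to auto-save to the missing
# path. Passing an explicit file avoids that; ".rds" is appended automatically.
fit <- add_criterion(
  fit, "loo",
  file = file.path(tempdir(), "brm_fit_local")
)

# Alternative (assumption: removing the stored path disables the auto-save):
# fit$file <- NULL
# fit <- add_criterion(fit, "loo")
```

The moment_match = TRUE and recompile = TRUE arguments from the original call can be passed alongside file in the same add_criterion call.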

Sorry for the confusion!

Peter
