Using multiprocessing for kfold with custom families

  • Operating System: macOS 10.14.5
  • brms Version: 2.9.0

When I run kfold on a model with a custom family under plan(multiprocess), it can’t seem to find log_lik functions that are defined in the global environment. I could only get it to work by writing the log_lik as a single function and passing it via the log_lik argument when creating the custom family. When I ran it normally (with plan(sequential)), this wasn’t a problem, so it seems to be an issue with how objects from the environment are passed to the worker processes in future.

Here is a reproducible example using only code from the vignettes (just to make sure it wasn’t an issue with my code). I made one small edit so that size is passed in as part of the regression formula rather than as a stanvar. (Without this edit, I ran into the same issue discussed here: No samples when using reloo on custom_family brmsfit. Is there now a more general solution than the one proposed there, by any chance?)

library(brms)
data("cbpp", package = "lme4")

# Pointwise log-likelihood for observation i
log_lik_beta_binomial2 <- function(i, draws) {
  mu <- draws$dpars$mu[, i]
  phi <- draws$dpars$phi
  N <- draws$data$trials[i]
  y <- draws$data$Y[i]
  beta_binomial2_lpmf(y, mu, phi, N)
}

beta_binomial2 <- custom_family(
  "beta_binomial2", dpars = c("mu", "phi"),
  links = c("logit", "log"), lb = c(NA, 0),
  type = "int", vars = "trials[n]",
  log_lik = log_lik_beta_binomial2
)

stan_funs <- "
  real beta_binomial2_lpmf(int y, real mu, real phi, int T) {
    return beta_binomial_lpmf(y | T, mu * phi, (1 - mu) * phi);
  }
  int beta_binomial2_rng(real mu, real phi, int T) {
    return beta_binomial_rng(T, mu * phi, (1 - mu) * phi);
  }
"

stanvars <- stanvar(scode = stan_funs, block = "functions")

fit2 <- brm(
  incidence | trials(size) ~ period + (1|herd), data = cbpp,
  iter = 200,
  family = beta_binomial2, stanvars = stanvars
)
# Make the Stan functions (e.g. beta_binomial2_lpmf) callable from R
expose_functions(fit2, vectorize = TRUE)


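library(future)
plan(multiprocess)  # the problem only shows up with a parallel plan, not plan(sequential)
kfold(fit2, chains = 1)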


I don’t see a fix for the multiprocess issue right now, as multiprocess seems to use separate environments that do not have the global environment of the main R process as a parent environment.
But it seems you already found a solution via the `log_lik` argument.
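
To illustrate the environment point, here is a minimal base R sketch. It is only an analogy, not how future or brms work internally: `worker_env` and `log_lik_demo` are hypothetical names, and the isolated environment merely stands in for a worker that cannot see the caller's globals. A lookup by name fails there, while a function object passed explicitly (as with the `log_lik` argument) is still usable.

```r
# A function defined in the calling environment
# (hypothetical stand-in for log_lik_beta_binomial2).
log_lik_demo <- function(i) i * 2

# Hypothetical stand-in for a worker: an environment whose parent chain
# does not include the environment in which log_lik_demo was defined.
worker_env <- new.env(parent = baseenv())

# Looking the function up by name from inside the "worker" fails.
found_by_name <- exists("log_lik_demo", envir = worker_env, inherits = TRUE)

# Passing the function object itself works, because the object
# travels with the call instead of being searched for by name.
run_in_worker <- function(f, i) f(i)
environment(run_in_worker) <- worker_env
result <- run_in_worker(log_lik_demo, 21)

print(found_by_name)
print(result)
```

This is presumably why storing the function in the family object via `log_lik` sidesteps the problem: the function is carried along with the fitted model rather than being looked up in the global environment.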

Passing stuff via stanvar for newdata is a little bit tedious but you may use the new_objects argument for this purpose.

Thanks for the quick response! Re: new_objects, I’m not sure I totally understand where that would go in the call to kfold. Would I also have to edit the kfold function somehow like @bmfazio did in their solution?

You are right, new_objects may not be helpful for kfold as it is not appropriately subsetted inside kfold. Generally, subsetting new_objects automatically is more or less impossible as brms does not know what is passed there. Basically, for use in kfold, I would recommend passing all data via data and not using stanvars if possible.
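
The subsetting problem can be sketched in plain R, without brms. This is only an illustration under assumed names (`dat`, `extra`, and `fold_id` are hypothetical): when rows are dropped for a fold, columns of the data frame are subsetted along with them, whereas a vector supplied separately keeps its full length and no longer lines up with the training data.

```r
set.seed(1)
n <- 10

# Per-observation information stored as a column of the data frame.
dat <- data.frame(y = rbinom(n, 5, 0.3), size = rep(5, n))

# The same information kept outside the data frame (hypothetical object,
# analogous to something passed via stanvars or new_objects).
extra <- rep(5, n)

# Assign observations to 5 folds and hold out fold 1.
fold_id <- rep(1:5, length.out = n)
train <- dat[fold_id != 1, ]

n_train <- nrow(train)          # rows left after dropping fold 1
n_size  <- length(train$size)   # the column is subsetted with the rows
n_extra <- length(extra)        # the external vector is not

print(c(n_train = n_train, n_size = n_size, n_extra = n_extra))
```

Here the training data has 8 rows and `train$size` has 8 entries, but `extra` still has all 10, which is the mismatch that makes externally passed objects awkward for kfold.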

Yeah, I’ve been trying to figure out how to do exactly that. What I basically need is something that reads in another variable (or set of variables) of length n that gets passed to the lpmf of the custom family. Is there any functionality for including custom additional response information (like trials(), weights, etc.)?

Not that I am aware of. What we could do is implement an addition argument that takes in vectors of values without checking them, which could then be used inside a custom family. For instance

y | real(z) ~ x

where z is a (real) addition variable to be used in the custom family. Would something like that solve your problem?

Yes, that would be perfect!

