No samples when using reloo on custom_family brmsfit

bmfazio · April 12, 2019, 3:11am

Operating System: Windows 10
brms Version: 2.7.0

I’m using brms to fit an endpoint-inflated binomial distribution. I’ve included the log_lik, predict and fitted functions to take advantage of the post-processing methods.

All of them seem to be working just fine, but if I get high Pareto k observations in loo, using reloo = TRUE results in the following error:

Fitting model 1 out of 1 (leaving out observation 497)
Start sampling
[1] "Error in sampler$call_sampler(args_list[[i]]) : Initialization failed."
[1] "error occurred during calling the sampler; sampling not done"

 Error: The model does not contain posterior samples.

Below is the code I’m using to define the model:

fit_eiBinomialSM <- function(formula_, df){
  eiBinomialSM <- custom_family(
    "eiBinomialSM", dpars = c("mu", "so", "sm"), # unrestricted mixture scores
    links = c("logit", "identity", "identity"),
    type = "int", vars = "trials[n]",
    log_lik = function(i, draws) {
      mu <- draws$dpars$mu[, i]
      so <- draws$dpars$so
      sm <- draws$dpars$sm
      trials <- draws$data$trials[i]
      y <- draws$data$Y[i]
      eiBinomialSM_lpmf(y, mu, so, sm, trials)
    },
    predict = function(i, draws, ...) {
      mu <- draws$dpars$mu[, i]
      so <- draws$dpars$so
      sm <- draws$dpars$sm
      trials <- draws$data$trials[i]
      eiBinomialSM_rng(mu, so, sm, trials)
    },
    fitted = function(draws) {
      mu <- draws$dpars$mu
      so <- draws$dpars$so
      sm <- draws$dpars$sm
      p <- t(apply(cbind(matrix(c(so, sm), ncol = 2), 0), 1, function(x)(exp(x)/sum(exp(x)))))
      p[,2] + p[,3]*(1/(1+exp(-mu)))
    }
  )
  
  stan_funs <- "
  vector linkfunc(real so, real sm){
  return softmax([so, sm, 0]');
  }
  
  int[] ei(int y, int ntrial) {
  return {(1 - min({1, y})), (1 - min({1, ntrial - y}))};
  }
  
  real eiBinomialSM_lpmf(int y, real mu, real so, real sm, int trials) {
  int yom[2] = ei(y, trials);
  vector[3] pom = linkfunc(so, sm);
  
  return log(yom[1]*pom[1] + yom[2]*pom[2] + pom[3]*exp(binomial_logit_lpmf(y | trials, mu)));
  }
  
  int eiBinomialSM_rng(real mu, real so, real sm, int trials) {
  int which_component = categorical_rng(linkfunc(so, sm));
  
  if (which_component == 1) {
  return 0;
  }
  if (which_component == 2) {
  return trials;
  }
  
  return binomial_rng(trials, inv_logit(mu));
  }
  "
  
  stanvars <- stanvar(scode = stan_funs, block = "functions") +
    stanvar(as.integer(wf$bindenom), name = "trials")
  
  brm(formula = formula_,
      data = df,
      family = eiBinomialSM,
      stanvars = stanvars)
  }

bgoodri · April 12, 2019, 3:17am

You may have to specify init_r to some number less than 2 when you first call brm in order to get it to use small enough values when it starts leaving observations out. That and check whether those observations are coded correctly in the data because they seem to be having a large influence on the posterior distribution.

bmfazio · April 12, 2019, 4:19am

I just tried init = "0" and init_r = 0.1 but the exact same error pops up.

I checked the single observation getting dropped and there doesn’t seem to be anything special about it. I was able to refit the model by removing the high-k observations altogether and the posteriors don’t show any major change either.

Are there other steps I could take to help troubleshoot this issue? I can share the data (it’s a relatively small dataset) and anything else needed to reproduce it.

bmfazio · April 12, 2019, 9:57pm

Okay, I figured out the issue.

Following this tutorial, the number of trials for the full dataset is passed via stanvar:

stanvars <- stanvar(scode = stan_funs, block = "functions") +
  stanvar(as.integer(cbpp$size), name = "trials")

The issue seems to occur because loo tries to exclude the high-k observations but is still using the original trials data, which results in a mismatch between counts and trials, where some of the former are larger than the latter.

So I suppose the question now is how to allow loo and stanvar to work together to pass the appropriate subsets of data into the model.

bmfazio · April 25, 2019, 8:26pm

For anyone who finds this thread in the future, here is my fix:

tmp.fun <- brms:::reloo.loo
body(tmp.fun)[[12]][[4]][7:9] <- body(tmp.fun)[[12]][[4]][6:8]
body(tmp.fun)[[12]][[4]][[6]] <- substitute(
  if(!is.null(fit_j$stanvars$trials)){
    fit_j$stanvars$trials$sdata <- fit_j$stanvars$trials$sdata[-omitted]
    fit_j$stanvars$trials$scode <- "  int trials[639];"
  })
assignInNamespace("reloo.loo", value = tmp.fun, ns = "brms")

It’s specific to my model, but the general idea is that I modified reloo.loo so that it checks for my variable inside the fit’s stanvars and if present, drops the corresponding observation and modifies the corresponding section of the data block.

JimBob · August 7, 2019, 9:15pm

Hi @bmfazio, I’m having a similar problem with a cumulative probit model. Great that you found a solution for your code. I’m still a bit of a novice with R functions - are you able to indicate which parts of your new function are specific to your purposes and would need to be changed? Sorry if this is a question that should be obvious!

With:

assignInNamespace(“reloo.loo”, value = tmp.fun, ns = “brms”)

Are you specifying that when brms calls reloo, it simply uses your function instead of the normal brms reloo, or does the new function need to be called in some other way?

Any help would be greatly appreciated!

Topic		Replies	Views
Error in loo and waic brms loo	6	1166	January 29, 2020
Error using loo for the custom_family beta-binomial brms	10	672	January 27, 2021
LOO error with a brmsfit model brms loo	3	553	October 2, 2018
Error when using the brm() brms	6	1369	June 19, 2020
Error: the model does not contain samples General rstan , fitting-issues , brms	4	107	January 26, 2025

No samples when using reloo on custom_family brmsfit

Related topics