No samples when using reloo on custom_family brmsfit

#1
  • Operating System: Windows 10
  • brms Version: 2.7.0

I’m using brms to fit an endpoint-inflated binomial distribution. I’ve included the log_lik, predict and fitted functions to take advantage of the post-processing methods.

All of them seem to be working just fine, but if I get high Pareto k observations in loo, using reloo = TRUE results in the following error:

Fitting model 1 out of 1 (leaving out observation 497)
Start sampling
[1] "Error in sampler$call_sampler(args_list[[i]]) : Initialization failed."
[1] "error occurred during calling the sampler; sampling not done"

 Error: The model does not contain posterior samples. 

Below is the code I’m using to define the model:

fit_eiBinomialSM <- function(formula_, df){
  eiBinomialSM <- custom_family(
    "eiBinomialSM", dpars = c("mu", "so", "sm"), # unrestricted mixture scores
    links = c("logit", "identity", "identity"),
    type = "int", vars = "trials[n]",
    log_lik = function(i, draws) {
      mu <- draws$dpars$mu[, i]
      so <- draws$dpars$so
      sm <- draws$dpars$sm
      trials <- draws$data$trials[i]
      y <- draws$data$Y[i]
      eiBinomialSM_lpmf(y, mu, so, sm, trials)
    },
    predict = function(i, draws, ...) {
      mu <- draws$dpars$mu[, i]
      so <- draws$dpars$so
      sm <- draws$dpars$sm
      trials <- draws$data$trials[i]
      eiBinomialSM_rng(mu, so, sm, trials)
    },
    fitted = function(draws) {
      mu <- draws$dpars$mu
      so <- draws$dpars$so
      sm <- draws$dpars$sm
      p <- t(apply(cbind(matrix(c(so, sm), ncol = 2), 0), 1, function(x)(exp(x)/sum(exp(x)))))
      p[,2] + p[,3]*(1/(1+exp(-mu)))
    }
  )
  
  stan_funs <- "
  vector linkfunc(real so, real sm){
  return softmax([so, sm, 0]');
  }
  
  int[] ei(int y, int ntrial) {
  return {(1 - min({1, y})), (1 - min({1, ntrial - y}))};
  }
  
  real eiBinomialSM_lpmf(int y, real mu, real so, real sm, int trials) {
  int yom[2] = ei(y, trials);
  vector[3] pom = linkfunc(so, sm);
  
  return log(yom[1]*pom[1] + yom[2]*pom[2] + pom[3]*exp(binomial_logit_lpmf(y | trials, mu)));
  }
  
  int eiBinomialSM_rng(real mu, real so, real sm, int trials) {
  int which_component = categorical_rng(linkfunc(so, sm));
  
  if (which_component == 1) {
  return 0;
  }
  if (which_component == 2) {
  return trials;
  }
  
  return binomial_rng(trials, inv_logit(mu));
  }
  "
  
  stanvars <- stanvar(scode = stan_funs, block = "functions") +
    stanvar(as.integer(wf$bindenom), name = "trials")
  
  brm(formula = formula_,
      data = df,
      family = eiBinomialSM,
      stanvars = stanvars)
  }
0 Likes

#2

You may have to specify init_r to some number less than 2 when you first call brm in order to get it to use small enough values when it starts leaving observations out. That and check whether those observations are coded correctly in the data because they seem to be having a large influence on the posterior distribution.

0 Likes

#3

I just tried init = "0" and init_r = 0.1 but the exact same error pops up.

I checked the single observation getting dropped and there doesn’t seem to be anything special about it. I was able to refit the model by removing the high-k observations altogether and the posteriors don’t show any major change either.

Are there other steps I could take to help troubleshoot this issue? I can share the data (it’s a relatively small dataset) and anything else needed to reproduce it.

0 Likes

#4

Okay, I figured out the issue.

Following this tutorial, the number of trials for the full dataset is passed via stanvar:

stanvars <- stanvar(scode = stan_funs, block = "functions") +
  stanvar(as.integer(cbpp$size), name = "trials")

The issue seems to occur because loo tries to exclude the high-k observations but is still using the original trials data, which results in a mismatch between counts and trials, where some of the former are larger than the latter.

So I suppose the question now is how to allow loo and stanvar to work together to pass the appropriate subsets of data into the model.

0 Likes