Using loo(..., reloo = TRUE) with data passed by stanvar

I’m using brms 2.7.0 (from CRAN).

I’m defining a custom family and passing an extra column of data into the data{} block using stanvar(). When I try to compute loo using loo(fit, reloo = TRUE), the first reloo refit fails because the problematic observation has been dropped from the model data but is still present in the column I passed via stanvar(). Is there something I can do so that the problematic observations are also removed from that column for the reloo() calls?

Here’s a minimal working example. This is a binomial model, except that the number of trials is missing in some rows (coded as 0). For these missing cases, it’s known that the number of trials is between y and y+9 inclusive, where y is the number of successes.
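For reference, the likelihood just described can be sketched in plain R. This is only an illustration under stated assumptions, not part of the model code: the function name rsupp_loglik is made up, p is the success probability on the response scale, and the ten candidate trial counts are weighted equally.

```r
# Hypothetical helper: log-likelihood of y successes with success probability p.
# trials == 0 marks a missing trial count known to lie in y .. y + 9.
rsupp_loglik <- function(y, p, trials) {
  if (trials > 0) {
    # Trial count observed: ordinary binomial log-probability
    return(dbinom(y, size = trials, prob = p, log = TRUE))
  }
  # Trial count missing: sum the binomial probability over the ten candidates
  lp <- dbinom(y, size = y + 0:9, prob = p, log = TRUE)
  m <- max(lp)
  m + log(sum(exp(lp - m)))  # numerically stable log-sum-exp
}

rsupp_loglik(65, 0.8, 84)  # observed cohort of 84
rsupp_loglik(29, 0.8, 0)   # missing cohort, sums over sizes 29 to 38
```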

Since brms does not allow the argument in trials() to be 0, it seems like I should pass the trials (called cohort in the model) using stanvar().

library(brms)

graduates <- c(65, 53, 401, 413, 70, 71, 475, 512, 421, 474, 29, 20, 6203)
cohort    <- c(84, 75, 428, 437, 93, 94, 598, 622, 477, 520,  0,  0, 7076)

binomial_rsupp <- custom_family(
    name = "binomial_rsupp",
    dpars = "mu",
    links = "logit",
    type = "int",
    vars = "cohort[n]"
)

stan_funs <- "
    real binomial_rsupp_lpmf(int y, real mu, int T) {
        if (T) {
            return binomial_lpmf(y | T, mu);
        } else {
            vector[10] lp;
            for (j in 0:9)
                lp[j+1] = binomial_lpmf(y | y + j, mu);
            return log_sum_exp(lp);
        }
    }
"

stanvars <- stanvar(scode = stan_funs, block = "functions") +
                stanvar(cohort, name = "cohort", scode = "  int cohort[N];")

fit1 <- brm(
    graduates ~ 1,
    family = binomial_rsupp,
    stanvars = stanvars,
    data = list(graduates = graduates),
    iter = 4e3,
    warmup = 2e3
)

expose_functions(fit1, vectorize = TRUE)

log_lik_binomial_rsupp <- function(i, draws) {
    y <- draws$data$Y[i]
    mu <- draws$dpars$mu[,i]
    N <- draws$data$cohort[i]
    binomial_rsupp_lpmf(y, mu, N)
}

loo(fit1, reloo = TRUE)

This produces the following error:

Exception: mismatch in dimension declared and found in context; processing stage=data initialization; variable name=cohort; position=0; dims declared=(12); dims found=(13)

Incidentally, I think that if trials() allowed zeros, I could just use that and pass the data inside the brm() call directly instead of passing the extra data using stanvar(). It would then be subset correctly for the reloo() refits automatically.

Could you use a value other than 0 as an indicator for your special situation?

I don’t want to allow 0 in trials() just for this very special situation to be honest.

You’re right, I totally can. I didn’t think of it until you asked, but I can just use 1 instead. Thanks!
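
A minimal sketch of that recoding, assuming no real cohort has exactly one student so that 1 is free to act as the "missing" marker. The names cohort_coded and dat are made up for illustration.

```r
graduates <- c(65, 53, 401, 413, 70, 71, 475, 512, 421, 474, 29, 20, 6203)
cohort    <- c(84, 75, 428, 437, 93, 94, 598, 622, 477, 520,  0,  0, 7076)

# Assumption: 1 never occurs as a genuine cohort size, so it can flag missing rows
stopifnot(!any(cohort == 1))

# Recode the missing trial counts from 0 to the sentinel value 1
cohort_coded <- ifelse(cohort == 0, 1, cohort)

# Keep both columns in one data frame so they are always subset together
dat <- data.frame(graduates = graduates, cohort = cohort_coded)
dat
```

With the column in the data frame, it could presumably be supplied through trials(cohort) in the formula rather than through stanvar(), and the Stan function would then test for T == 1 instead of T == 0. That part is not shown here and has not been verified against brms 2.7.0.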