Using loo(..., reloo = TRUE) with data passed by stanvar

AntonioV · January 8, 2019, 12:09am

I’m using brms 2.7.0 (from CRAN).

I’m defining a custom family and passing an extra column of data into the data{} block using stanvar(). When I compute loo with loo(fit, reloo = TRUE), the first refit fails: the problematic observation is dropped from the model data, but it is still present in the column I passed via stanvar(). Is there a way to have those observations removed from that column as well during the reloo refits?

Here’s a minimal working example. This is a binomial model, except that the number of trials is missing in some rows (coded as 0). For these missing cases, the number of trials is known to lie between y and y+9 inclusive, where y is the number of successes.
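As a cross-check on the likelihood described above, here is a minimal base R sketch of the same log-probability, written without Stan. It is not part of the original post: the names log_sum_exp and binomial_rsupp_lpmf_r are hypothetical helpers introduced only for illustration, and mu is assumed to be the success probability (that is, after the inverse logit has been applied).

```r
# Hypothetical helper mirroring Stan's log_sum_exp(): a numerically
# stable log(sum(exp(x))).
log_sum_exp <- function(x) {
  m <- max(x)
  if (!is.finite(m)) return(m)
  m + log(sum(exp(x - m)))
}

# Hypothetical base R counterpart of the Stan function in this post.
# trials > 0: ordinary binomial log-probability.
# trials == 0: trial count unknown, so sum the binomial probability
#              over the ten candidate counts y, y + 1, ..., y + 9.
binomial_rsupp_lpmf_r <- function(y, mu, trials) {
  if (trials > 0) {
    dbinom(y, size = trials, prob = mu, log = TRUE)
  } else {
    log_sum_exp(dbinom(y, size = y + 0:9, prob = mu, log = TRUE))
  }
}

# Usage: one row with a known cohort and one with a missing cohort.
binomial_rsupp_lpmf_r(65, mu = 0.8, trials = 84)
binomial_rsupp_lpmf_r(29, mu = 0.8, trials = 0)
```

The sum is unweighted, as in the Stan function. A uniform prior over the ten candidate counts would subtract log(10) from the missing-case term, which is a constant and so should not change the posterior for mu.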

Since brms does not allow the argument in trials() to be 0, it seems like I should pass the trials (called cohort in the model) using stanvar().

library(brms)

graduates <- c(65, 53, 401, 413, 70, 71, 475, 512, 421, 474, 29, 20, 6203)
cohort    <- c(84, 75, 428, 437, 93, 94, 598, 622, 477, 520,  0,  0, 7076)

binomial_rsupp <- custom_family(
    name = "binomial_rsupp",
    dpars = "mu",
    links = "logit",
    type = "int",
    vars = "cohort[n]"
)

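stan_funs <- "
    // T > 0: ordinary binomial with T trials.
    // T == 0: trial count unknown, so sum the binomial likelihood
    //         over the candidate counts y, y + 1, ..., y + 9.
    real binomial_rsupp_lpmf(int y, real mu, int T) {
        if (T) {
            return binomial_lpmf(y | T, mu);
        } else {
            vector[10] lp;
            for (j in 0:9)
                lp[j+1] = binomial_lpmf(y | y + j, mu);
            return log_sum_exp(lp);
        }
    }
"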

stanvars <- stanvar(scode = stan_funs, block = "functions") +
                stanvar(cohort, name = "cohort", scode = "  int cohort[N];")

fit1 <- brm(
    graduates ~ 1,
    family = binomial_rsupp,
    stanvars = stanvars,
    data = list(graduates = graduates),
    iter = 4e3,
    warmup = 2e3
)

expose_functions(fit1, vectorize = TRUE)

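# Pointwise log-likelihood for observation i, used by loo().
# binomial_rsupp_lpmf() is the Stan function exposed to R above.
log_lik_binomial_rsupp <- function(i, draws) {
    y <- draws$data$Y[i]
    mu <- draws$dpars$mu[,i]
    trials <- draws$data$cohort[i]
    binomial_rsupp_lpmf(y, mu, trials)
}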

loo(fit1)

loo(fit1, reloo = TRUE)

The second call, with reloo = TRUE, produces the following error:

Exception: mismatch in dimension declared and found in context; processing stage=data initialization; variable name=cohort; position=0; dims declared=(12); dims found=(13)
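The dimensions in this message (12 declared, 13 found) are consistent with one observation being dropped from the model data while the vector handed to stanvar() keeps its full length. The base R sketch below only illustrates that bookkeeping; it does not call brms, and the held-out index is a hypothetical choice, since the post does not say which observation was flagged.

```r
graduates <- c(65, 53, 401, 413, 70, 71, 475, 512, 421, 474, 29, 20, 6203)
cohort    <- c(84, 75, 428, 437, 93, 94, 598, 622, 477, 520,  0,  0, 7076)

held_out <- 13  # hypothetical: index of the observation left out in a refit

# The data set the model knows about loses one row, so N becomes 12 ...
refit_data <- data.frame(graduates = graduates)[-held_out, , drop = FALSE]
nrow(refit_data)   # 12

# ... but a vector stored outside that data set is untouched.
length(cohort)     # 13

# If cohort were a column of the same data set, it would shrink with it.
full_data <- data.frame(graduates = graduates, cohort = cohort)
nrow(full_data[-held_out, ])   # 12
```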

Incidentally, I think that if trials() allowed zeros, I could pass the cohort column inside the brm() call directly instead of passing it as extra data via stanvar(). It would then be subset correctly and automatically for the reloo refits.

paul.buerkner · January 9, 2019, 1:22pm

Could you use a value other than 0 as an indicator for your special situation?

To be honest, I don’t want to allow 0 in trials() just for this very special situation.

AntonioV · January 10, 2019, 7:16am

You’re right, I totally can. I didn’t think of it until you asked, but I can just use 1 instead. Thanks!
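A minimal base R sketch of the recoding step, under the assumption that no real cohort has size 1 (true for the data in this thread, but worth checking for any other data set). The names sentinel, cohort_coded and is_missing are hypothetical. The sketch covers only the data preparation; how brms handles the recoded column in trials() was not confirmed in the thread, and the if (T) test in the Stan function would presumably have to become a test for the sentinel value.

```r
cohort <- c(84, 75, 428, 437, 93, 94, 598, 622, 477, 520, 0, 0, 7076)

sentinel <- 1  # hypothetical marker for "number of trials unknown"

# Guard: the marker must not collide with a genuine cohort size.
stopifnot(!any(cohort == sentinel))

# Replace the 0 placeholders with the sentinel.
cohort_coded <- ifelse(cohort == 0, sentinel, cohort)

# Rows whose trial count is unknown.
is_missing <- cohort_coded == sentinel
which(is_missing)   # 11 12
```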
