Problem predicting with newdata from brms model with custom beta_binomial distribution


#1

Hello,

I fit a model with the custom beta binomial distribution from vignette(“brms_customfamilies”). I’m trying to get the fitted values from this model using a new data frame with argument “newdata.” However, I’m encountering an error whenever the “newdata” argument is supplied to either the fitted or predict functions. Both functions work fine predicting from the original data frame with no new data supplied. I’ve encountered the same error using the example from vignette(“brms_customfamilies”) so here is an example:

require(brms)
#> Loading required package: brms
#> Loading required package: Rcpp
#> Loading required package: ggplot2
#> Loading 'brms' package (version 2.6.0). Useful instructions
#> can be found by typing help('brms'). A more detailed introduction
#> to the package is available through vignette('brms_overview').
#> Run theme_set(theme_default()) to use the default bayesplot theme.

data("cbpp", package = "lme4")

beta_binomial2 <- custom_family(
  "beta_binomial2", dpars = c("mu", "phi"),
  links = c("logit", "log"), lb = c(NA, 0),
  type = "int", vars = "trials[n]"
)

stan_funs <- "
  real beta_binomial2_lpmf(int y, real mu, real phi, int T) {
return beta_binomial_lpmf(y | T, mu * phi, (1 - mu) * phi);
}
int beta_binomial2_rng(real mu, real phi, int T) {
return beta_binomial_rng(T, mu * phi, (1 - mu) * phi);
}
"

stanvars <- stanvar(scode = stan_funs, block = "functions") +
  stanvar(as.integer(cbpp$size), name = "trials")

fit2 <- brm(
  incidence ~ period + (1|herd), data = cbpp, 
  family = beta_binomial2, stanvars = stanvars
)
#> Compiling the C++ model
#> Start sampling
#> 
#> SAMPLING FOR MODEL '5b724f4e4312a2f504f358414aeb3092' NOW (CHAIN 1).
#> Chain 1: 
#> Chain 1: Gradient evaluation took 0.000142 seconds
#> Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 1.42 seconds.
#> Chain 1: Adjust your expectations accordingly!
#> Chain 1: 
#> Chain 1: 
#> Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)
#> Chain 1: Iteration:  200 / 2000 [ 10%]  (Warmup)
#> Chain 1: Iteration:  400 / 2000 [ 20%]  (Warmup)
#> Chain 1: Iteration:  600 / 2000 [ 30%]  (Warmup)
#> Chain 1: Iteration:  800 / 2000 [ 40%]  (Warmup)
#> Chain 1: Iteration: 1000 / 2000 [ 50%]  (Warmup)
#> Chain 1: Iteration: 1001 / 2000 [ 50%]  (Sampling)
#> Chain 1: Iteration: 1200 / 2000 [ 60%]  (Sampling)
#> Chain 1: Iteration: 1400 / 2000 [ 70%]  (Sampling)
#> Chain 1: Iteration: 1600 / 2000 [ 80%]  (Sampling)
#> Chain 1: Iteration: 1800 / 2000 [ 90%]  (Sampling)
#> Chain 1: Iteration: 2000 / 2000 [100%]  (Sampling)
#> Chain 1: 
#> Chain 1:  Elapsed Time: 2.73569 seconds (Warm-up)
#> Chain 1:                2.01796 seconds (Sampling)
#> Chain 1:                4.75364 seconds (Total)
#> Chain 1: 
#> 
#> SAMPLING FOR MODEL '5b724f4e4312a2f504f358414aeb3092' NOW (CHAIN 2).
#> Chain 2: 
#> Chain 2: Gradient evaluation took 0.000112 seconds
#> Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 1.12 seconds.
#> Chain 2: Adjust your expectations accordingly!
#> Chain 2: 
#> Chain 2: 
#> Chain 2: Iteration:    1 / 2000 [  0%]  (Warmup)
#> Chain 2: Iteration:  200 / 2000 [ 10%]  (Warmup)
#> Chain 2: Iteration:  400 / 2000 [ 20%]  (Warmup)
#> Chain 2: Iteration:  600 / 2000 [ 30%]  (Warmup)
#> Chain 2: Iteration:  800 / 2000 [ 40%]  (Warmup)
#> Chain 2: Iteration: 1000 / 2000 [ 50%]  (Warmup)
#> Chain 2: Iteration: 1001 / 2000 [ 50%]  (Sampling)
#> Chain 2: Iteration: 1200 / 2000 [ 60%]  (Sampling)
#> Chain 2: Iteration: 1400 / 2000 [ 70%]  (Sampling)
#> Chain 2: Iteration: 1600 / 2000 [ 80%]  (Sampling)
#> Chain 2: Iteration: 1800 / 2000 [ 90%]  (Sampling)
#> Chain 2: Iteration: 2000 / 2000 [100%]  (Sampling)
#> Chain 2: 
#> Chain 2:  Elapsed Time: 2.65138 seconds (Warm-up)
#> Chain 2:                2.42912 seconds (Sampling)
#> Chain 2:                5.0805 seconds (Total)
#> Chain 2: 
#> 
#> SAMPLING FOR MODEL '5b724f4e4312a2f504f358414aeb3092' NOW (CHAIN 3).
#> Chain 3: 
#> Chain 3: Gradient evaluation took 0.000136 seconds
#> Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 1.36 seconds.
#> Chain 3: Adjust your expectations accordingly!
#> Chain 3: 
#> Chain 3: 
#> Chain 3: Iteration:    1 / 2000 [  0%]  (Warmup)
#> Chain 3: Iteration:  200 / 2000 [ 10%]  (Warmup)
#> Chain 3: Iteration:  400 / 2000 [ 20%]  (Warmup)
#> Chain 3: Iteration:  600 / 2000 [ 30%]  (Warmup)
#> Chain 3: Iteration:  800 / 2000 [ 40%]  (Warmup)
#> Chain 3: Iteration: 1000 / 2000 [ 50%]  (Warmup)
#> Chain 3: Iteration: 1001 / 2000 [ 50%]  (Sampling)
#> Chain 3: Iteration: 1200 / 2000 [ 60%]  (Sampling)
#> Chain 3: Iteration: 1400 / 2000 [ 70%]  (Sampling)
#> Chain 3: Iteration: 1600 / 2000 [ 80%]  (Sampling)
#> Chain 3: Iteration: 1800 / 2000 [ 90%]  (Sampling)
#> Chain 3: Iteration: 2000 / 2000 [100%]  (Sampling)
#> Chain 3: 
#> Chain 3:  Elapsed Time: 2.54499 seconds (Warm-up)
#> Chain 3:                2.31139 seconds (Sampling)
#> Chain 3:                4.85638 seconds (Total)
#> Chain 3: 
#> 
#> SAMPLING FOR MODEL '5b724f4e4312a2f504f358414aeb3092' NOW (CHAIN 4).
#> Chain 4: 
#> Chain 4: Gradient evaluation took 0.000105 seconds
#> Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 1.05 seconds.
#> Chain 4: Adjust your expectations accordingly!
#> Chain 4: 
#> Chain 4: 
#> Chain 4: Iteration:    1 / 2000 [  0%]  (Warmup)
#> Chain 4: Iteration:  200 / 2000 [ 10%]  (Warmup)
#> Chain 4: Iteration:  400 / 2000 [ 20%]  (Warmup)
#> Chain 4: Iteration:  600 / 2000 [ 30%]  (Warmup)
#> Chain 4: Iteration:  800 / 2000 [ 40%]  (Warmup)
#> Chain 4: Iteration: 1000 / 2000 [ 50%]  (Warmup)
#> Chain 4: Iteration: 1001 / 2000 [ 50%]  (Sampling)
#> Chain 4: Iteration: 1200 / 2000 [ 60%]  (Sampling)
#> Chain 4: Iteration: 1400 / 2000 [ 70%]  (Sampling)
#> Chain 4: Iteration: 1600 / 2000 [ 80%]  (Sampling)
#> Chain 4: Iteration: 1800 / 2000 [ 90%]  (Sampling)
#> Chain 4: Iteration: 2000 / 2000 [100%]  (Sampling)
#> Chain 4: 
#> Chain 4:  Elapsed Time: 2.6117 seconds (Warm-up)
#> Chain 4:                2.1002 seconds (Sampling)
#> Chain 4:                4.7119 seconds (Total)
#> Chain 4:

expose_functions(fit2, vectorize = TRUE)

log_lik_beta_binomial2 <- function(i, draws) {
  mu <- draws$dpars$mu[, i]
  phi <- draws$dpars$phi
  N <- draws$data$trials[i]
  y <- draws$data$Y[i]
  beta_binomial2_lpmf(y, mu, phi, N)
}

predict_beta_binomial2 <- function(i, draws, ...) {
  mu <- draws$dpars$mu[, i]
  phi <- draws$dpars$phi
  N <- draws$data$trials[i]
  beta_binomial2_rng(mu, phi, N)
}

fitted_beta_binomial2 <- function(draws) {
  mu <- draws$dpars$mu
  trials <- draws$data$trials
  trials <- matrix(trials, nrow = nrow(mu), ncol = ncol(mu), byrow = TRUE)
  mu * trials
}

#Create newdata for predict or fitted
newdat<-expand.grid(incidence=unique(cbpp$incidence), size=unique(cbpp$size), period=unique(cbpp$period))

#errors
fit2prednew<-predict(fit2, newdata=newdat, re_formula = NA)
#> Error in (function (mu, phi, T, base_rng__ = <pointer: 0x7fef19b78cc0>, : Exception: beta_binomial_rng: Population size parameter is -2147483648, but must be >= 0!  (in 'unknown file name' at line 8)

fit2fitnew<-fitted(fit2, newdata=newdat, re_formula = NA)
#> Warning in matrix(trials, nrow = nrow(mu), ncol = ncol(mu), byrow = TRUE):
#> data length [56] is not a sub-multiple or multiple of the number of rows
#> [4000]

#Works fine
fit2pred<-predict(fit2, re_formula = NA)
fit2fit<-fitted(fit2, re_formula = NA)

Created on 2018-12-17 by the reprex package (v0.2.1)

I would appreciate any help. R and all of my packages are up to date.

  • Operating System: macOS Mojave
  • brms Version: 2.6.0

Thank you very much,
Sam


#2

When using stanvars, you have to pass the new stanvars data via argument new_objects.


#3

Thank you very much for your help. Unfortunately I still cannot get this to work. I thought you meant using the argument new_objects instead of newdata for the predict function but that does not work. It runs without error but results in predictions based on the original data so it seems that predict is ignoring the new data I am passing it through the new_objects argument.

How would I pass the new data frame to predict using the new_objects argument?

Thank you very much


#4

Nevermind, I think I found the problem. The variable “trials” had to be updated in stanvars and passed through new_objects while the rest of my data frame could be passed through newdata.

So the correct code is

fit2fitnew<-fitted(fit2, new_objects=list(trials = 16), newdata = newdat, re_formula = NA)

For some reason I cannot get it to work for predict but I’m mostly interested in using fitted so I didn’t play around with it much more. In case it helps someone else, I tried

fit2prednew<-predict(fit2, new_objects=list(trials = 16), newdata = newdat, re_formula = NA)

But got the error “Error in (function (mu, phi, T, base_rng__ = <pointer: 0x116da0570>, pstream__ = <pointer: 0x1165615d0>) :
Exception: beta_binomial_rng: Population size parameter is -2147483648, but must be >= 0! (in ‘unknown file name’ at line 8)”