Rstanarm: stan_gamm4() Poisson model does not find offset variable in data frame

Trying to fit a Poisson model with an offset using stan_gamm4() from the rstanarm:: package, I discovered that the offset variable (duration in the toy data set below) must be present as a vector in the R environment. stan_gamm4(). It need not be present in the data frame used to fit the model and it cannot be pulled from the data frame. In contrast, posterior_linpred() requires the offset variable to be present in the input data frame.

Took me some time to discover this and make stan_gamm4() estimate the Poison model with the offset. Would be nice if the offset variable can be read from the data frame, just like gamm4() does.

The R code below reproduces the problem and solution, at least with my version of R (3.5.1) and rstanarm (2.18.2) on a Windows PC.

##########################
## Poisson example GAMM...
##########################
## simulate data...
x <- runif(500)
fac <- sample(1:20,500,replace=TRUE)
eta <- x^2*3 + fac/20; fac <- as.factor(fac)
y <- rpois(500,exp(eta))
duration <- sample(1:4,500,replace=TRUE) #offset
data <- data.frame(x, fac, eta, y, duration)
rm(x, fac, eta, y, duration)

## fit model with gamm4() and examine it...
library(gamm4)
library(mgcv)

bp <- gamm4(
  y~s(x) + offset(log(duration)),
  family=poisson,
  random=~(1|fac), 
  data = data)
plot(bp$gam)
#gamm4 finds the offset variable in the data frame.

## fit model with stan_gamm4() and examine it...
library(rstanarm)
#Set options for multicore CPU with excess RAM.
options(mc.cores = parallel::detectCores())

bps <- stan_gamm4(
  y~s(x) + offset(log(duration)), 
  family=poisson(link = "log"),
  random=~(1|fac),
  data = data
  )
#Command above does not work: "Error in offset(log(duration)) : object
# 'duration' not found"

bps <- stan_gamm4(
  y~s(x) + offset(log(data$duration)), 
  family=poisson(link = "log"),
  random=~(1|fac),
  data = data
)
#Command above does not work. "Error in eval(inp, data, parent.frame()) : object
# 'duration' not found"

#With the offset as separate data object.
duration2 <- data$duration

bps <- stan_gamm4(
  y~s(x) + offset(log(duration2)), 
  family=poisson(link = "log"),
  random=~(1|fac),
  data = data
)
#Works.
plot_nonlinear(bps)

#Linear predictor works if the offset variable is part of the new data set but
# the offset variable does not have to be present as a separate data object.

nd <- data.frame(
  x = seq(-3, 3))
pred <- posterior_linpred(
  bps,
  newdata = nd,
  transform = TRUE,
  re.form = NA,
  seed = 445,
  offset = 1
)
# "Error in model.frame.default(ff, data = newdata, na.action = na.act) : 
# variable lengths differ (found for 'x')
# In addition: Warning messages:
#   1: In predict.gam(object, newdata, type = type, se.fit = se.fit, terms = terms,  :
#                       not all required variables have been supplied in  newdata!
#   2: 'newdata' had 7 rows but variables found have 500 rows "

nd <- data.frame(
  x = seq(-3, 3),
  duration2 = 1)
pred <- posterior_linpred(
  bps,
  newdata = nd,
  transform = TRUE,
  re.form = NA,
  seed = 445,
  offset = 1
)
#Works.

rm(duration2)
pred <- posterior_linpred(
  bps,
  newdata = nd,
  transform = TRUE,
  re.form = NA,
  seed = 445,
  offset = 1
)
#Works.

Thanks. The stan_gamm4 function nowdays uses mgcv::jagam to process the data, although the arguments and functionality are still the same as gamm4::gamm4. Does the offset thing work with mgcv::jagam?

Thanks for the quick response.

Both ways of specifying the offset work for mgcv::jagam() (see below), that is, jagam produces a model file without error message.

BTW according to the jagam vignette, the offset is ignored when predicting, unlike an offset included in formula . Does this also apply to the posterior_predict() and posterior_linpred() functions in stanarm?

jagam(
  y~s(x) + offset(log(duration)),
  family=poisson,
  file = "bp_jagam1",
  data = data)

jagam(
  y~s(x),
  family=poisson,
  offset=log(duration),
  file = "bp_jagam2",
  data = data)

I am pretty sure that if you can get the offset to work, then functions in rstanarm such as posterior_predict and posterior_linpred will include the offset, but obviously we haven’t tested everything that can be done through rstanarm via mgcv.