Trying to fit a Poisson model with an offset using stan_gamm4()
from the rstanarm::
package, I discovered that the offset variable (duration in the toy data set below) must be present as a vector in the R environment. stan_gamm4()
. It need not be present in the data frame used to fit the model and it cannot be pulled from the data frame. In contrast, posterior_linpred()
requires the offset variable to be present in the input data frame.
Took me some time to discover this and make stan_gamm4()
estimate the Poison model with the offset. Would be nice if the offset variable can be read from the data frame, just like gamm4()
does.
The R code below reproduces the problem and solution, at least with my version of R (3.5.1) and rstanarm (2.18.2) on a Windows PC.
##########################
## Poisson example GAMM...
##########################
## simulate data...
x <- runif(500)
fac <- sample(1:20,500,replace=TRUE)
eta <- x^2*3 + fac/20; fac <- as.factor(fac)
y <- rpois(500,exp(eta))
duration <- sample(1:4,500,replace=TRUE) #offset
data <- data.frame(x, fac, eta, y, duration)
rm(x, fac, eta, y, duration)
## fit model with gamm4() and examine it...
library(gamm4)
library(mgcv)
bp <- gamm4(
y~s(x) + offset(log(duration)),
family=poisson,
random=~(1|fac),
data = data)
plot(bp$gam)
#gamm4 finds the offset variable in the data frame.
## fit model with stan_gamm4() and examine it...
library(rstanarm)
#Set options for multicore CPU with excess RAM.
options(mc.cores = parallel::detectCores())
bps <- stan_gamm4(
y~s(x) + offset(log(duration)),
family=poisson(link = "log"),
random=~(1|fac),
data = data
)
#Command above does not work: "Error in offset(log(duration)) : object
# 'duration' not found"
bps <- stan_gamm4(
y~s(x) + offset(log(data$duration)),
family=poisson(link = "log"),
random=~(1|fac),
data = data
)
#Command above does not work. "Error in eval(inp, data, parent.frame()) : object
# 'duration' not found"
#With the offset as separate data object.
duration2 <- data$duration
bps <- stan_gamm4(
y~s(x) + offset(log(duration2)),
family=poisson(link = "log"),
random=~(1|fac),
data = data
)
#Works.
plot_nonlinear(bps)
#Linear predictor works if the offset variable is part of the new data set but
# the offset variable does not have to be present as a separate data object.
nd <- data.frame(
x = seq(-3, 3))
pred <- posterior_linpred(
bps,
newdata = nd,
transform = TRUE,
re.form = NA,
seed = 445,
offset = 1
)
# "Error in model.frame.default(ff, data = newdata, na.action = na.act) :
# variable lengths differ (found for 'x')
# In addition: Warning messages:
# 1: In predict.gam(object, newdata, type = type, se.fit = se.fit, terms = terms, :
# not all required variables have been supplied in newdata!
# 2: 'newdata' had 7 rows but variables found have 500 rows "
nd <- data.frame(
x = seq(-3, 3),
duration2 = 1)
pred <- posterior_linpred(
bps,
newdata = nd,
transform = TRUE,
re.form = NA,
seed = 445,
offset = 1
)
#Works.
rm(duration2)
pred <- posterior_linpred(
bps,
newdata = nd,
transform = TRUE,
re.form = NA,
seed = 445,
offset = 1
)
#Works.