Hello, I’m having a problem making predictions from a model with a two-variable interaction in which one of the factor combinations is not present in the data.
This is the data:
structure(list(Year = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L), .Label = c("2019",
"2020"), class = "factor"), Period = structure(c(1L, 2L, 3L,
4L, 1L, 3L, 4L), .Label = c("Jan - Mar 10", "Mar 11 - May 4",
"May 5 - Aug", "Sep - Dec"), class = "factor"), Tot = c(27L,
26L, 37L, 37L, 30L, 37L, 82L), Cases = c(3L, 6L, 5L, 6L, 4L,
11L, 18L)), class = "data.frame", row.names = c(NA, -7L))
and this is the model call (stan_glm from rstanarm):
> model.fixef.test <- stan_glm(
    Cases ~ Year * Period,
    family = poisson(),
    data = train_data.simp %>% as.data.frame(),
    iter = 1000, cores = 8, chains = 2,
    prior = student_t(df = 1, scale = 2.5, autoscale = TRUE),
    show_messages = FALSE
  )
I get the following warning:
In center_x(x, sparse) :
Dropped empty interaction levels: Year2020:PeriodMar 11 - May 4
(I also get other warnings, but only because I used few iterations and chains to show the problem quickly; with 10000 iterations on 4 chains this is the only warning that appears.)
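To make the warning concrete, here is a minimal base-R sketch (no Stan needed) that looks for the empty cell directly. It rebuilds the data with data.frame() and factor(), which I believe is equivalent to the dput() output above, and inspects the design matrix that the formula implies. The names cell_counts, X and empty_cols are just illustrative, and this only mirrors what I assume center_x() is complaining about rather than reproducing rstanarm's internals.

```r
# Rebuild the posted data (intended to match the dput() output above).
periods <- c("Jan - Mar 10", "Mar 11 - May 4", "May 5 - Aug", "Sep - Dec")
train_data.simp <- data.frame(
  Year   = factor(c("2019", "2019", "2019", "2019", "2020", "2020", "2020")),
  Period = factor(periods[c(1, 2, 3, 4, 1, 3, 4)], levels = periods),
  Tot    = c(27L, 26L, 37L, 37L, 30L, 37L, 82L),
  Cases  = c(3L, 6L, 5L, 6L, 4L, 11L, 18L)
)

# Cross-tabulate the two factors: a zero marks a combination with no rows.
cell_counts <- table(train_data.simp$Year, train_data.simp$Period)
print(cell_counts)

# Build the design matrix for Year * Period and list columns that are all zero.
X <- model.matrix(~ Year * Period, data = train_data.simp)
empty_cols <- colnames(X)[colSums(X != 0) == 0]
print(empty_cols)
```

The all-zero column found this way should be the same interaction term named in the warning, since no row has Year 2020 together with Period "Mar 11 - May 4".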
Finally, when I try to produce predictions with posterior_epred, it fails:
> Preds <- posterior_epred(model.fixef.test)
Error in stanmat[, beta_sel, drop = FALSE] : subscript out of bounds
I tried many things, such as converting the variables to character, but nothing worked.
Eventually I solved it by building a single interaction variable with interaction(Year, Period) and using that as the only predictor.
That is fine for me because I am interested only in the predictions and not in the parameter estimates, but it’s a hacky solution.
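For anyone hitting the same thing, this is roughly what the workaround looks like, as a base-R sketch. The column name Cell is made up for illustration, drop = TRUE is my explicit choice to keep only the observed combinations, and glm() is used here only as a stand-in for stan_glm() so that the snippet runs without Stan; it is not the model I actually fitted.

```r
# Rebuild the posted data (intended to match the dput() output above).
periods <- c("Jan - Mar 10", "Mar 11 - May 4", "May 5 - Aug", "Sep - Dec")
train_data.simp <- data.frame(
  Year   = factor(c("2019", "2019", "2019", "2019", "2020", "2020", "2020")),
  Period = factor(periods[c(1, 2, 3, 4, 1, 3, 4)], levels = periods),
  Tot    = c(27L, 26L, 37L, 37L, 30L, 37L, 82L),
  Cases  = c(3L, 6L, 5L, 6L, 4L, 11L, 18L)
)

# One factor level per observed Year/Period cell (hypothetical column name).
# drop = TRUE leaves out combinations that never occur in the data.
train_data.simp$Cell <- interaction(
  train_data.simp$Year, train_data.simp$Period, drop = TRUE
)

# The design matrix now has one column per observed cell and no empty columns.
X_cell  <- model.matrix(~ Cell, data = train_data.simp)
n_empty <- sum(colSums(X_cell != 0) == 0)

# Stand-in fit with base glm(); the same formula could be given to stan_glm().
fit <- glm(Cases ~ Cell, family = poisson(), data = train_data.simp)
fitted_cases <- unname(fitted(fit))
```

With one parameter per observed cell, this gives a prediction for every combination that exists in the data, but the separate Year and Period effects are no longer identifiable from the coefficients, which is why it only suits the prediction use case.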