Issue using posterior_vs_prior with Poisson or Neg binomial models

Please also provide the following information in addition to your question:

  • Operating System: macOS Mojave 10.14.6
  • rstanarm Version: 2.18.2

First, thank you to the developers of rstanarm, it’s such a user-friendly way to learn Bayesian statistics.

I have been unable to use the posterior_vs_predict function for rstanarm Poisson or Negative binomial models. I have read through the R documentation and googled but I still don’t understand why I can’t get it to work. In a recent statistics class, the teacher (on Linux, I believe same version of rstanarm) also encountered this problem. Is there something different I must call within this function for Poisson or Neg binomial models? For Gaussian or Binomial rstanarm models I normally use the below code with no issues:

posterior_vs_prior(rstanarm.model, pars = c(‘parameters specified by name’),
color_by = ‘vs’, group_by = TRUE, facet_args = list(scales = ‘free_y’))

Even keeping it simple, eg. posterior_vs_prior(rstanarm.model), this won’t work with a Poisson or Neg binomial model.

Thanks in advance!

Welcome to the forum!

Could you give more detail on the error that you get? I tried a minimal example with simulated data and that seemed to work.

x <- rnorm(100, 0, 1)
y <- rpois(100, exp(x))
d <- data.frame(x = x, y = y)
stp <- stan_glm(y ~ x, data = d, family = poisson)
posterior_vs_prior(stp)
2 Likes

Thanks for your response. You reminded me of the importance of going back to the basics to locate the issue!

Keeping with your minimal example, it’s only when changing both the iterations and warmup from the default that the plot throws an error.

stp ← stan_glm(y ~ x, data = d, family = poisson, iter = 5000, warmup = 2000)
posterior_vs_prior(stp)
Error in FUN(newX[, i], . . . ) : is.atomic(x) is not TRUE

I don’t really understand the error, except x is not an atomic vector. Isn’t a numeric vector (which x is currently classified as) a type of atomic vector?

I believe it may be something to do with the default warmup to iteration ratio for Poisson and Negative binomial models being set at 50%. I assume this is in place as a safe guard.

I decreased the warmup (to 40%) in my models since validation checks still looked good and I thought it’s best to make use of as many samples as possible. Is this unnecessary?

Thanks in advance.

I think this is a bug. I get the same error with

stp <- stan_glm(y ~ x, data = d, family = poisson, iter = 4001, warmup = 2000)
stp <- stan_glm(y ~ x, data = d, family = gaussian, iter = 4001, warmup = 2000)

but not with

stp <- stan_glm(y ~ x, data = d, family = poisson, iter = 3001, warmup = 1000)
stp <- stan_glm(y ~ x, data = d, family = gaussian, iter = 3001, warmup = 1000)
stp <- stan_glm(y ~ x, data = d, family = poisson, iter = 4000, warmup = 1000)
stp <- stan_glm(y ~ x, data = d, family = gaussian, iter = 4000, warmup = 1000)

@bgoodri @jonah I can report this on github if you think it’s a bug. I followed traceback and I think it happens in or shortly after this line

EBFMI <- get_num_upars(object)/apply(E, 2, var)  

in throw_sample_warnings when sampling from the prior (update.stanreg)

1 Like

It’s a bug. I reported it here

1 Like

Thank you!
I really like the posterior_vs_prior function as it’s a quick and easy way to check my priors makes sense, so I’m looking forward to a solution.

If this is representative of your specific example

stp <- stan_glm(y ~ x, data = d, family = poisson, iter = 5000, warmup = 2000)

you can do

stp <- stan_glm(y ~ x, data = d, family = poisson, iter = 5000)
posterior_vs_prior(stp)

You have 500 more warmup and 500 less posterior samples per chain but that should still work.

The following will also work but sampling from the prior will have a shorter warmup (which is probably fine for a regular regression).

stp <- stan_glm(y ~ x, data = d, family = poisson, iter = 5000, warmup = iter / 2.5)
posterior_vs_prior(stp)
1 Like

Thanks, yea I have been doing the former example and it’s fine. It would only provide more flexibility if I can change the warmup and still use this function in the future.
Maybe it’s time I learn rstan/stan.

1 Like

I should be revising my paper but couldn’t stop thinking about this. This is also more for others who might hit this problem. If you load bayesplot first and then use this slightly hacked together version of the original function, I think you can keep using it even when you specify the warmup = 2000 in the original stan_glm call.


require(bayesplot)
posterior_vs_prior_stanreg <- function (object, pars = NULL, regex_pars = NULL, prob = 0.9, 
          color_by = c("parameter", "vs", "none"), group_by_parameter = FALSE, 
          facet_args = list(), ...) 
{
  if (!rstanarm:::used.sampling(object)) 
    rstanarm:::STOP_sampling_only("posterior_vs_prior")
  stopifnot(isTRUE(prob > 0 && prob < 1))
  color_by <- switch(match.arg(color_by), parameter = "parameter", 
                     vs = "model", none = NA)
  if (group_by_parameter) {
    group_by <- "parameter"
    xvar <- "model"
  }
  else {
    group_by <- "model"
    xvar <- "parameter"
  }
  aes_args <- list(x = xvar, y = "estimate", ymin = "lb", ymax = "ub")
  if (!is.na(color_by)) 
    aes_args$color <- color_by
  if (!length(facet_args)) {
    facet_args <- list(facets = group_by)
  }
  else {
    facet_args$facets <- group_by
  }
  message("\nDrawing from prior...")
  capture.output(Prior <- suppressWarnings(update(object, prior_PD = TRUE, 
                                                  refresh = -1, iter = 2000,
                                                  warmup = 1000, chains = 2)))
  objects <- rstanarm:::nlist(Prior, Posterior = object)
  plot_data <- rstanarm:::stack_estimates(objects, prob = prob, pars = pars, 
                               regex_pars = regex_pars)
  graph <- ggplot(plot_data, mapping = do.call("aes_string", 
                                               aes_args)) + geom_pointrange(...) + do.call("facet_wrap", 
                                                                                           facet_args) + theme_default() + xaxis_title(FALSE) + 
    yaxis_title(FALSE) + xaxis_ticks() + xaxis_text(angle = -30, 
                                                    hjust = 0) + grid_lines(color = "gray", size = 0.1)
  if (group_by == "parameter") 
    return(graph)
  abbrevs <- abbreviate(plot_data$parameter, 12, method = "both.sides", 
                        dot = TRUE)
  graph + scale_x_discrete(name = "Parameter", labels = abbrevs)
}

1 Like

Thanks for reporting this and for attempting to troubleshoot! I haven’t had a chance to look seriously into it yet but hopefully will in the coming days.

There’s a PR on the rstanarm github repo about this.

Awesome, thanks!! Ben and I shoild be getting back to rstanarm in the next few days.

1 Like