Prior predictive check for mixed zero-inflated and Poisson models

I am trying out zero-inflated Poisson models in brms for count data from a bird survey (see the previous conversation here: Zero-inflated poisson model for count data). I have combined 3 simulated data sets, each with different occupancy, Poisson-distributed abundance, and detectability, to represent 3 species. I haven't looked at my real data yet, although I know the simulations are quite representative.
The fits of the different models vary from fairly good to good, with nice learning between groups. There are a few divergences, but they are scattered “randomly” in the pairs plots.
My problem is working out whether varying the priors, as suggested by the get_prior() function, could improve the fit of the various models. At the moment I'm changing priors blindly, without understanding how they could improve or worsen the fit.
Perhaps a prior predictive check could point the way, but I couldn't find any pointers for the mixed zero-inflated/Poisson model that I'm fitting.
Can someone suggest how to do a prior predictive check for count data?
Example code:

sp_formula <- bf(count ~ 1 + (1 + sday | site) + (1 + site | species),
                 zi ~ species)

zip_prior <- c(set_prior("student_t(5, 0, 5)", class = "Intercept"),
               set_prior("lognormal(0, 1)", class = "Intercept", dpar = "zi"),
               set_prior("normal(0, 3)", class = "b", dpar = "zi"))

fit <-
  brm(data = my_data,
      family = zifamily,   # e.g. zero_inflated_poisson()
      formula = sp_formula,
      prior = zip_prior,
      iter = 2000, warmup = 1000, thin = 1, chains = 4, cores = 4,
      # seed = 9,
      refresh = 0,
      sample_prior = FALSE)

Operating System: Linux Ubuntu 22.04
Interface Version: brms on RStudio
Compiler/Toolkit:


Edit: I did not notice this in my first reply:

The idea of prior predictive checks is to have a way to view, on the outcome space, the implications of using particular priors (i.e. sampling from the priors without the likelihood). What you do want is priors that make sense from a scientific standpoint. Since that is often difficult or impossible to think about in terms of individual parameters, you want to think about it in terms of the outcome space (hence, prior predictive checks). What you don't want to do is condition your priors on your data in terms of fit, because then you are over-conditioning, so to speak: the data enter both the prior and the likelihood. In other words, don't tune your priors to the data you're looking at just to gain a better fit (see the example of choosing a prior based on results from a previous fit in section 1.4 of Gelman, Simpson & Betancourt, "The Prior Can Often Only Be Understood in the Context of the Likelihood", Entropy 2017). Jonah Gabry has a nice paper that includes prior predictive checks in part 3: "Visualization in Bayesian Workflow", Journal of the Royal Statistical Society Series A.

You can use the same graphics as for posterior predictive checks, but use them on a model that is run using sample_prior="only" in the brm call.
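For example, a minimal sketch that reuses the formula, priors, and family from the question (the data are still passed in so that brms can build the design matrices, but they are not conditioned on when sample_prior = "only"):

#prior-only fit: draws come from the priors alone
prior_fit <- brm(data = my_data,
                 family = zifamily,   # assumed to be zero_inflated_poisson()
                 formula = sp_formula,
                 prior = zip_prior,
                 iter = 2000, warmup = 1000, chains = 4, cores = 4,
                 sample_prior = "only")

#any pp_check on prior_fit now shows prior predictive rather than posterior predictive draws
pp_check(prior_fit, type = "hist", binwidth = 1)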

I like to use these (if the counts get large, types like "bars" and "rootogram" may not 'work' visually):

#proportion of zeroes 
prop_zero <- function(colony_count) mean(colony_count == 0)
(prop_zero_testpo <- pp_check(model, type = "stat", stat = "prop_zero"))

#max value
pp_max <- function(colony_count) max(colony_count)
(pp_max_testnb <- pp_check(model, type = "stat", stat = "pp_max"))

#Note that for the above, you can also view by groups by using "stat_grouped" and group="species"
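#for example, grouped by a factor called "species" (as in the question's data):
pp_check(model, type = "stat_grouped", stat = "prop_zero", group = "species")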

#histograms. Change the binwidth to something that makes sense; for low counts it is often 1
pp_check(model, type = "hist", binwidth = 1)

#bars or bars_grouped
pp_check(model, type = "bars")

#rootogram
pp_check(model, type = "rootogram")
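If the plots are hard to read, the same draws can also be summarised numerically; a small sketch, assuming model was fit with sample_prior = "only":

#numeric summaries of the prior predictive draws
yrep <- posterior_predict(model)   # matrix of draws x observations
quantile(apply(yrep, 1, max), probs = c(0.5, 0.9, 0.99))   # typical and extreme maxima
mean(yrep == 0)                    # overall proportion of zeroes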

Many thanks. That will be a great help.
