Priors in a ZINB model

aczeq · March 17, 2025, 11:34am

Hello! I don’t have much experience with zero-inflated negative binomial (ZINB) models and am unsure how to choose priors for them. Almost all of my predictors are categorical. The exception is Year (2015–2023), which is centered on 2019. The outcome is highly zero-inflated: 86% of observations are zeros.

For the zero-inflation (zi) priors, I followed this guide. It uses the share of zeros in the data as the prior mean of the zi intercept (on the logit scale) and the mean of the non-zero counts as the prior mean of the main intercept (on the log scale). However, a prior predictive check on the proportion of zeros came out better when I changed two things. For the zi mean, I used the proportion of zeros among IDs that always have zeros. For the intercept mean, I used the mean count among IDs with at least one non-zero.
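To see what these intercept priors imply on the natural scale, you can push their central 95% intervals back through the inverse link functions. The base-R sketch below does this for the two intercepts only. It ignores the `b` terms and assumes a zero offset (i.e. `contract_length = 1`, a hypothetical value). Keep in mind that brms applies the `Intercept` prior to the intercept of the centered design matrix. The sketch also illustrates why the zi intercept should not be set to the raw share of zeros: in a ZINB model, P(Y = 0) = zi + (1 − zi) × NB(0 | mu, shape), so the negative binomial part contributes zeros of its own. For that term, the shape is fixed at 10, the mean of the `gamma(1, 0.1)` prior in the post, purely for illustration.

```r
# Prior means from the post: logit(0.64) for the zi intercept, log(0.78) for the count intercept
zi_mean <- qlogis(0.64)  # qlogis() is R's logit; Stan's logit(0.64) gives the same value
mu_mean <- log(0.78)
prior_sd <- 0.1

# Central 95% interval of each normal prior, mapped back to the natural scale
z <- qnorm(c(0.025, 0.975))
zi_interval <- plogis(zi_mean + prior_sd * z)  # probability of an excess (structural) zero
mu_interval <- exp(mu_mean + prior_sd * z)     # NB mean with all b = 0 and a zero offset

round(zi_interval, 3)
round(mu_interval, 3)

# Overall probability of a zero: structural zeros plus NB zeros.
# shape = 10 is an illustrative assumption (the mean of gamma(1, 0.1)).
p0_total <- plogis(zi_mean) + (1 - plogis(zi_mean)) * dnbinom(0, size = 10, mu = exp(mu_mean))
round(p0_total, 3)
```

With these values, the overall share of zeros ends up well above the 0.64 put on the zi intercept. This matches the observation that a zi mean below the raw 86% worked better.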

This adjustment made the prior predictive proportion of zeros match the data. However, the pp_check plots still showed implausibly large x-axis values: the maximum observed Applications value is 20, yet the plots sometimes extended into the tens of thousands. Tuning the prior standard deviations and the prior on the shape parameter gave reasonable pp_check results for the three-way interaction model. When I added Year to the interaction, though, the problem reappeared, although less extreme (values in the hundreds).
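One way to see why adding Year widens the prior predictive distribution is to look at the linear predictor. With independent normal(0, 0.1) priors on every coefficient, the prior standard deviation of the `b` part of the linear predictor for one observation is 0.1 × sqrt(sum of that row's squared design-matrix entries). Each Year column, and each interaction involving Year, takes values up to ±4 (2015–2023 centered on 2019), so it adds more to that sum than a ±1 sum-contrast column does. The sketch below builds both design matrices with base R's `model.matrix()` on a hypothetical balanced grid. The level counts (2 Gender, 3 Grade, 4 School) are assumptions, because the real ones aren't given. The sketch also centers the columns, as brms does by default when the model has an intercept. It is an approximation of the brms design matrix, not a reproduction of it.

```r
# Hypothetical levels: the real numbers of Grade and School levels are not given.
grid <- expand.grid(
  Gender = factor(c("F", "M")),
  Grade = factor(c("G1", "G2", "G3")),
  School = factor(c("S1", "S2", "S3", "S4")),
  Year_of_app_centered = -4:4  # 2015-2023 centered on 2019
)

# Centered design matrix without the intercept column; sum contrasts for
# Grade and School as in the post, default (treatment) contrasts for Gender.
design <- function(formula) {
  X <- model.matrix(formula, data = grid,
                    contrasts.arg = list(Grade = "contr.sum", School = "contr.sum"))
  X <- X[, colnames(X) != "(Intercept)", drop = FALSE]
  scale(X, center = TRUE, scale = FALSE)
}

X3 <- design(~ Gender * Grade * School)
X4 <- design(~ Gender * Grade * School * Year_of_app_centered)

# Prior sd of the b-part of the linear predictor, per row of the grid
sigma_b <- 0.1
sd_eta3 <- sigma_b * sqrt(rowSums(X3^2))
sd_eta4 <- sigma_b * sqrt(rowSums(X4^2))

c(coefs_3way = ncol(X3), coefs_4way = ncol(X4))
summary(sd_eta3)
summary(sd_eta4)
```

The four-way model roughly doubles the number of coefficients. Its rows at the ends of the Year range get the widest prior on the log mean, and that width is amplified when exponentiated.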

I find the shape parameter particularly challenging to understand and specify correctly. Unfortunately, I cannot share the data as it is confidential. Any insights or suggestions would be greatly appreciated!
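For intuition about the shape parameter: brms's negative binomial uses the mean/shape parameterization, in which Var(Y) = mu + mu^2 / shape. Small shape values therefore mean strong overdispersion and a long right tail, and large values approach a Poisson. R's `dnbinom()`/`pnbinom()` with `size = shape` and `mu = mu` use the same parameterization. Stan's `gamma(1, 0.1)` is shape 1, rate 0.1 (an exponential with mean 10), which corresponds to `pgamma(..., shape = 1, rate = 0.1)` in R. The sketch below fixes mu = 0.78 (the intercept prior mean from the post) and tabulates, for a few illustrative shape values:

- the variance,
- the NB probability of a zero,
- the chance of a count above 20 (the observed maximum),
- how much prior mass `gamma(1, 0.1)` puts below that shape value.

```r
mu <- 0.78                          # count-part mean from the intercept prior
phi <- c(0.01, 0.1, 1, 10, 100)     # illustrative shape values

tab <- data.frame(
  shape = phi,
  variance = mu + mu^2 / phi,                              # NB variance under this parameterization
  p_zero = dnbinom(0, size = phi, mu = mu),                # NB probability of a zero
  p_above_20 = pnbinom(20, size = phi, mu = mu, lower.tail = FALSE),
  prior_prob_below = pgamma(phi, shape = 1, rate = 0.1)    # P(shape < phi) under gamma(1, 0.1)
)
print(tab, digits = 3)
```

Even with a modest mean, a small shape makes counts far above 20 plausible, and `gamma(1, 0.1)` leaves some prior mass in that region.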

Here is the first model, with the three-way interaction:

```r
library(brms)

# Sum-to-zero contrasts for the categorical predictors
contrasts(data$Grade) <- contr.sum(length(levels(data$Grade)))
contrasts(data$School) <- contr.sum(length(levels(data$School)))

model_app_gps <- brm(
  formula = bf(
    Applications ~ Gender * Grade * School + offset(log(contract_length)),
    zi ~ Gender + Grade + School + log(contract_length)
  ),
  family = zero_inflated_negbinomial(),
  data = data,
  sample_prior = "only",  # draw from the priors only (prior predictive check)
  prior = c(
    prior(normal(log(0.78), 0.1), class = "Intercept"),
    prior(normal(0, 0.1), class = "b"),
    prior(normal(logit(0.64), 0.1), class = "Intercept", dpar = "zi"),
    prior(normal(0, 0.1), class = "b", dpar = "zi"),
    prior(gamma(1, 0.1), class = "shape")
  ),
  chains = 4,
  iter = 2000,
  seed = 123,
  cores = 4
)
```
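The kind of prior predictive draw that `sample_prior = "only"` produces can be mimicked in base R for an intercept-only version of this model, which makes it easier to see where the extreme values come from. This is a rough sketch, not what brms does internally, and it makes three simplifications:

- It drops the `b` terms.
- It fixes `contract_length = 1`, a hypothetical value that makes the offset zero. With a real offset, the NB mean is multiplied by `contract_length`.
- It uses an arbitrary 200 observations per prior draw.

```r
set.seed(123)
n_draws <- 2000   # number of prior draws
n_obs   <- 200    # hypothetical number of observations per draw

# gamma(1, 0.1) in Stan is shape = 1, rate = 0.1
shape <- rgamma(n_draws, shape = 1, rate = 0.1)
max_y <- numeric(n_draws)
prop_zero <- numeric(n_draws)

for (s in seq_len(n_draws)) {
  eta <- rnorm(1, log(0.78), 0.1)              # count-part intercept (log scale)
  zi  <- plogis(rnorm(1, qlogis(0.64), 0.1))   # zero-inflation probability
  y <- rnbinom(n_obs, size = shape[s], mu = exp(eta))
  y[runif(n_obs) < zi] <- 0                    # structural zeros
  max_y[s] <- max(y)
  prop_zero[s] <- mean(y == 0)
}

summary(prop_zero)
quantile(max_y, c(0.5, 0.99, 1))
# Shape values in the draws whose maximum exceeds the observed maximum of 20
summary(shape[max_y > 20])
```

If the draws with extreme maxima cluster at small shape values, the shape prior is a likely source of the long tail.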

And the four-way model, which adds Year to the interaction:

```r
library(brms)

# Sum-to-zero contrasts for the categorical predictors
contrasts(data$Grade) <- contr.sum(length(levels(data$Grade)))
contrasts(data$School) <- contr.sum(length(levels(data$School)))

# Same priors as the three-way model; Year_of_app_centered joins the interaction
model_app_gpst <- brm(
  formula = bf(
    Applications ~ Gender * Grade * School * Year_of_app_centered + offset(log(contract_length)),
    zi ~ Gender + Grade + School + log(contract_length)
  ),
  family = zero_inflated_negbinomial(),
  data = data,
  sample_prior = "only",  # draw from the priors only (prior predictive check)
  prior = c(
    prior(normal(log(0.78), 0.1), class = "Intercept"),
    prior(normal(0, 0.1), class = "b"),
    prior(normal(logit(0.64), 0.1), class = "Intercept", dpar = "zi"),
    prior(normal(0, 0.1), class = "b", dpar = "zi"),
    prior(gamma(1, 0.1), class = "shape")
  ),
  chains = 4,
  iter = 2000,
  seed = 123,
  cores = 4
)
```

Based on the models above, these are the pp_check plots, in order:
