Regularizing priors to help brms fit a multilevel negative binomial hurdle model with identity link

Tiaan · July 29, 2024, 8:23am

Hi all,

I am busy with a validation study of a field method used in ecology and have three variables: “identity”, “observed_count” and “proxy_count”. Identity is individual identity (n = 38), observed_count is the true value of a variable of interest, and proxy_count is an index of the observed_count. I want to know how reliably I can expect to predict the observed_count in future, when only proxy_count is available.

The counts are zero-inflated and there is a weak positive correlation between the observed and proxy counts (as I expected).

Using brms, I fitted a negative binomial hurdle model. Important: I expect the relationship between the response and predictor to be linear (especially in some of the other “proxy_counts” I have, which are better representations of the “observed_count” than the current example). Therefore, I specify an identity link, not a log link. This model fits reasonably well though it overpredicts zeros.

# Define weakly informative priors 
model_prior <- c(
  set_prior("normal(0, 1)", class = "Intercept"),              # Prior for mu_Intercept
  set_prior("beta(2, 2)", class = "Intercept", dpar = "hu", lb = 0, ub = 1), # Prior for hu_Intercept
  set_prior("normal(0, 1)", class = "b"),                      # Prior for regression coefficients
  set_prior("normal(0, 1)", class = "b", dpar = "hu"),         # Prior for regression coefficients
  set_prior("lognormal(1, 1)", class = "shape"))

mod_nb <- brm(bf(observed_count ~ proxy_count, hu ~ proxy_count), 
           data = dat,
           family = hurdle_negbinomial(link = "identity", link_shape = "log", link_hu = "logit"),
           prior = model_prior, sample_prior = T)

pred <- posterior_predict(mod_nb)
bayesplot::ppc_dens_overlay(y = log1p(dat$observed_count), yrep = log1p(pred[1:100,]))

Given that I want to estimate the predictive performance for new individuals, I’ve been thinking about ‘out-of-sample’ predictive performance - e.g., estimating leave-one-out predictive performance with loo_predictive_metric.

But before I get to this, I need to ideally include a random intercept and slope for every individual, to account for the structure in the data (multiple observations per individual; scatterplots show individuals tend to have different slopes). For example:

mixmod_nb <- brm(bf(observed_count ~ proxy_count + (1 + proxy_count | id), 
                              hu ~ proxy_count + (1 + proxy_count | id)), 
           data = dat,
           family = hurdle_negbinomial(link = "identity", link_shape = "log", link_hu = "logit"),
           prior = model_prior, sample_prior = T)

I can fit this model when specifying a log link function, but it doesn’t fit with an identity link (that is not constrained to be positive) (a zero-inflated model behaves the same). I wondered whether appropriate regularizing priors will allow this model to run? That is were I get stuck - I don’t (yet) have a good handle on setting proper priors (in general and with brms). I’d welcome suggestions for appropriate regularizing priors if you think this is a possible solution. Any general comments regarding the approach (am I even on the right track here? I sometimes have doubts) are also appreciated.

I include a csv file with the data (2471 observations) used here and code below if anyone is interested to have a look at it:
input.csv (26.6 KB)

library(brms)
library(tidyverse)
library(modelr)

dat = read.csv("input.csv")

hist(dat$observed_count, breaks = 20)

ggplot(data = dat, aes(proxy_count, observed_count)) + 
            geom_point(alpha = 0.2) + theme_bw()

# Fit hurdle_negbinomial brm model
# Define weakly informative priors 
model_prior_informative <- c(
  set_prior("normal(0, 1)", class = "Intercept"),              # Prior for mu_Intercept
  set_prior("beta(2, 2)", class = "Intercept", dpar = "hu", lb = 0, ub = 1), # Prior for hu_Intercept
  set_prior("normal(0, 1)", class = "b"),                      # Prior for regression coefficients
  set_prior("normal(0, 1)", class = "b", dpar = "hu"),         # Prior for regression coefficients
  set_prior("lognormal(1, 1)", class = "shape"))

mod_nb <- brm(bf(observed_count ~ proxy_count, hu ~ proxy_count), 
              data = dat,
              family = hurdle_negbinomial(link = "identity", link_shape = "log", link_hu = "logit"),
              prior = model_prior_informative,
              sample_prior = T)

mod_nb

pp_check(mod_nb)

# To highlight the zero process, we log the predicted values. 
# Look again - how well do the posterior draws fit the actual data? 
pred <- posterior_predict(mod_nb)
bayesplot::ppc_dens_overlay(y = log1p(dat$observed_count), yrep = log1p(pred[1:100,]))

# posterior predicitons
conditional_effects(mod_nb, method = "posterior_epred", prob = 0.9)
conditional_effects(mod_nb, method = "posterior_epred", prob = 0.9, dpar = "hu")

dat %>% 
  modelr::data_grid(proxy_count = seq_range(proxy_count, n = 20)) %>%
  tidybayes::add_predicted_draws(mod_nb) %>%
  ggplot(aes(x = proxy_count, y = observed_count)) +
  ggdist::stat_lineribbon(aes(y = .prediction), .width = c(.90, .80, .50), alpha = 0.25) +
  geom_point(data = dat, alpha = 0.2) +
  geom_segment(aes(x = 0, y = 0, xend = 14, yend = 14), 
               color = "red", linewidth = 1)  # 1:1 line

martinmodrak · July 29, 2024, 9:08am

Identity link is almost certain to cause problems unless all the model predictions are far from zero. There are however at least two ways to achieve something close to linear relationship between a predictor and the observed counts:

Use log of the predictor to predict the log of observed count
Use the softplus link and use untransformed predictor - this assumes the relationship is linear far from zero, but constraints the predictions to be positive.

Best of luck with your model.

Topic		Replies	Views
Setting prior on inverse square root of negative binomial's scale parameter brms	3	565	July 28, 2021
Priors for hurdle models in brms Modeling priors , brms	2	723	August 21, 2023
Brms prior speficiation in negative binomial model Modeling prior-choice , brms	2	801	November 30, 2022
Example of setting weakly informative priors in a hurdle brms model brms	2	1204	July 7, 2020
Zero-truncated count data brms fitting-issues	7	2612	March 7, 2022

Regularizing priors to help brms fit a multilevel negative binomial hurdle model with identity link

Related topics