The conditional_effects function of the brms package with a hurdle poisson model

I am new to Bayesian modeling so I may be making a simple mistake.

I have count data with lots of zeros, so I decided to use a hurdle poisson model.

I ran the following code:

brm(count_long ~ img_num*TQ_mean,
      data = mydata, family = hurdle_poisson, iter = 2000)

The img_num is a dummy variable and can be 0, 1, or 2. The TQ_mean is a continuous variable. I am interested in the interaction between them.

I could successfully the following results:

Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 1.99 0.03 1.93 2.05 1.00 1525 1906
img_num1 0.05 0.05 -0.05 0.14 1.00 1557 2055
img_num2 0.19 0.04 0.11 0.27 1.00 1420 2030
TQ_mean 0.05 0.02 0.02 0.09 1.00 1496 2378
img_num1:TQ_mean -0.13 0.03 -0.19 -0.07 1.00 1579 2171
img_num2:TQ_mean -0.16 0.02 -0.20 -0.11 1.00 1353 2173

For simplicity, let me explain about the condition in which img_num = 0.

The conditional_effects function showed me the picture in which the value of count_long (y-axis) is near 2 at the point where TQ_mean is 1. (It would be easier to understand if I could attach the image file)

I don’t think this figure is correct because as to img_num = 0 the value of count_long should be:

exp(1.99 + 0.05) = 7.69 (But it is about 2 in the figure)

I also tried to run the following code. Note that I used a poisson model instead of a hurdle poisson model.

tmpdata %>%
  filter(count_long > 0) %>% 
  brm(count_long ~ img_num*TQ_mean,
      data = ., family = poisson, iter = 2000)

I obtained the same coefficients as in the hurdle model, but the the conditional_effects function showed me a figure which have a different Y scale. In this figure, the value of count_long (y-axis) is near 8 at the point where TQ_mean is 1. This figure is understandable.

My question is that the figure using the hurdle poisson model is correct. If it is right, what do I misunderstand?

Operating System: Windows 11
Interface Version: brms 2.18.0
Compiler/Toolkit:

Thanks in advance.

Howdy. It looks like what you did was take your data and run a hurdle Poisson model. Then, you subset your data for only counts > 0 and then ran a Poisson model and compared. The values for the coefficients from the Poisson model part from the hurdle Poisson are the same as from the Poisson on subset data, because the hurdle model only models the non-zero counts with the Poisson model. However, predictions (as seen in conditional_effects) are generated from the entire model, which in the case of the hurdle model comes from both the Bernoulli and the Poisson processes. The likelihood part of the hurdle model can be found using make_stancode() and is in the functions block at the start of the code. The relevant part is below:

real hurdle_poisson_log_lpmf(int y, real eta, real hu) {
    if (y == 0) {
      return bernoulli_lpmf(1 | hu);
    } else {
      return bernoulli_lpmf(0 | hu) +
             poisson_log_lpmf(y | eta) -
             log1m_exp(-exp(eta));
    }
  }

Thus, your estimated model coefficients from the hurdle Poisson will be the same as your coefficients from the Poisson on the data subset for counts>0, but the predictions of the outcome will not be the same, because there is also the Bernoulli process to predict zeroes in the hurdle model.
I used the below code to (I think) replicate what you did:

library(brms)
z <- c(rep(0,150),rep(1,150))
x <- rnorm(300, 0, 1)
lambda <- exp(2 + 0.2*z + 0.05*x + -0.2*x*z)
y_pois <- rpois(n=300, lambda=lambda)
theta <- 0.25
u <- runif(300, min = 0, max = 1)
y <- ifelse(u < theta, 0, y_pois[y_pois>0]) 

d <- cbind.data.frame(y,z,x)
d$z <- factor(d$z)
d$p_0 <- 0
d$p_0[d$y==0] <- 1
mean(d$p_0)

m1 <- brm(y ~ z*x, data=d, family=hurdle_poisson, cores=4)
m1

d2 <- subset(d, d$y>0)
m2 <- brm(y ~ z*x, data=d2, family=poisson, cores=4)
m2

conditional_effects(m1)
conditional_effects(m2)

Hurdle model on full data.

Poisson on subset data.

1 Like

Thank you very much for your valuable comments. You completely understand what I was trying to say. But my question still remains.

What I would most like to know is about your following comment:

However, predictions (as seen in conditional_effects ) are generated from the entire model, which in the case of the hurdle model comes from both the Bernoulli and the Poisson processes.

How does the presence of Bernoulli affect the y-axis scale? In other words, how can I compute the y-axis of the hurdle model?

For example, the value of Y is about 5.0 at X = 0 & Z = 0 in your figure of the hurdle model. How is Y = 5.0 calculated using X = 0 and Z = 0?

You need to weight it by the proportion of zeroes estimated in the hurdle model. That is why the predicted mean values are lower for the hurdle model - a certain proportion of them are zero and this lowers the mean compared to the Poisson fit to data with no zeroes.
You can show this to yourself by doing this (continuing based on my above sim and model code):

s1 <- as_draws_df(m1)
s2 <- as_draws_df(m2)
mean( (exp(s1$b_Intercept) * (1 - s1$hu)) + (0 * s1$hu) )  #hurdle model mean for x=0 and z=0 level
mean( (exp(s2$b_Intercept) )  #Poisson model mean for x=0 and z=0 level

#now let's check to make sure we did the above manual computation correctly
nd <- cbind.data.frame(x = 0, z = 0)
nd$z <- factor(nd$z)
fitted(m1, newdata=nd) #hurdle model mean for x=0 and z=0 level
fitted(m2, newdata=nd) #Poisson model mean for x=0 and z=0 level

Thank you for your prompt reply. I understand your explanation, but let me make a few clarifications on that.

In fact, I’ve read the explanation of the hurdle model on this site, and tested the following code before asking here:

lambda <- 4
y <- rpois(n=5000, lambda=lambda)
y_table <- table(y)
y2 <- y_table/sum(y_table) # P(y | lambda) = Poisson(y | lambda)
y3 <- tail(y2, -1) # remove the data for y = 0
plot(y3)

theta = 0.3 # the probability of zeros

# see the definition of a hurdle model
# https://mc-stan.org/docs/2_20/stan-users-guide/zero-inflated-section.html
y4 <- (1 - theta) * y3 / (1 - exp(-lambda)) 
sum(y4) # 1 - theta = 0.7

# The peaks of y2 and y4 are the same
plot(y2)
plot(y4) 

Since the peaks of y2 and y4 were equal, I’ve misunderstood that the y-axes of the Poisson and Hurdle models were equal. But the figure displayed by the conditional_effects was a prediction, and what I saw in the code above was not about the prediction but the probability of the Poisson part, was it?

I’m not sure what you mean by this, as when I run your code, they aren’t equal… the distributions look similar…
Yes, the conditional_effects() function shows plots of the predicted mean of the outcome variable at various levels of the predictors. When making predictions (as conditional_effects() does) from the hurdle Poisson, the Bernoulli process must be included, and thus a certain proportion of zeroes are added to the predicted response values, which naturally reduces (or increases, if the zeroes are deflated) the mean compared to predictions from the Poisson model on the data with no zeroes.

Thank you for your repeated replies. I made a writing error regarding the follwing sentence:

Since the peaks of y2 and y4 were equal

Correctly, I meant to write as follows:

For y2 and y4, the x (horizontal) values of the peaks are equal

You have really helped a lot. Thank you very much!

1 Like