Mathematical Notation for a zero inflated negative binomial model in brms

Hi Everyone,
Can someone please help me with the appropriate mathematical notation for the zero inflated negative binomial model below?

cover ~ functionalgroup + (functionalgroup | site:treatment) + (1 | plot)

I have four functional groups, three sites, five treatments, and 40 plots. Not all of the treatments occur at all of the sites which is why I have site:treatment.


Stan code is below

/ generated with brms 2.13.0
functions {

  /* zero-inflated negative binomial log-PDF of a single response 
   * Args: 
   *   y: the response value 
   *   mu: mean parameter of negative binomial distribution
   *   phi: shape parameter of negative binomial distribution
   *   zi: zero-inflation probability
   * Returns:  
   *   a scalar to be added to the log posterior 
   */ 
  real zero_inflated_neg_binomial_lpmf(int y, real mu, real phi, 
                                       real zi) { 
    if (y == 0) { 
      return log_sum_exp(bernoulli_lpmf(1 | zi), 
                         bernoulli_lpmf(0 | zi) + 
                         neg_binomial_2_lpmf(0 | mu, phi)); 
    } else { 
      return bernoulli_lpmf(0 | zi) +  
             neg_binomial_2_lpmf(y | mu, phi); 
    } 
  } 
  /* zero-inflated negative binomial log-PDF of a single response 
   * logit parameterization of the zero-inflation part
   * Args: 
   *   y: the response value 
   *   mu: mean parameter of negative binomial distribution
   *   phi: shape parameter of negative binomial distribution
   *   zi: linear predictor for zero-inflation part
   * Returns:  
   *   a scalar to be added to the log posterior 
   */ 
  real zero_inflated_neg_binomial_logit_lpmf(int y, real mu, 
                                             real phi, real zi) { 
    if (y == 0) { 
      return log_sum_exp(bernoulli_logit_lpmf(1 | zi), 
                         bernoulli_logit_lpmf(0 | zi) + 
                         neg_binomial_2_lpmf(0 | mu, phi)); 
    } else { 
      return bernoulli_logit_lpmf(0 | zi) +  
             neg_binomial_2_lpmf(y | mu, phi); 
    } 
  }
  /* zero-inflated negative binomial log-PDF of a single response 
   * log parameterization for the negative binomial part
   * Args: 
   *   y: the response value 
   *   eta: linear predictor for negative binomial distribution 
   *   phi: shape parameter of negative binomial distribution
   *   zi: zero-inflation probability
   * Returns:  
   *   a scalar to be added to the log posterior 
   */ 
  real zero_inflated_neg_binomial_log_lpmf(int y, real eta, 
                                           real phi, real zi) { 
    if (y == 0) { 
      return log_sum_exp(bernoulli_lpmf(1 | zi), 
                         bernoulli_lpmf(0 | zi) + 
                         neg_binomial_2_log_lpmf(0 | eta, phi)); 
    } else { 
      return bernoulli_lpmf(0 | zi) +  
             neg_binomial_2_log_lpmf(y | eta, phi); 
    } 
  } 
  /* zero-inflated negative binomial log-PDF of a single response
   * log parameterization for the negative binomial part
   * logit parameterization of the zero-inflation part
   * Args: 
   *   y: the response value 
   *   eta: linear predictor for negative binomial distribution 
   *   phi: shape parameter of negative binomial distribution
   *   zi: linear predictor for zero-inflation part 
   * Returns:  
   *   a scalar to be added to the log posterior 
   */ 
  real zero_inflated_neg_binomial_log_logit_lpmf(int y, real eta, 
                                                 real phi, real zi) { 
    if (y == 0) { 
      return log_sum_exp(bernoulli_logit_lpmf(1 | zi), 
                         bernoulli_logit_lpmf(0 | zi) + 
                         neg_binomial_2_log_lpmf(0 | eta, phi)); 
    } else { 
      return bernoulli_logit_lpmf(0 | zi) +  
             neg_binomial_2_log_lpmf(y | eta, phi); 
    } 
  }
  // zero_inflated negative binomial log-CCDF and log-CDF functions
  real zero_inflated_neg_binomial_lccdf(int y, real mu, real phi, real hu) { 
    return bernoulli_lpmf(0 | hu) + neg_binomial_2_lccdf(y | mu, phi);
  }
  real zero_inflated_neg_binomial_lcdf(int y, real mu, real phi, real hu) { 
    return log1m_exp(zero_inflated_neg_binomial_lccdf(y | mu, phi, hu));
  }
}
data {
  int<lower=1> N;  // number of observations
  int Y[N];  // response variable
  int<lower=1> K;  // number of population-level effects
  matrix[N, K] X;  // population-level design matrix
  // data for group-level effects of ID 1
  int<lower=1> N_1;  // number of grouping levels
  int<lower=1> M_1;  // number of coefficients per level
  int<lower=1> J_1[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_1_1;
  // data for group-level effects of ID 2
  int<lower=1> N_2;  // number of grouping levels
  int<lower=1> M_2;  // number of coefficients per level
  int<lower=1> J_2[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_2_1;
  vector[N] Z_2_2;
  vector[N] Z_2_3;
  vector[N] Z_2_4;
  int<lower=1> NC_2;  // number of group-level correlations
  int prior_only;  // should the likelihood be ignored?
}
transformed data {
  int Kc = K - 1;
  matrix[N, Kc] Xc;  // centered version of X without an intercept
  vector[Kc] means_X;  // column means of X before centering
  for (i in 2:K) {
    means_X[i - 1] = mean(X[, i]);
    Xc[, i - 1] = X[, i] - means_X[i - 1];
  }
}
parameters {
  vector[Kc] b;  // population-level effects
  real Intercept;  // temporary intercept for centered predictors
  real<lower=0> shape;  // shape parameter
  real<lower=0,upper=1> zi;  // zero-inflation probability
  vector<lower=0>[M_1] sd_1;  // group-level standard deviations
  vector[N_1] z_1[M_1];  // standardized group-level effects
  vector<lower=0>[M_2] sd_2;  // group-level standard deviations
  matrix[M_2, N_2] z_2;  // standardized group-level effects
  cholesky_factor_corr[M_2] L_2;  // cholesky factor of correlation matrix
}
transformed parameters {
  vector[N_1] r_1_1;  // actual group-level effects
  matrix[N_2, M_2] r_2;  // actual group-level effects
  // using vectors speeds up indexing in loops
  vector[N_2] r_2_1;
  vector[N_2] r_2_2;
  vector[N_2] r_2_3;
  vector[N_2] r_2_4;
  r_1_1 = (sd_1[1] * (z_1[1]));
  // compute actual group-level effects
  r_2 = (diag_pre_multiply(sd_2, L_2) * z_2)';
  r_2_1 = r_2[, 1];
  r_2_2 = r_2[, 2];
  r_2_3 = r_2[, 3];
  r_2_4 = r_2[, 4];
}
model {
  // initialize linear predictor term
  vector[N] mu = Intercept + Xc * b;
  for (n in 1:N) {
    // add more terms to the linear predictor
    mu[n] += r_1_1[J_1[n]] * Z_1_1[n] + r_2_1[J_2[n]] * Z_2_1[n] + r_2_2[J_2[n]] * Z_2_2[n] + r_2_3[J_2[n]] * Z_2_3[n] + r_2_4[J_2[n]] * Z_2_4[n];
  }
  // priors including all constants
  target += student_t_lpdf(Intercept | 3, 2.1, 2.5);
  target += gamma_lpdf(shape | 0.01, 0.01);
  target += beta_lpdf(zi | 1, 1);
  target += student_t_lpdf(sd_1 | 3, 0, 2.5)
    - 1 * student_t_lccdf(0 | 3, 0, 2.5);
  target += std_normal_lpdf(z_1[1]);
  target += student_t_lpdf(sd_2 | 3, 0, 2.5)
    - 4 * student_t_lccdf(0 | 3, 0, 2.5);
  target += std_normal_lpdf(to_vector(z_2));
  target += lkj_corr_cholesky_lpdf(L_2 | 1);
  // likelihood including all constants
  if (!prior_only) {
    for (n in 1:N) {
      target += zero_inflated_neg_binomial_log_lpmf(Y[n] | mu[n], shape, zi);
    }
  }
}
generated quantities {
  // actual population-level intercept
  real b_Intercept = Intercept - dot_product(means_X, b);
  // compute group-level correlations
  corr_matrix[M_2] Cor_2 = multiply_lower_tri_self_transpose(L_2);
  vector<lower=-1,upper=1>[NC_2] cor_2;
  // extract upper diagonal of correlation matrix
  for (k in 1:M_2) {
    for (j in 1:(k - 1)) {
      cor_2[choose(k - 1, 2) + j] = Cor_2[j, k];
    }
  }
}

Below is what I came up with, I would appreciate any suggestions! Thanks! Fg is functional group.

Cover _{i} \sim Negative Binomial \left(\mu_{i}, \gamma, z\right)
e^{\mu_{i} }\sim \alpha_{[i]}+\alpha_{p \operatorname{lot}[i]}+\alpha_{\text {site:treatment }[i]}+\left(\beta[i]+\beta_{\text {site:treatment }[i]}\right) * Fg_{i}

Here are the default priors:

\beta \sim uniform (-\infty, \infty)
\alpha \sim student _{t}(3,2.1,2.5)
\sigma_{\alpha_{p l o t}} \sim student _{t}(3,0,2.5)
\sigma_{\beta_{\text {site:treatment }}} \sim student _{t}(3,0,2.5)
\sigma_{\alpha_{\text {site:treatment }}} \sim student _{t}(3,0,2.5)
\gamma \sim g a m m a(0.01,0.01)
z \sim \operatorname{beta}(1,1)
\left[\begin{array}{l}\alpha_{\text {site:treatment }} \\ \beta_{\text {site:treatment }}\end{array}\right] \sim MVNormal \left(\left[\begin{array}{l}0 \\ 0\end{array}\right], S\right)
S=\left(\begin{array}{cc}\sigma_{\alpha_{\text {site:treatment }}}^{2} & \sigma_{\alpha_{\text {site:treatment }} \sigma_{\beta_{\text {site:treatment }}} \rho} \\ \sigma_{\alpha_{\text {site:treatment }}} \sigma_{\beta_{\text {site:treatment }}} \rho & \sigma_{\alpha_{\text {site:treatment }}}^{2}\end{array}\right)
R \sim L K J \operatorname{corr}(1)

Hi @michala6, could you please add the brms tag to this post so others can more easily find it?

1 Like

For:

cover ~ functionalgroup + (functionalgroup | site:treatment) + (1 | plot)

I didn’t find \alpha[i] term. I think there should be an intercept term that isn’t a function of i though.

I didn’t check the priors closely though – did you have a particular goal in mind for this? In terms of figuring out exactly what brms is doing, I think you’re doing the right thing to print out the stan code and look at it (and you can also get the data with make_standata).

1 Like

Hey there!

Let me give it a try:

p(y_i | \mu_i, \gamma) = \begin{cases} \xi + (1- \xi)\cdot\text{NegativeBinomial}(0 | \mu_i, \gamma) & \text{if } y_i = 0\\ (1- \xi)\cdot\text{NegativeBinomial}(y_i | \mu_i, \gamma) & \text{if } y_i \neq 0\\ \end{cases}

with priors

\begin{align} \xi &\sim \text{Beta}(1,1) \\ \gamma &\sim \text{Gamma}(0.01, 0.01) \end{align}

and linear predictor

\log \mu_i = \alpha_i + X_i\beta_{j[i]} \\

with J site:treatment groups, P plot groups and X being the functionalgroup indicator (matrix) appended with an intercept vector (all 1’s) in first position. The intercept \alpha_i is

\alpha_i = \alpha_0 + \alpha_{p[i]}

with priors

\begin{align} \alpha_0 &\sim \text{Student-t}(3, 2.1, 2.5) \\ \alpha_p &\sim \text{Normal}(0, \sigma_\text{plot}) \\ \sigma_\text{plot} &\sim \text{Student-t}^+(3, 0, 2.5) \\ \end{align}

where \text{Student-t}^+ is the (positive) Half-Student-t distribution.

Writing down the priors for \beta can be a bit awkward and there is definitely more than one correct way to do it. The way that I have set it up above meant hat there are K = 1 + (J-1) columns in X: One intercept and J-1 dummy (one-hot/binary) indicator variables for the J functional groups. Therefore, \beta is an array of size J of length K vectors… (Although in this setup J=K… That is a bit confusing :/)

\begin{align} \beta_j &\sim \text{MVN}_K(\mu_\beta, \Sigma_\beta) \\ \Sigma_\beta &= \text{diag}(\sigma_\text{site:treatment}) \Omega \text{diag}(\sigma_\text{site:treatment}) \\ \end{align}

where \text{MVN}_K is a K-dimensional Multivariate Normal distribution with \mu_{\beta,k} = 0 for k = 1, because it is absorbed in the overall intercept parameter \alpha_0 above, and priors

\begin{align} \mu_{\beta,k} &\sim \text{Uniform}(-\infty,\infty) \text{ for } k \gt 1 \\ \Omega &\sim \text{LKJ}(1) \\ \sigma_\text{site:treatment} &\sim \text{Student-t}^+(3,0,2.5).\\ \end{align}

I think that covers everything. That being said there are other (perhaps more clever) ways to write this model down and what I have written is not exactly what brms is doing computationally, but should be mathematically equivalent (modulo error and typos…).

Also, I’d also suggest you try specifying more tighter priors for some of the parameters. Particularly the population-level coefficients for functionalgroup (that is \mu_{\beta,k} in the model above) could use an at least weakly informed prior, such as Normal(0, 2.5) or even tighter – it’s no the log-scale so this weakly informative prior already covers a huge range of plausible values.

Cheers,
Max

7 Likes

Thanks for this response!
I mostly wrote out the priors because I am relatively new to brms and zero inflated models in general. I thought it would be helpful for me to understand what is going on. Thanks for the make_standata tip!

Thanks Max, this is really helpful! This is my first time working with a zero inflated data distributions so I was confused about the proper notation for that. Setting priors has been on my radar, but since I am relatively new to brms and stan it hasn’t been approachable to me yet. Thanks for the prior suggestion - I will try it out!

2 Likes

If it’s okay with you I’ll mark my answer above as solution. If you encounter other difficulties with your model I suggest you open another thread (you can link to this one of course). That helps us keep track of what answers has been answered etc. (If you object just reply or write a message to me.)

Cheers,
Max

1 Like

@Max_Mantei I understand the formulation you provided is what is implemented by brms. I don’t know much about zero-inflated negative binomial distribution, but in books I usually see that the zero-inflation parameter \xi is modeled through a logistic process that mirrors the negative binomial part in parallel. In other words, using the current data as an example, one might have a similar effect partition as below,

Pr(\xi=1)=logit^{-1}(\alpha_0^{(0)}+X_i\beta_{j[i]}^{(0)}).

In this formulation, the zero-inflation component would vary in the same way as the negative binomial component, which makes sense to me. In contrast, the brms implementation seems to assume that all the effects (e.g., groups, plots, etc.) share exactly the same zero-inflation probability?

Is the model implemented in brms a little oversimplified? Or do I fundamentally misunderstand the distribution?

Hey there!

You are right in that usually the zero-inflation parameter is modeled with a logistic regression, often with the same set of covariates as the NegativeBinomial/Poisson part. And to sure: You can do that with brms, too! @michala6 did not provide the brm function call, but one can infer that they did not provide any covariates for the zero-inflation part of the model by looking at the (provided) generated model code, where real<lower=0,upper=1> zi; // zero-inflation probability. So that’s what I wrote in the math formulation above.

You understand it correctly and brms is able to do this. It’s just that in this particular case, the user did not chose to do so – whether that choice makes sense or not I don’t know.

I think that you actually also model the overdispersion parameter of the NegativeBinomial part of the model with log-linked predictors (which will probably be hard to fit). I think the brms implementation is one of the most flexible you can find in any package.

Check out this vignette. :)

Cheers,
Max

1 Like

Thanks for explaining the great flexibility of brms implementations, @Max_Mantei !

1 Like

@Max_Mantei May I borrow your thinking cap?

Suppose that I fit a simple zero-inflated negative binomial model through brms:

brm(bf(count ~ 1, zi~1), ...)

with the following output:

Population-Level Effects: 
             Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept       -9.19      0.08    -9.35    -9.03 1.02      250      458
zi_Intercept   -20.90      0.21   -21.35   -20.52 1.00     2331     1581

Is the mean population-level effect just

e^{Intercept}=e^{-9.19}?

Or do I need to adjust the above estimate with zi_Intercept somehow like

\frac{1}{1+e^{zi\_Intercept}} e^{Intercept}=\frac{1}{1+e^{-20.90}} e^{-9.19}?

Hey there,

yes, you need to adjust, just like you did.

Think of it generatively: The probability that we are in the zero regime of the zero inflation part of the model is p=1/(1+\exp(-\text{zi_Intercept})) and then the value is 0. With probability 1-p, which is simply p=1/(1+\exp(\text{zi_Intercept})) I think, we are in the “non-inflated” part of the model. The non-inflated part of the model has expected value of \mu=\exp(\text{Intercept}) and the “population” expected value would be (1-p)\mu, which is E(y)=1/(1+\exp(-20.90))\exp(-9.19), like you wrote.

This is still just an estimate, though, and you’d need to take into account the uncertainty about the parameters Intercept and zi_Intercept.

Cheers,
Max

1 Like

Thanks a lot for the confirmation, Max!

This is still just an estimate, though

I know. I was just trying to confirm my reasoning at the conceptual level.

1 Like