Custom Likelihood for Zero-inflated Lognormal Model

I’m trying to fit a zero-inflated lognormal model using Stan. SAS PROC NLMIXED allows me to write a custom likelihood function to fit this model, but now I’d like to fit a Bayesian model, hence Stan.

Here’s my code:

zero_inflated_lognormal <- custom_family(
  "zero_inflated_lognormal", dpars = c("mu", "sigma", "zi"),
  links = c("identity", "identity", "logit"), lb = c(NA, 0, NA),
  type = "real"
)
stan_funs <- "
// Arguments: 
  // y: the response value
  // mu: mean parameter of the lognormal distribution
  // sigma: sd parameter of the lognormal distribution
  // zi: zero-inflation probability
// Returns:
  // a scalar to be added to the log posterior

// zero-inflated lognormal log-PDF of a single response

  real zero_inflated_lognormal_lpdf(real y, real mu, real sigma, real zi) { 
    if (y == 0) { 
      return log_sum_exp(bernoulli_lpmf(1 | zi), 
                         bernoulli_lpmf(0 | zi) + 
                         lognormal_lpdf(0 | mu, sigma)); 
    } else { 
      return bernoulli_lpmf(0 | zi) +  
             lognormal_lpdf(y | mu, sigma); 
    } 
  }
  
// zero-inflated lognormal log-PDF of a single response
// logit parameterization of the zero-inflation part

  real zero_inflated_lognormal_logit_lpdf(real y, real mu, real sigma, real zi) { 
    if (y == 0) { 
      return log_sum_exp(bernoulli_logit_lpmf(1 | zi), 
                         bernoulli_logit_lpmf(0 | zi) + 
                         lognormal_lpdf(0 | mu, sigma)); 
    } else { 
      return bernoulli_logit_lpmf(0 | zi) +  
             lognormal_lpdf(y | mu, sigma); 
    } 
  }
  
// zero-inflated lognormal log-CCDF and log-CDF functions
  
  real zero_inflated_lognormal_lccdf(real y, real mu, real sigma, real zi) { 
    return bernoulli_lpmf(0 | zi) + lognormal_lccdf(y | mu, sigma); 
  }
  real zero_inflated_lognormal_lcdf(real y, real mu, real sigma, real zi) { 
    return log1m_exp(zero_inflated_lognormal_lccdf(y | mu, sigma, zi));
  }
}
"
stanvars <- stanvar(scode = stan_funs, block = "functions")
fit <- brm(tvhours ~ 1, data = rDat, 
           family = zero_inflated_lognormal, stanvars = stanvars)

I believe I wrote the custom likelihood correctly, but I’m receiving the following error message after running the last chunk of code (directly above):
“PARSER FAILED TO PARSE INPUT COMPLETELY STOPPED AT LINE 49:”

I read the following in the Stan User’s Guide:
“Zero inflation and hurdle models can be formulated for discrete distributions other than the Poisson. Zero inflation does not work for continuous distributions in Stan because of issues with derivatives; in particular, there is no way to add a point mass to a continuous distribution, such as zero-inflating a normal as a regression coefficient prior.”

My question is, is it really impossible to fit a zero-inflated lognormal model using Stan??

Madeline

Oh sorry, I didn’t see this thread before I answered in the other. Zero-Inflated LogNormal should be possible. But it’s too late now… I’ll answer tomorrow, ok?

Hey! Okay, so I believe you are confusing “zero-inflation” with “hurdle”. My way to think about this is that for (zero-)inflation you need something to be there in the first place. In a hurdle model, you first have to jump the hurdle before you get to the other part.

In case of the Poisson for example, you already got probability mass at 0. So putting extra probability mass on 0 is inflation.

In case of the LogNormal the probability at 0 is 0. You’d first have to jump the hurdle (go beyond zero) before you get to the part where the LogNormal does have non-zero probability. This is also what I think is wrong in your code: lognormal_lpdf(0 | mu, sigma) will return -Inf. The correct approach is actually simpler (LogNormal hurdle):

    if (y == 0) { 
      return bernoulli_lpmf(1 | zi);
    } else { 
      return bernoulli_lpmf(0 | zi) + lognormal_lpdf(y | mu, sigma); 
    } 

(You could also have Poisson hurdle model, but that would mean truncating the Poisson (to y > 0), so that the Bernoulli part is solely responsible for the probability contribution at y = 0. This is a bit more tricky to code.)

I don’t think I did a particularly good job verbalizing my thoughts here… Hope this helps anyways. But feel free to ask If something not clear.

Cheers,
Max

Ahhh! This makes sense! I guess I never realized all this time I’ve been fitting these models in SAS I’ve been fitting Hurdle, not zero-inflated models. Thank you so much for taking the time to respond. You have no idea how much time you just saved me.

I do have one follow-up question. Why is there a Hurdle Poisson option in brms if Poisson distributions have probability mass at 0? Shouldn’t there just be a Zero-Inflated Poisson option? Maybe the Hurdle Poisson option uses a truncated Poisson distribution?

Madeline

They correspond to different data generating mechanisms and thus are both useful.

1 Like

Nothing to add to @avehtari 's answer. Only that you can also use the make_stancode function to generate the Stan code of your brms model and see how stuff is implemented.

1 Like