How Hamiltonian Monte Carlo handle the discontinuity at zero in hurdle models?

MarziehShahmandi · June 14, 2021, 7:26pm

Hi,

I have a semicontinuous dependent variable (y^*_i) with a mass point at zero, I fitted a two-part model with hurdle at zero with probability function as follows:

f(y^*_i|\boldsymbol{\gamma},\boldsymbol{\beta_\tau}, \sigma,v_i, \tau)=(\omega_i)~.~\mathbb{I}(y^*_i=0)+(1-\omega_i).~\mathcal{N}(\mathbf{x_i}^{T}\boldsymbol{\beta_\tau}+ \theta v_i,\psi^2\sigma v_i)~\mathbb{I}(y^*_i \neq 0)

Where \omega_i=P(y^*_i = 0), \mathbb{I} is an indicator function, and \mathbf{x_i} is a set of three independent variables. A logit link is used to model \omega_i based on a set of three independent variables (\mathbf{z_i}) as below:

\mbox{logit}(\omega_i)=\mathbf{z_i}^{T}\boldsymbol{\gamma}

In this hurdle model, a continuous normal distribution (corresponding to quantile regression) is used for modelling the non-zeros and logistic regression is applied for modelling the probability of zeros versus non-zeros.

To estimate the continuous vectors of parameters \boldsymbol{\beta} and \boldsymbol{\gamma} in the model, I used the Gibbs sampling technique, and the autocorrelation functions of \beta_3 and \gamma_3 showed slow mixing and slow convergence (even with considering a large number for iterations, burn-in and thinning number). Then I moved to stan to use Hamiltonian Monte Carlo (HMC) that was helpful for my case. The autocorrelation plots showed a great and fast decreasing pattern, approaching zero, showing good mixing. The effect sample sizes were larger while I just used 4 chains with 2000 iterations and warm-up 1000.

Now, I just have one question. As you know HMC is based on the derivative of the logarithm of the posterior distribution and so continuity is essential for the posterior function. In my hurdle model, there is a discontinuity at zero. How HMC can handle this problem (or in general, how HMC can handle discontinuity of the likelihood function in hurdle models for computation of the derivative). Can anyone offer a good intuitive answer to this question? Or perhaps point me to a reference where this is discussed? I also added my stan code, please let me know if you have any comments.

#Stan code
cat("

//Definition of the inputs imported for feeding the model
data {
  int<lower=0> N;
  int<lower=0> k;
  matrix[N, k] X;
  vector[N] y;
  real<lower=0, upper=1> tau;
}

//Definition of the parameters in the model
parameters {
 
  vector[k] beta;
  vector[k] gammaa;
  real<lower=0> delta;
  vector<lower=0>[N] w;
}

//Computation of transformed parameters
transformed parameters {
  vector[N] perc;
  vector[N] perc2;
  vector<lower=0, upper=1>[N] pi;
  vector[N] mu;
  vector[N] mu2;
 
 
  perc  <- (tau * (1 - tau) * delta) ./ (2 * w);
  for(n in 1:N) perc2[n] <- inv_sqrt(perc[n]);
  for (n in 1:N) pi[n]<- inv_logit( X[n]*gammaa);
  mu  <- X*beta;
  mu2  <- (1 - 2 * tau) / (tau * (1 - tau)) * w + mu;
}

model {

  //Priors used for parameters
  for(i in 1:k)
  {
   beta[i]  ~ normal(0, 3);
   gammaa[i]  ~ normal(0, 3);
  }
  delta ~ gamma(0.001,0.001);
  w ~ exponential(delta);

  //Definition of the mass probability function of the model
  
   for (i in 1:N) {
   
   (y[i] == 0) ~ bernoulli(1-pi[i]);
    if (y[i] != 0) y[i] ~ normal(mu2[i], perc2[i]);
    }
        
      
}
" , file="stan_tpqr.stan")

jsocolar · June 14, 2021, 7:52pm

Your code marginalizes over the indicator variable with its if statement. (NB: In hurdle models like this one, where the value of the indicator is completely fixed by the data, the marginalization is quite simple to do; in zero-inflated models the marginalization can be a bit trickier).

Your likelihood doesn’t contain a discontinuity at zero because you use the data to predetermine which likelihood to use for which points. For every data point in your model, you are using just one continuous likelihood.

MarziehShahmandi · June 14, 2021, 8:11pm

Thank you very much for your help.
I am wondering about the case of the Poisson hurdle model that for the non-zero state of the model, the count Poisson distribution is used, how does HMC handle the derivative process?

jsocolar · June 14, 2021, 8:15pm

You need to specify a Stan model that marginalizes over the latent binary indicator. Here’s a post with example code for the zero-inflated Poisson case

Here’s @mbjoseph 's super helpful tutorial on marginalization:

laifuthegreat · June 14, 2021, 8:42pm

That’s indeed a discontinuity in the data space. f(theta, x) jumps when x goes to 0. But it’s not discontinuous in parameter space. Monte carlo differentiates and explores the posterior distribution of the parameters rather than the data. There are no derivatives of the data because the data are fixed.

MarziehShahmandi · June 14, 2021, 8:45pm

Thank you very much for your guidance.

MarziehShahmandi · June 14, 2021, 8:46pm

Thank you very much. I got it.

Topic		Replies	Views
Preventing numerical 0s in hurdle model Modeling	3	514	August 21, 2020
Discontinuous Hamiltonian Monte Carlo for discrete parameters and discontinuous likelihoods Developers	4	1751	January 2, 2022
Modelling "hurdle lognormal" vs "hurdle gamma" on data with many zeros Modeling specification , brms	4	62	March 21, 2025
Comparing Hurdle Model with Non-Hurdle Model Modeling loo	5	470	November 21, 2023
Coding hurdle model with censored data Modeling rstan , specification	2	631	November 24, 2022

How Hamiltonian Monte Carlo handle the discontinuity at zero in hurdle models?

Related topics