How Hamiltonian Monte Carlo handle the discontinuity at zero in hurdle models?

Hi,

I have a semicontinuous dependent variable (y^*_i) with a mass point at zero, I fitted a two-part model with hurdle at zero with probability function as follows:

f(y^*_i|\boldsymbol{\gamma},\boldsymbol{\beta_\tau}, \sigma,v_i, \tau)=(\omega_i)~.~\mathbb{I}(y^*_i=0)+(1-\omega_i).~\mathcal{N}(\mathbf{x_i}^{T}\boldsymbol{\beta_\tau}+ \theta v_i,\psi^2\sigma v_i)~\mathbb{I}(y^*_i \neq 0)

Where \omega_i=P(y^*_i = 0), \mathbb{I} is an indicator function, and \mathbf{x_i} is a set of three independent variables. A logit link is used to model \omega_i based on a set of three independent variables (\mathbf{z_i}) as below:

\mbox{logit}(\omega_i)=\mathbf{z_i}^{T}\boldsymbol{\gamma}

In this hurdle model, a continuous normal distribution (corresponding to quantile regression) is used for modelling the non-zeros and logistic regression is applied for modelling the probability of zeros versus non-zeros.

To estimate the continuous vectors of parameters \boldsymbol{\beta} and \boldsymbol{\gamma} in the model, I used the Gibbs sampling technique, and the autocorrelation functions of \beta_3 and \gamma_3 showed slow mixing and slow convergence (even with considering a large number for iterations, burn-in and thinning number). Then I moved to stan to use Hamiltonian Monte Carlo (HMC) that was helpful for my case. The autocorrelation plots showed a great and fast decreasing pattern, approaching zero, showing good mixing. The effect sample sizes were larger while I just used 4 chains with 2000 iterations and warm-up 1000.

Now, I just have one question. As you know HMC is based on the derivative of the logarithm of the posterior distribution and so continuity is essential for the posterior function. In my hurdle model, there is a discontinuity at zero. How HMC can handle this problem (or in general, how HMC can handle discontinuity of the likelihood function in hurdle models for computation of the derivative). Can anyone offer a good intuitive answer to this question? Or perhaps point me to a reference where this is discussed? I also added my stan code, please let me know if you have any comments.

#Stan code
cat("

//Definition of the inputs imported for feeding the model
data {
  int<lower=0> N;
  int<lower=0> k;
  matrix[N, k] X;
  vector[N] y;
  real<lower=0, upper=1> tau;
}

//Definition of the parameters in the model
parameters {
 
  vector[k] beta;
  vector[k] gammaa;
  real<lower=0> delta;
  vector<lower=0>[N] w;
}

//Computation of transformed parameters
transformed parameters {
  vector[N] perc;
  vector[N] perc2;
  vector<lower=0, upper=1>[N] pi;
  vector[N] mu;
  vector[N] mu2;
 
 
  perc  <- (tau * (1 - tau) * delta) ./ (2 * w);
  for(n in 1:N) perc2[n] <- inv_sqrt(perc[n]);
  for (n in 1:N) pi[n]<- inv_logit( X[n]*gammaa);
  mu  <- X*beta;
  mu2  <- (1 - 2 * tau) / (tau * (1 - tau)) * w + mu;
}

model {

  //Priors used for parameters
  for(i in 1:k)
  {
   beta[i]  ~ normal(0, 3);
   gammaa[i]  ~ normal(0, 3);
  }
  delta ~ gamma(0.001,0.001);
  w ~ exponential(delta);

  //Definition of the mass probability function of the model
  
   for (i in 1:N) {
   
   (y[i] == 0) ~ bernoulli(1-pi[i]);
    if (y[i] != 0) y[i] ~ normal(mu2[i], perc2[i]);
    }
        
      
}
" , file="stan_tpqr.stan")
1 Like

Your code marginalizes over the indicator variable with its if statement. (NB: In hurdle models like this one, where the value of the indicator is completely fixed by the data, the marginalization is quite simple to do; in zero-inflated models the marginalization can be a bit trickier).

Your likelihood doesn’t contain a discontinuity at zero because you use the data to predetermine which likelihood to use for which points. For every data point in your model, you are using just one continuous likelihood.

Thank you very much for your help.
I am wondering about the case of the Poisson hurdle model that for the non-zero state of the model, the count Poisson distribution is used, how does HMC handle the derivative process?

You need to specify a Stan model that marginalizes over the latent binary indicator. Here’s a post with example code for the zero-inflated Poisson case

Here’s @mbjoseph 's super helpful tutorial on marginalization:

That’s indeed a discontinuity in the data space. f(theta, x) jumps when x goes to 0. But it’s not discontinuous in parameter space. Monte carlo differentiates and explores the posterior distribution of the parameters rather than the data. There are no derivatives of the data because the data are fixed.

Thank you very much for your guidance.

Thank you very much. I got it.