Divergence check does not satisfy time-reversibility

howsiyu · January 6, 2024, 1:24pm

Hi,

I’m aware that Stan does not use the slice sampler from the NUTS paper but instead samples directly from the probabilities. However, the slice variable u in the paper is also used to check for divergence, while the current Stan code just uses the initial energy, as seen in

https://github.com/stan-dev/stan/blob/db36a6585a06fad5b4c4386579a5bbce7735e35f/src/stan/mcmc/hmc/nuts/base_nuts.hpp#L262

          // Base case
          if (depth == 0) {
            this->integrator_.evolve(this->z_, this->hamiltonian_,
                                     sign * this->epsilon_, logger);
            ++n_leapfrog;
          
            double h = this->hamiltonian_.H(this->z_);
            if (std::isnan(h))
              h = std::numeric_limits<double>::infinity();
          
            if ((h - H0) > this->max_deltaH_)
              this->divergent_ = true;
          
            log_sum_weight = math::log_sum_exp(log_sum_weight, H0 - h);
          
            if (H0 - h > 0)
              sum_metro_prob += 1;
            else
              sum_metro_prob += std::exp(H0 - h);
          
            z_propose = this->z_;

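I believe this destroys time-reversibility: different states in the trajectory have different energies, so whether a trajectory counts as divergent depends on which of its states was the starting point. I'm not sure how much this matters, though, since divergences should be very rare.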

A simple correction is to require that the difference between the minimum and maximum energies in a tree be smaller than the divergence threshold.
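
To make the asymmetry concrete, here is a minimal sketch, not Stan's actual implementation. It compares the start-relative check from the excerpt above with the min/max check I'm proposing. The function names and the `energies` vector are hypothetical, and the sketch ignores tree doubling and early termination: it just treats a trajectory as a list of energies, with NaN assumed to be already mapped to +infinity as in the excerpt.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Start-relative check, mirroring the quoted base case: the trajectory is
// flagged if any state's energy h exceeds the initial energy H0 by more
// than max_delta_h. The result depends on which state is the start.
bool diverges_from_start(const std::vector<double>& energies,
                         std::size_t start, double max_delta_h) {
  assert(start < energies.size());
  const double H0 = energies[start];
  for (double h : energies) {
    if (h - H0 > max_delta_h) return true;
  }
  return false;
}

// Hypothetical symmetric check: flag the trajectory if the spread between
// its largest and smallest energy exceeds max_delta_h. The result is the
// same no matter which state the trajectory started from.
bool diverges_by_range(const std::vector<double>& energies,
                       double max_delta_h) {
  assert(!energies.empty());
  const auto mm = std::minmax_element(energies.begin(), energies.end());
  return *mm.second - *mm.first > max_delta_h;
}
```

With made-up energies {10.0, 10.4, 1010.2} and a threshold of 1000, the start-relative check flags the trajectory only when it starts from the first state, while the range check flags it regardless of the start.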

nhuurre · January 7, 2024, 11:46am

You’re correct. Here’s a concrete example:

parameters {
  vector[3] x;
}
model {
  x ~ std_normal();
  if (x[1] < 0)
    target += -1000;
}
generated quantities {
  real q = -1.0 + 2.0*std_normal_cdf(x[1]); // q ~ uniform(0,1)
}

Theoretically the model implies that P(x[1] < 0) is negligible and x[1] should have a half-normal distribution.
But a difference of -1000 is exactly at the threshold of divergence detection; whether the trajectory is considered divergent or not depends on the initial point.
One million draws from Stan with adapt_delta=0.5 give a visibly biased histogram.

A difference of -995 does not produce any divergences, and a difference of -1005 produces a divergence every time the trajectory crosses x[1]=0 (independent of the initial point); in either case Stan recovers the correct distribution.
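
The arithmetic behind the three cases can be sketched as follows. This is an illustration under stated assumptions, not Stan code. The constant penalty has zero gradient, so the leapfrog steps follow the Gaussian part only, and the energy change seen on crossing from x[1] > 0 to x[1] < 0 is the penalty plus whatever integration error has accumulated in the rest of the Hamiltonian. The function name and the sample error values of ±0.3 are hypothetical; the default threshold of 1000 is the one discussed in this thread.

```cpp
#include <cassert>

// Hypothetical helper: would the divergence check (h - H0) > max_delta_h fire
// when a trajectory that started at x[1] > 0 steps into x[1] < 0?
//   penalty           magnitude of the drop in target (1000 in the model)
//   integrator_error  leapfrog energy error relative to the initial point;
//                     its sign depends on where the trajectory started
bool flagged_on_crossing(double penalty, double integrator_error,
                         double max_delta_h = 1000.0) {
  assert(penalty >= 0.0);
  return penalty + integrator_error > max_delta_h;
}
```

With a penalty of exactly 1000 the outcome is decided by the sign of the integration error, which is why it depends on the initial point. With 995 or 1005 the error would have to exceed 5 in magnitude to change the outcome.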

One should also suspect bias from the fact that about 50% of all transitions are divergent. Even if we fix the divergence check to maintain reversibility, samples with divergences are unreliable, because divergent transitions are a sign that HMC is unable to move across the parameter space.
