Does reject imply divergent?

I read the function reference and user guide sections for the reject statement, but the exception handling / control flow during sampling in the event of a user rejection isn't clear to me. I had always assumed that the rejection applies to the proposal sample and the corresponding leapfrog trajectory, and that a new random momentum is then drawn, another leapfrog trajectory computed, and so on.

However, I now have a model that consistently hits my reject statement on every sample (I set refresh=1 and see Iteration N/2000 statements interspersed with rejection messages), yet the chain makes progress and each sample is marked divergent.

Is there a connection? Does reject allow a sample to be accepted but marked divergent?

Thanks in advance for any clarification or pointers to docs I may’ve missed.


reject() always rejects the current proposal and marks the trajectory as divergent. However, a divergence does not mean the entire trajectory is rejected: the sample is drawn from the portion of the trajectory built before the divergence. So the sampler can make progress even if every trajectory (but not every proposal) ends in a divergence; it's just slower.


Divergences during warmup are OK; as you say, the sampler is still making progress. However, if divergences occur during sampling, then the reported draws introduce bias into the estimates, so your sample is not a true sample from the posterior.


If a divergence is detected at a finite target density then yes, the detailed balance condition is slightly off and the asymptotic distribution will not converge exactly. If the target is infinite (as it effectively is when reject()-ing) then I think detailed balance still holds. The main problem with divergences is that convergence is so slow you'll never reach the asymptotics anyway.


Allow me to clarify a few points that I think have been confused in this thread.

In Stan's implementation of dynamic Hamiltonian Monte Carlo a numerical trajectory is expanded in stages: at each stage a time direction is chosen at random and the previous trajectory is expanded in that direction, doubling the number of states, until a termination criterion is triggered. How that trajectory is used depends on which termination criterion is triggered.

(1) If a divergence (rapid change in density) is encountered in any of the new states then the states in the trajectory expansion are ignored and a sample is drawn from the previous trajectory.

(2) If the generalized No-U-Turn criterion is violated anywhere amongst the new states then the states in the trajectory expansion are ignored and a sample is drawn from the previous trajectory.

(3) If the generalized No-U-Turn criterion is violated for the entire expanded trajectory then a sample is drawn from the new expanded trajectory.

This procedure ensures the correct invariant distribution so long as the target density function is consistent, which is guaranteed by the confines of the Stan Modeling Language (up to errors in floating-point arithmetic, the implementation of transcendental functions, etc.). This then always ensures at least asymptotic consistency.
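The staged expansion described above can be sketched in Python. This is a toy illustration of the control flow only, with hypothetical helper names (`grow`, `diverges`, etc.); it is not Stan's actual C++ implementation, which also handles multinomial weighting of the states:

```python
import random

def expand_trajectory(trajectory, grow, diverges, subtree_uturn, full_uturn):
    """One expansion stage of the doubling scheme (toy sketch).

    trajectory    -- list of states built so far
    grow          -- function(states, direction) -> list of new states
    diverges      -- predicate: does any new state diverge?
    subtree_uturn -- predicate: do the new states alone violate the criterion?
    full_uturn    -- predicate: does the whole expanded trajectory violate it?

    Returns (trajectory_to_sample_from, done).
    """
    direction = random.choice([-1, +1])       # random time direction
    new_states = grow(trajectory, direction)  # double the number of states

    # (1) divergence among the new states: discard them and terminate,
    #     sampling from the previous trajectory
    if diverges(new_states):
        return trajectory, True

    # (2) U-turn within the new states alone: same as (1)
    if subtree_uturn(new_states):
        return trajectory, True

    expanded = trajectory + new_states

    # (3) U-turn across the entire expanded trajectory: keep the new
    #     states and sample from the full expanded trajectory
    if full_uturn(expanded):
        return expanded, True

    return expanded, False  # no criterion triggered: keep doubling
```

Note that in cases (1) and (2) the trajectory returned still contains every state built before the failed expansion, which is why the sampler can keep moving even when every trajectory ends in a divergence.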

In the Stan Modeling Language reject is implemented by throwing a domain error in the density function/gradient calculation, which Stan's Hamiltonian Monte Carlo implementation reinterprets as an infinite density. In other words, if reject is called then the Stan program will return positive infinity.

The jump from any finite target density to an infinite target density then triggers the divergence condition, which in turn triggers condition (1) above. All of the states in the current expansion, including the one that triggered the rejection, are thrown away and the sampler chooses a new state from the previous trajectory.
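This reject-to-divergence path can be sketched as follows. All names here are hypothetical stand-ins, and the threshold value is an assumption for illustration rather than a documented Stan constant; the point is only that an infinite energy always trips the divergence check:

```python
import math

def log_density(theta):
    # hypothetical model with a conditional reject, thrown as an
    # exception the way Stan's reject statement throws a domain error
    if theta < 0:
        raise ValueError("reject: theta must be non-negative")
    return -0.5 * theta ** 2  # standard normal, up to a constant

def energy(theta):
    # sketch: the sampler catches the domain error and reinterprets
    # it as an infinite energy (equivalently, zero density)
    try:
        return -log_density(theta)
    except ValueError:
        return math.inf

def is_divergent(initial_energy, current_energy, max_delta=1000.0):
    # the jump from any finite energy to an infinite one exceeds any
    # finite threshold, so a reject always registers as a divergence
    return current_energy - initial_energy > max_delta
```

For example, `is_divergent(energy(0.0), energy(-1.0))` is always true, no matter the threshold, because the rejected state has infinite energy.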

If the reject statement is within a conditional statement then it may not be called until the trajectory has already grown a bit, in which case the sampler will not necessarily return the initial point over and over again. If the reject statement is always called, however, then the first expansion away from the initial point should be immediately rejected and the sampler will return the same initial point over and over again.

Importantly, Stan's Hamiltonian Monte Carlo algorithm is not a Metropolis-Hastings algorithm, so any intuition or expectations based on accepting/rejecting a single proposed state won't make much sense.


Thanks for these clarifications; the distinction between HMC and MH in particular helps me understand why a chain can make progress while hitting (user, conditional) rejections at every iteration.