When I built the model, running it produced one or two divergences. I made modifications, but one or two divergences still occasionally appear, especially when I repeat the power calculation many times. I want to know whether all divergences can be fixed, and whether we can tolerate this level of divergence.
The challenge is that 1) it’s not easy to know whether they are false positives or true positives, 2) it’s hard to tell how much posterior mass you are possibly missing, and 3) it’s unclear whether missing some posterior mass has a significant effect on the inference for the quantity of interest.
In some cases you may tolerate them in an early phase of the workflow, e.g., if you end up discarding the model anyway because it clearly fails predictive checks.
In some cases you have enough insight into your posterior shape to know whether you can tolerate them even at the end of the workflow, but this requires expertise or careful analysis of the posterior.
If you are worried, you can assess the possible bias with Posterior SBC.
Thank you very much for your helpful explanation and for pointing me to this paper.
In my model, I have noticed that repeated runs sometimes produce no divergences, while other runs produce only one or two. I should also explain that my key metric is binary classification, and I’ve built a hierarchical model. I’ve seen many discussions of similar divergence problems; hierarchical models seem more prone to divergences and harder to fix, right? What I see more often is that people initially had many divergences, and after some adjustments they were all gone. I’m unsure what descriptions and judgment criteria apply here. My current interpretation is that whether my divergences appear seems to depend somewhat on the specific chain trajectories or seeds, rather than indicating a consistently severe problem.
I also checked the pairs plots and did not observe any obvious pathological structure, such as a clear funnel shape or concentrated clusters of divergent transitions. In addition, I performed a simple data-conditional sensitivity check on a fixed dataset and found that the key posterior summaries were essentially stable.
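As a rough numerical complement to eyeballing the pairs plots, you can check whether the divergent draws concentrate in one region of parameter space (for a funnel, typically at small values of the group-level scale). A minimal sketch with stand-in draws, assuming you have per-draw divergence flags and posterior draws of a hypothetical scale parameter `tau`:

```python
import numpy as np

# Stand-in posterior draws and divergence flags; in practice these come
# from your sampler's output (e.g., per-draw divergence indicators).
rng = np.random.default_rng(1)
tau = rng.lognormal(0.0, 1.0, size=4000)      # hypothetical scale draws
divergent = np.zeros(4000, dtype=bool)
divergent[rng.choice(4000, size=2, replace=False)] = True  # 2 divergences

# Compare where the divergent draws sit relative to all draws: if the
# divergent fraction below the 10% quantile of tau is much larger than
# 10%, divergences cluster in the high-curvature "neck" of a funnel.
cutoff = np.quantile(tau, 0.10)
frac_div_small_tau = np.mean(tau[divergent] < cutoff)
print(f"fraction of divergent draws below tau 10% quantile: {frac_div_small_tau:.2f}")
```

With only one or two divergences this check is very noisy, of course, but repeated appearance in the same region across seeds would be more worrying than scattered hits.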
Based on these checks, my tentative view is that the occasional divergences may have limited impact on the posterior quantities I care about, although I understand that this does not prove the absence of bias. Would you consider this a reasonable interpretation? Furthermore, can I assume that, in most cases, only one or two divergences are unlikely to affect the posterior distribution?
Thank you again for your guidance.
Divergences are not always a problem. They signal that the Hamiltonian simulation in the sampler is breaking down, in the sense that the Hamiltonian should be conserved but isn’t.
Divergences typically add bias when they’re tied to particular regions (e.g., regions of high curvature). The only general advice we have is to run posterior predictive checks and see if it’s a problem for your application. Sometimes the bias is so small as to be irrelevant (i.e., it’ll be smaller than the error you get from variance in only taking a finite sample of draws).
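For concreteness, a posterior predictive check can be sketched as below. Everything here is a hypothetical stand-in: a conjugate beta posterior over a success probability `p` plays the role of draws from your actual fitted model, and the success count is the test statistic.

```python
import numpy as np

# Hypothetical setup: binary outcomes y and posterior draws of a
# success probability p (a conjugate beta posterior stands in for
# draws from your actual fitted hierarchical model).
rng = np.random.default_rng(2)
y = rng.binomial(1, 0.3, size=200)                      # stand-in data
p_draws = rng.beta(1 + y.sum(), 1 + len(y) - y.sum(), size=1000)

# Simulate replicated datasets under each posterior draw and compare a
# test statistic (here the success count) against the observed value.
y_rep_counts = rng.binomial(len(y), p_draws)
ppc_tail = np.mean(y_rep_counts >= y.sum())
print(f"P(T(y_rep) >= T(y)) = {ppc_tail:.2f}")
```

A tail probability near 0 or 1 flags a mismatch between model and data for that statistic; values in between are unremarkable.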
The max tree depths of 15 and 12 indicate serious problems with your posterior geometry that are forcing step sizes down to very low values that then require many steps.
Sometimes you can get rid of all of them with a clean reparameterization. But for something like the funnel, the bad case is when there’s an intermediate amount of data: small data favors non-centered parameterizations and large data favors centered ones. You can also shift the boundary between these regimes with relevant priors. If your priors are too weak, moving to something like weakly informative priors can sometimes help remove divergences.
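The non-centered reparameterization itself is just a deterministic change of variables. A minimal NumPy sketch (the names `mu`, `tau`, and `theta` are hypothetical group-level parameters, not from any particular model above):

```python
import numpy as np

# Centered form: theta_j ~ Normal(mu, tau), sampled directly, so the
# sampler sees the funnel-shaped joint geometry of (theta, tau).
# Non-centered form: theta_raw_j ~ Normal(0, 1), then
#   theta_j = mu + tau * theta_raw_j,
# which expresses the same prior but decouples theta_raw from tau.
rng = np.random.default_rng(0)
mu, tau = 1.0, 0.5
theta_raw = rng.standard_normal(10_000)   # standard-normal raw draws
theta = mu + tau * theta_raw              # deterministic transform

# The transformed draws recover the intended Normal(mu, tau).
print(theta.mean(), theta.std())
```

The point of the transform is that the sampler explores the well-conditioned standard-normal `theta_raw` space, and the funnel-inducing dependence on `tau` is pushed into deterministic arithmetic.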
