When I built the model, I encountered one or two divergences after running it. I made modifications, but one or two divergences still occasionally appear, especially when I repeat the power calculation many times. I want to know whether all divergences can be fixed, and whether this level of divergence can be tolerated.
The challenge is that it's not easy to know 1) whether they are false positives or true positives, 2) how much posterior mass you might be missing, and 3) whether missing some posterior mass has a significant effect on inference for the quantity of interest.
In some cases you may tolerate them in the early phase of the workflow, e.g., if you end up discarding the model anyway because it clearly fails predictive checks.
In some cases you have enough insight into the shape of your posterior to know whether you can tolerate them even at the end of the workflow, but this requires expertise or careful analysis of the posterior.
If you are worried, you can assess the possible bias with Posterior SBC.
Thank you very much for your helpful explanation and for pointing me to this paper.
In my model, I have noticed that repeated runs sometimes produce no divergences, while other runs produce only one or two. I should also explain that my key metric is binary and that I've built a hierarchical model. I've seen many discussions of similar divergence problems; it seems that divergences are easy to get but hard to eliminate, right? What I see more often is people who initially had many divergences that all disappeared after some adjustments. I'm unsure how to describe my situation and what criteria to judge it by. My current interpretation is that whether divergences appear seems to depend somewhat on the specific chain trajectories or seeds, rather than indicating a consistently severe problem.
I also checked the pairs plots and did not observe any obvious pathological structure, such as a clear funnel shape or concentrated clusters of divergent transitions. In addition, I performed a simple data-conditional sensitivity check on a fixed dataset and found that the key posterior summaries were essentially stable.
Based on these checks, my tentative view is that the occasional divergences may have limited impact on the posterior quantities I care about, although I understand that this does not prove the absence of bias. Would you consider this a reasonable interpretation? Furthermore, can I assume that, in most cases, one or two divergences are unlikely to affect the posterior distribution?
Thank you again for your guidance.
Divergences are not always a problem. They signal that the sampler's numerical simulation of Hamiltonian dynamics has broken down: the Hamiltonian should be conserved along the simulated trajectory, but it isn't.
Divergences typically add bias when they're tied to particular regions of the posterior (e.g., regions of high curvature). The only general advice we have is to run posterior predictive checks and see whether it's a problem for your application. Sometimes the bias is so small as to be irrelevant (i.e., it's smaller than the Monte Carlo error that comes from taking only a finite number of draws).
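To make "the Hamiltonian should be conserved, but it isn't" concrete, here is a minimal sketch assuming a one-dimensional Gaussian target with standard deviation `sigma` and a unit mass matrix (a toy, not the poster's model). The leapfrog integrator is stable on a Gaussian only while the step size is below twice `sigma`, so a step size that works in a wide region fails where the local scale shrinks, much like the neck of a funnel. The threshold `MAX_DELTA_H = 1000` is our assumption about Stan's default divergence cutoff.

```python
# Leapfrog integration of Hamiltonian dynamics for a 1-D Gaussian target
# U(q) = q^2 / (2 sigma^2), with kinetic energy K(p) = p^2 / 2 (unit mass).

MAX_DELTA_H = 1000.0  # assumed divergence threshold (our reading of Stan's default)


def leapfrog(q, p, step_size, n_steps, grad_U):
    """Run n_steps leapfrog steps starting from position q and momentum p."""
    p = p - 0.5 * step_size * grad_U(q)      # initial half step for momentum
    for i in range(n_steps):
        q = q + step_size * p                # full step for position
        if i < n_steps - 1:
            p = p - step_size * grad_U(q)    # full step for momentum
    p = p - 0.5 * step_size * grad_U(q)      # final half step for momentum
    return q, p


def energy_error(sigma, step_size, n_steps=20, p0=0.0):
    """Change in the Hamiltonian after a leapfrog trajectory (0 for exact dynamics)."""
    U = lambda q: 0.5 * q * q / sigma**2
    grad_U = lambda q: q / sigma**2
    q0 = sigma  # start one standard deviation from the mode
    H0 = U(q0) + 0.5 * p0 * p0
    q, p = leapfrog(q0, p0, step_size, n_steps, grad_U)
    return U(q) + 0.5 * p * p - H0


for sigma in (1.0, 0.2):  # wide region vs. narrow "neck"
    dH = energy_error(sigma, step_size=0.5)
    print(f"sigma={sigma}: energy error {dH:.3g}, divergent={dH > MAX_DELTA_H}")
```

With `sigma=1.0` the energy error stays tiny; with `sigma=0.2` the same step size of 0.5 makes the energy grow by many orders of magnitude within 20 steps, which is the kind of breakdown a divergence report flags.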
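One way to see when divergence-induced bias is negligible is to compare it with the Monte Carlo standard error (MCSE). The sketch below is a toy under strong assumptions: the posterior is a standard normal, the sampler is assumed to never visit the region above `cutoff` (a crude stand-in for a region the divergences keep it out of), and the effective sample size is a hypothetical 4000.

```python
import math


def std_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def truncation_bias(cutoff):
    """Error in the mean of N(0, 1) if draws above `cutoff` are never obtained.

    The mean of a standard normal truncated to (-inf, cutoff] is
    -pdf(cutoff) / cdf(cutoff); the true mean is 0, so this is the bias.
    """
    return -std_normal_pdf(cutoff) / std_normal_cdf(cutoff)


def mcse_of_mean(sd, ess):
    """Monte Carlo standard error of a mean estimate from `ess` effective draws."""
    return sd / math.sqrt(ess)


ess = 4000  # hypothetical effective sample size
for cutoff in (3.0, 1.0):
    bias = truncation_bias(cutoff)
    mcse = mcse_of_mean(1.0, ess)
    print(f"missing x > {cutoff}: |bias| = {abs(bias):.4f}, MCSE = {mcse:.4f}")
```

Missing the region above 3 shifts the mean by about 0.004, well under an MCSE of about 0.016, while missing everything above 1 shifts it by about 0.29. Real divergences rarely cut off a region this cleanly, so treat this only as an illustration of the comparison.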
The max tree depths of 15 and 12 indicate serious problems with your posterior geometry that are forcing step sizes down to very low values that then require many steps.
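For scale: each extra level of NUTS tree depth roughly doubles the number of leapfrog steps (and gradient evaluations) per iteration. The helper below assumes the convention, as we understand Stan's, that depth d allows at most 2^d - 1 leapfrog steps; Stan's default `max_treedepth` is 10.

```python
def max_leapfrog_steps(tree_depth):
    """Upper bound on leapfrog steps in one NUTS iteration at a given tree depth.

    Assumes a trajectory of depth d holds at most 2**d states, i.e. at most
    2**d - 1 leapfrog steps (one gradient evaluation each).
    """
    return 2**tree_depth - 1


for depth in (10, 12, 15):  # Stan's default max_treedepth is 10
    print(f"tree depth {depth}: up to {max_leapfrog_steps(depth)} leapfrog steps")
```

So an iteration that reaches depth 15 can cost about 32 times as many gradient evaluations as one capped at the default depth of 10.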
Sometimes you can get rid of all of them with a clean reparameterization. But for something like the funnel, the bad case is when there's an intermediate amount of data: small data favors non-centered parameterizations and large data favors centered ones. You can also shift that boundary with appropriate priors. If your priors are too weak, moving to weakly informative priors can sometimes help remove divergences.
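A minimal sketch of why reparameterization helps, using Neal's funnel (v ~ Normal(0, 3), x | v ~ Normal(0, exp(v / 2))) as an assumed stand-in for a hierarchical model: `v` plays the role of the log of the group-level scale and `x` the role of one group's deviation from the mean. In the centered coordinates the width of `x` depends on `v`, so no single step size suits the whole posterior; the non-centered coordinate `z`, defined by `x = exp(v / 2) * z`, has unit scale everywhere. The function names are hypothetical.

```python
import math


def conditional_scale_centered(v):
    # In Neal's funnel, x | v ~ Normal(0, exp(v / 2)), so the width of x
    # depends on the log-scale parameter v.
    return math.exp(v / 2.0)


def conditional_scale_noncentered(v):
    # With x = exp(v / 2) * z and z ~ Normal(0, 1), z has unit scale for every v.
    return 1.0


def to_centered(v, z):
    """Map non-centered coordinates (v, z) back to the centered (v, x)."""
    return v, math.exp(v / 2.0) * z


for v in (-6.0, 0.0, 6.0):  # roughly +/- 2 prior standard deviations of v
    print(f"v={v:+.0f}: sd(x|v)={conditional_scale_centered(v):.3f}, "
          f"sd(z|v)={conditional_scale_noncentered(v):.1f}")

ratio = conditional_scale_centered(6.0) / conditional_scale_centered(-6.0)
print(f"centered scale varies by a factor of {ratio:.0f} between v=-6 and v=6")
```

This only reflects the prior geometry, which is roughly what the posterior looks like when data are sparse. When each group has a lot of data, the likelihood pins down `x` and the non-centered coordinates become the awkward ones, which is why large data tends to favor the centered form.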
Thank you very much for your helpful comments and for clarifying the role of posterior predictive checks.
I realized that, because my current work is still at the trial design and sample size simulation stage, I do not yet have real observed trial data. Therefore, a formal posterior predictive check is not directly applicable at this stage. Instead, I performed prior predictive checks to evaluate whether the weakly informative priors used in my binary hierarchical model imply reasonable event rates and treatment effects.
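A prior predictive check of this kind can be scripted in a few lines. The sketch below uses made-up priors (normal on the control log-odds, normal on the treatment log odds ratio, half-normal between-site scale) and a hypothetical 8 sites with 50 patients per arm; none of these come from the poster's model. It reports the prior probability of extreme control event rates, a prior interval for the implied risk difference, and a simple check on simulated event counts.

```python
import numpy as np

# Hypothetical priors on the log-odds scale (illustrative assumptions only):
#   alpha ~ Normal(0, 1.5)       control log-odds
#   beta  ~ Normal(0, 1)         treatment log odds ratio
#   tau   ~ Half-Normal(0, 0.5)  between-site sd of the control log-odds
rng = np.random.default_rng(2024)
n_draws, n_sites, n_per_arm = 20_000, 8, 50


def inv_logit(x):
    return 1.0 / (1.0 + np.exp(-x))


alpha = rng.normal(0.0, 1.5, size=n_draws)
beta = rng.normal(0.0, 1.0, size=n_draws)
tau = np.abs(rng.normal(0.0, 0.5, size=n_draws))
# Non-centered site effects: u_j = tau * z_j with z_j ~ Normal(0, 1)
site_effect = tau[:, None] * rng.normal(0.0, 1.0, size=(n_draws, n_sites))

p_control = inv_logit(alpha[:, None] + site_effect)
p_treat = inv_logit(alpha[:, None] + beta[:, None] + site_effect)
risk_diff = p_treat - p_control

# Prior predictive data: simulated event counts per site and arm
y_control = rng.binomial(n_per_arm, p_control)
y_treat = rng.binomial(n_per_arm, p_treat)

extreme = np.mean((p_control < 0.01) | (p_control > 0.99))
lo, hi = np.quantile(risk_diff, [0.05, 0.95])
zero_events = np.mean((y_control == 0) | (y_treat == 0))
print(f"P(control rate < 1% or > 99%): {extreme:.4f}")
print(f"90% prior interval for the risk difference: [{lo:.2f}, {hi:.2f}]")
print(f"share of simulated sites with zero events in an arm: {zero_events:.3f}")
```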
The prior predictive results did not show obviously implausible behavior. The prior probability of extremely low or high control event rates was small, and the implied treatment effects, although intentionally weakly informative, appeared broadly reasonable for a design-stage analysis. I also performed sensitivity checks across different random seeds, adapt_delta values, and max_treedepth settings. The key posterior decision quantities were stable, and the Rhat and ESS values were satisfactory. In addition, the pairs plots did not reveal an obvious funnel-like geometry or clustering of divergent transitions.
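For readers who want to see what the R-hat check measures, here is a minimal NumPy version of the classic split-R-hat from Bayesian Data Analysis (3rd edition). It is a simplified sketch: recent Stan interfaces report a rank-normalized variant, so numbers will not match exactly. The chains here are synthetic.

```python
import numpy as np


def split_rhat(draws):
    """Classic split-R-hat for one quantity, draws shaped (n_chains, n_iter)."""
    draws = np.asarray(draws, dtype=float)
    _, n_iter = draws.shape
    half = n_iter // 2
    # Split every chain into a first and second half (dropping the middle draw
    # when n_iter is odd) so within-chain drift also inflates R-hat.
    splits = np.concatenate([draws[:, :half], draws[:, n_iter - half:]], axis=0)
    n = splits.shape[1]
    chain_means = splits.mean(axis=1)
    B = n * chain_means.var(ddof=1)            # between-chain variance
    W = splits.var(axis=1, ddof=1).mean()      # mean within-chain variance
    var_plus = (n - 1) / n * W + B / n         # pooled posterior variance estimate
    return float(np.sqrt(var_plus / W))


rng = np.random.default_rng(7)
good = rng.normal(0.0, 1.0, size=(4, 1000))   # four well-mixed chains
bad = good.copy()
bad[3] += 2.0                                  # one chain stuck in another region
print(f"split R-hat, mixed chains:      {split_rhat(good):.3f}")
print(f"split R-hat, one shifted chain: {split_rhat(bad):.3f}")
```

Well-mixed chains give a value very close to 1, while a single chain stuck in a different region pushes it well above the commonly used 1.01 threshold.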
Based on these checks, my current interpretation is that the occasional one or two divergences are unlikely to materially affect the main posterior decision quantities in this design simulation.
Thank you again for your guidance. Your explanation helped me understand that divergences should not be judged only by their number, but by whether they are associated with problematic posterior geometry and whether they affect the quantities relevant to the analysis.
You can apply posterior predictive checks to simulated data. But if you can simulate, you can also perform simulation-based calibration (SBC) checks. Those should work if your prior predictive distribution is reasonable. You probably won't be able to measure the effect of a few divergences using SBC, and that in itself should give you some confidence.
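A minimal SBC sketch, assuming a Beta-Binomial model so that exact posterior draws are available without running a sampler. For each simulation it draws `theta` from the prior, simulates binary-outcome data, draws from the posterior, and records the rank of the true `theta`; if the inference is calibrated, the ranks are uniform. In a real design study the exact-posterior line would be replaced by a Stan fit of the hierarchical model (with thinning to reduce autocorrelation). The `posterior_prior` argument is a hypothetical way to inject miscalibration for comparison.

```python
import numpy as np


def sbc_ranks(n_sims=1000, n_post=99, n_trials=20, prior=(2.0, 2.0),
              posterior_prior=(2.0, 2.0), seed=11):
    """Ranks of the true theta among posterior draws for a Beta-Binomial model.

    The exact conjugate posterior Beta(a + y, b + n - y) stands in for an MCMC
    sampler; a `posterior_prior` different from `prior` makes it miscalibrated.
    """
    rng = np.random.default_rng(seed)
    a, b = prior
    pa, pb = posterior_prior
    ranks = np.empty(n_sims, dtype=int)
    for s in range(n_sims):
        theta = rng.beta(a, b)                        # draw from the prior
        y = rng.binomial(n_trials, theta)             # simulate binary outcomes
        post = rng.beta(pa + y, pb + n_trials - y, size=n_post)
        ranks[s] = np.sum(post < theta)               # rank in 0..n_post
    return ranks


def rank_histogram(ranks, n_post=99, n_bins=10):
    # n_post + 1 possible ranks, grouped into equal-width bins
    counts, _ = np.histogram(ranks, bins=n_bins, range=(0, n_post + 1))
    return counts


good = rank_histogram(sbc_ranks())
bad = rank_histogram(sbc_ranks(posterior_prior=(10.0, 10.0)))
print("calibrated:   ", good)
print("miscalibrated:", bad)
```

The calibrated run gives roughly flat bin counts near 100, while the deliberately misspecified run piles ranks into the end bins. Small biases, such as those from a handful of divergences, would need many more simulations to show up in such a histogram.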
