Final update after some deep testing.
I followed your suggestion and looked carefully at how brms implements the GEV. I then replicated its parametrization in a clean Stan model and finally obtained stable inference even for strongly negative shape parameters (e.g., ξ = –1.5).
Here is what I found, which may be useful for other users struggling with the GEV:
1. The core issue was not the log-pdf itself, but the geometry
With ξ < 0 the GEV has a parameter-dependent upper endpoint: the support condition 1 + ξ(y − μ)/σ > 0 becomes
$$y < \mu - \frac{\sigma}{\xi},$$
which produces extremely sharp curvature in the posterior. A naive (mu, sigma, xi) parametrization, even with data standardization and non-centering, consistently led to 90–97% divergent transitions for ξ ≈ –1.5.
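To make the moving boundary concrete, here is a small sketch in plain Python. The function name `gev_upper_endpoint` is made up for illustration and does not come from brms or Stan. It evaluates the endpoint from the inequality above. Every proposed (mu, sigma, xi) has to keep this value above the largest observation, otherwise the likelihood is zero.

```python
def gev_upper_endpoint(mu: float, sigma: float, xi: float) -> float:
    """Upper endpoint mu - sigma/xi of the GEV support (finite only for xi < 0)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if xi >= 0:
        return float("inf")  # no finite upper endpoint when xi >= 0
    return mu - sigma / xi


# Values from this thread: mu = 10, sigma = 3, xi = -1.5
print(gev_upper_endpoint(10.0, 3.0, -1.5))  # 12.0
# Moving xi alone shifts the boundary, so a proposal can leave data outside the support
print(gev_upper_endpoint(10.0, 3.0, -2.0))  # 11.5
```

With N = 2000 draws, the largest observation probably sits very close to the true endpoint, so even small moves in any of the three parameters can cross the boundary.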
2. brms avoids this problem with two key design choices
After studying the generated Stan code of brms, two things stood out:
- No reject() calls in the log-pdf outside the support (my bad for using them in my first attempts). Out-of-support evaluations simply return -inf instead of triggering a discontinuity in the Hamiltonian.
- A dedicated scaling function, scale_xi(). It maps an unconstrained tmp_xi to a valid range for xi, computed from the data. This keeps the sampler away from pathological regions and dramatically improves the geometry.
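The idea behind the xi-scaling can be sketched in a few lines of Python. This is derived from the support condition 1 + ξ(y − μ)/σ > 0, not copied from the brms-generated Stan code, which may differ in details. The names `xi_bounds` and `scale_xi_sketch` are hypothetical, and the sketch assumes the data lie on both sides of mu.

```python
import math


def xi_bounds(y, mu, sigma):
    """Interval of xi for which 1 + xi * (y_i - mu) / sigma > 0 for every y_i.

    Assumption of this sketch: min(y) < mu < max(y).
    """
    x = [(yi - mu) / sigma for yi in y]
    x_min, x_max = min(x), max(x)
    if not (x_min < 0.0 < x_max):
        raise ValueError("this sketch assumes min(y) < mu < max(y)")
    # x_i > 0 requires xi > -1/x_i; the tightest such bound comes from x_max.
    # x_i < 0 requires xi < -1/x_i; the tightest such bound comes from x_min.
    return -1.0 / x_max, -1.0 / x_min


def scale_xi_sketch(tmp_xi, y, mu, sigma):
    """Map an unconstrained tmp_xi into the interval returned by xi_bounds."""
    lower, upper = xi_bounds(y, mu, sigma)
    # Logistic function written so that exp() never overflows
    if tmp_xi >= 0:
        p = 1.0 / (1.0 + math.exp(-tmp_xi))
    else:
        e = math.exp(tmp_xi)
        p = e / (1.0 + e)
    return lower + p * (upper - lower)


y = [4.0, 9.0, 11.5]
print(xi_bounds(y, 10.0, 3.0))             # (-2.0, 0.5)
print(scale_xi_sketch(0.0, y, 10.0, 3.0))  # -0.75
```

Two things are worth noting. The interval depends on mu and sigma, so it has to be recomputed whenever they change. And for very large |tmp_xi| the logistic saturates in floating point, which puts xi exactly on a boundary. That is one place where an extra safeguard, such as shrinking the interval slightly, may be needed.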
3. Once I implemented these two brms features, Stan sampling became completely stable
I built a minimal model that preserves the semantics of the GEV but uses:
- safe versions of the GEV log-pdf and log-cdf,
- the brms-style scale_xi() (with extra numerical stability),
- no rejections in the lpdf.
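For readers who want to see what a "safe" log-pdf without rejections can look like, here is a minimal Python sketch. It is not the Stan code from this model. The function name `gev_lpdf_safe`, the `eps` threshold for switching to the Gumbel limit, and the overflow handling are all assumptions made for illustration.

```python
import math


def _exp_or_inf(v):
    """exp(v), returning +inf instead of raising on overflow."""
    try:
        return math.exp(v)
    except OverflowError:
        return math.inf


def gev_lpdf_safe(y, mu, sigma, xi, eps=1e-8):
    """GEV log-density that returns -inf outside the support instead of raising."""
    if sigma <= 0.0:
        return -math.inf
    z = (y - mu) / sigma
    if abs(xi) < eps:
        # Gumbel limit as xi -> 0
        return -math.log(sigma) - z - _exp_or_inf(-z)
    t = 1.0 + xi * z
    if t <= 0.0:
        return -math.inf  # outside the support: zero density, no exception
    log_t = math.log(t)
    # log f = -log(sigma) - (1 + 1/xi) * log(t) - t^(-1/xi)
    return -math.log(sigma) - (1.0 + 1.0 / xi) * log_t - _exp_or_inf(-log_t / xi)


print(gev_lpdf_safe(10.0, 10.0, 3.0, -1.5))  # equals -log(3) - 1
print(gev_lpdf_safe(12.5, 10.0, 3.0, -1.5))  # -inf, since 12.5 is above the endpoint 12
```

In Stan, the analogous move would be to return negative_infinity() from the user-defined function in the out-of-support branch rather than calling reject().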
Here are the posterior results for data generated with μ = 10, σ = 3, ξ = –1.5, N = 2000:
mu = 9.83 (true: 10)
sigma = 3.15 (true: 3)
xi = -1.45 (true: -1.5)
Rhat = 1.00
ESS = very high (2000–5000)
Divergences = 0
BFMI = OK
This matches what brms produces.
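For anyone who wants to reproduce a similar test set, one way to simulate GEV data is inverse-CDF sampling. The sketch below is an assumption about how such data could be generated, since the simulation method is not stated in this post. The function name `gev_rng` and the seed are arbitrary.

```python
import math
import random


def gev_rng(mu, sigma, xi, n, seed=1):
    """Draw n GEV variates by inverting F(y) = exp(-(1 + xi*z)^(-1/xi))."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        u = rng.random()
        while u == 0.0:  # avoid log(0)
            u = rng.random()
        e = -math.log(u)  # standard exponential variate
        if xi == 0.0:
            draws.append(mu - sigma * math.log(e))  # Gumbel case
        else:
            draws.append(mu + sigma * (e ** (-xi) - 1.0) / xi)
    return draws


y = gev_rng(10.0, 3.0, -1.5, 2000)
print(len(y), max(y))  # the maximum cannot exceed the endpoint mu - sigma/xi = 12
```

Because the maximum of 2000 draws lies close to the endpoint, data like this exercise exactly the boundary behaviour described in section 1.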
4. Conclusion
The “brms parametrization” (no rejects + xi-scaling) is the crucial ingredient for stabilizing NUTS with strongly negative ξ. In my tests, the raw GEV parametrization was simply too geometrically difficult for HMC without this reparametrization.
Happy to share the final Stan code if anyone is interested.