Generalized Extreme Value distribution

fabio · December 2, 2025, 9:17am

Dear community,

since I’d rather not reinvent the wheel (especially when the result would likely be a sub-optimal one), I’m wondering whether someone has already implemented functions for the Generalized Extreme Value (GEV) distribution in Stan.

So far I’ve been using the built-in Gumbel distribution, but I’d like the additional flexibility that comes with modelling the full GEV family — even if that means estimating one more parameter.

If anyone has code snippets or examples they’re willing to share, I’d be very grateful.

Thanks in advance!

WardBrian · December 2, 2025, 5:46pm

This distribution is one of the brms supported families, so you could look at the code they use: brms/inst/chunks/fun_gen_extreme_value.stan at master · paul-buerkner/brms · GitHub

They appear to be pretty straightforward definitions

fabio · December 3, 2025, 2:50pm

Final update after some deep testing.

I followed your suggestion and looked carefully at how brms implements the GEV. Then I replicated the parametrization in a clean Stan model, and I finally obtained stable inference even for very heavy negative shape parameters (e.g., ξ = –1.5).

Here is what I found, which may be useful for other users struggling with the GEV:

1. The core issue was not the log-pdf itself, but the geometry

With ξ < 0 the GEV has a parameter-dependent upper endpoint

y < \mu - \frac{\sigma}{\xi},

which produces extremely sharp curvature in the posterior. A naive (mu, sigma, xi) parametrization (even with data standardization and non-centering) consistently led to 90–97% divergences for ξ ≈ –1.5.

2. brms avoids this problem with two key design choices

After studying the generated Stan code of brms, two things stood out:

No reject() calls in the log-pdf outside the support (my bad using it in my first attempts). Out-of-support evaluations simply return -inf, instead of triggering a discontinuity in the Hamiltonian.
A dedicated scaling function scale_xi()
This maps an unconstrained tmp_xi to a valid range for xi, computed from the data. This keeps the sampler away from pathological regions and dramatically improves geometry.

3. Once I implemented these two brms features, Stan sampling became completely stable

I built a minimal model that preserves the semantics of the GEV but uses:

safe versions of the GEV log-pdf and log-cdf,
the brms-style scale_xi() (with extra numerical stability),
no rejections in the lpdf.

Here are the posterior results for data generated with
μ = 10, σ = 3, ξ = –1.5, N = 2000:

mu      =  9.83   (true: 10)
sigma   =  3.15   (true: 3)
xi      = -1.45   (true: -1.5)

Rhat    = 1.00
ESS     = very high (2000–5000)
Divergences = 0
BFMI    = OK

This matches what brms produces.

4. Conclusion

The “brms parametrization” (no rejects + xi-scaling) is the crucial ingredient to stabilize NUTS for strongly negative ξ. The raw GEV parameterization is simply too geometrically difficult for HMC unless this reparameterization is used.

Happy to share the final Stan code if anyone is interested.

maxgotts · December 3, 2025, 10:26pm

I would love to see the Stan code for this, if you have it easily accessible! Thank you!

fabio · December 5, 2025, 11:25am

For now I share you the code I wrote so far that encopass \xi \le 0. I am still working for GEV with positive \xi.

real gev_lpdf(real y, real mu, real sigma, real xi) {
  real x = (y - mu) / sigma;
  if (xi == 0) {
    // Gumbel
    return -log(sigma) - x - exp(-x);
  }
  
  real t = 1 + xi * x;
  if (t <= 0) {
    // outside support -> logpdf = -inf
    return negative_infinity();
  }
  
  real inv_xi = inv(xi);
  return -log(sigma) - (1 + inv_xi) * log(t) - pow(t, -inv_xi);
}

real scale_xi(real xi_raw, vector y, real mu, real sigma) {
  int N = rows(y);
  vector[N] x = (y - mu) / sigma;

  real xmin = min(x);
  real xmax = max(x);

  real lb = -inv(xmax);
  real ub = -inv(xmin);

  return lb + inv_logit(xi_raw) * (ub - lb);
}

so, in the model GEV can be stated as:

  for (n in 1:N) {
    target += gev_lpdf(y[n] | mu, sigma, xi);
  }

So stay tune, more code to come in the next days.

maxgotts · December 6, 2025, 1:23am

This is really valuable, thank you for sharing!

fabio · December 9, 2025, 8:40am

Final summary (very short)

After extensive experimentation, the main insight is that fitting a GEV in Stan is primarily a geometry problem, not a log-pdf problem.

The posterior becomes unstable because the support depends jointly on (mu, sigma, xi).
This creates strong curvature, especially when xi is far from 0.

The two changes that make the model stable:

1. Standardize the data inside Stan

This keeps (y - mu)/sigma in a well-behaved range.
Without this, the admissible bounds for xi (as computed by scale_xi()) collapse when mu_init is far from the data, causing log(0) and chain failures.

2. Use a minimal parametrization on the standardized scale

mu_raw, sigma_raw = log(sigma_std), tmp_xi
→ then compute xi = scale_xi(...).

This eliminates unnecessary dependencies and results in stable inference for both positive and negative xi, including |xi| > 1.

Bonus

Running Pathfinder → NUTS provides robust initialization and a good metric, further reducing divergences.

In short:
Standardize → minimal params → scale_xi on standardized data → Pathfinder → NUTS.
This combination works reliably for the full GEV family.

Topic		Replies	Views
Aberrant behavior in dynamically bounded parameters, reparameterized GEV distribution Modeling rstan , stan-math	5	984	May 3, 2022
GEV Not converging Modeling	3	1033	August 21, 2019
Reparameterizing GEV Distribution Modeling fitting-issues	6	2791	December 7, 2017
Too more divergent transitions and too large R-hat in bayesian hierarchical generalized extreme value model Modeling rstan	8	266	May 5, 2025
Initial value error (log(0)) for user defined beta prior distribution in hierarchical generalized extreme value (GEV) distribution Modeling rstan , initialization	6	493	January 10, 2023

Generalized Extreme Value distribution

Final summary (very short)

1. Standardize the data inside Stan

2. Use a minimal parametrization on the standardized scale

Bonus

Related topics