Aberrant behavior in dynamically bounded parameters, reparameterized GEV distribution

The support of the GEV distribution changes dependent on the value of the shape parameter:

\xi > 0: \hspace{1cm} y \in \left[\mu - \frac{\sigma}{\xi}, +\infty\right) \hspace{0.25cm} \implies \hspace{0.25cm} \text{min(y)} \geq \mu - \frac{\sigma}{\xi}

\xi = 0: \hspace{1cm} y \in (-\infty, +\infty)

\xi < 0: \hspace{1cm} y \in \left(-\infty, \mu - \frac{\sigma}{\xi}\right] \hspace{0.25cm} \implies \hspace{0.25cm} \text{max(y)} \leq \mu - \frac{\sigma}{\xi}

The classic problem when implementing the GEV in Stan is how to account for this change in support without placing conditional bounds on the location parameter, which can introduce discontinuities when the shape parameter = 0.

Following Aki Vehatri’s work with the generalized pareto distribution, we reparameterize the GEV using the median. This allows us to place bounds directly on the shape parameter and avoid conditional statements in the bounds. This works quite well for a narrow range of parameter values but breaks down in odd ways for others, although Stan never diverges.

The problem is the errors Stan displays are not consistent with its behavior. A small case study of this is shown here: https://rpubs.com/dbarna/774644

Clearly the combination of a dynamically bounded parameter and the Lambert W function in the bounds of the shape parameter create some numerically challenging situations but I would be curious if anyone has an explanation for this odd behavior in Stan.

Model printed below and a full explanation of the scenario with simulated data can be found in the rpubs link above.

functions{
  
  ## a function that returns the log density of the GEV
  real gev_lpdf(vector y, real mu, real sigma, real xi){
    
    int N = rows(y);
    real min_y = min(y);
    real max_y = max(y);
    vector[N] lpdf;
    vector[N] b;
    vector[N] a;

    a = (y - mu) / sigma;
    b = 1 + a * xi;

    ## print functions to check if we're going outside the support
    if (xi > 0 && min_y < mu - sigma/xi){
      print("xi>0 and min(y)<mu-sigma/xi outside support");
    }
    if (xi < 0 && max_y > mu - sigma/xi){
      print("xi<0 and max(y)>mu-sigma/xi outside support");
    }
    if (sigma<=0)
      print("sigma<=0; found sigma =", sigma);

    if (fabs(xi) > 1e-15){ ##if xi =/= 0
      for(i in 1:N)
      lpdf[i] = -log(sigma) - 
                (1 + 1/xi)*log(b[i]) - pow(b[i],(-1/xi));
    } else{
      for(i in 1:N)
      lpdf[i] = -log(sigma) - a[i] - exp(-a[i]);
    }
    return sum(lpdf);
  }
  
  ## function for the median
  real median(vector y){
    int N = rows(y);
    vector[N] sorted_vec = sort_asc(y);
    real val;
    
    if (N%2 == 0){ ##is even numbered set
      val = (sorted_vec[N/2] + sorted_vec[N/2+1])/2;
    } else{ ##is odd numbered set
      val = sorted_vec[N/2+1];
    }
    return val;
  }
}

data {
  int<lower=0> N;
  vector[N] y;
}

transformed data {
  real q_ind = median(y);
  real y_max = max(y);
  real a = q_ind/(y_max - q_ind);
  real L = log(log(2));
}

parameters {
  real<upper=log( -1 / (a*L*e()) )> beta;
  real<lower=lambert_w0(-a*exp(beta)*L)/L,
       upper=lambert_w0(a*exp(beta)*L)/L> xi;
}

transformed parameters {
  real sigma = q_ind*exp(beta);
  real mu = fabs(xi) > 1e-15 ? 
            q_ind - sigma*(pow(log(2),-xi)-1)/xi : 
            q_ind + sigma*log(log(2));
}

model {
  
  target += normal_lpdf(beta | -2.17, 2); #prior for beta
  target += beta_lpdf(-xi + 0.5 | 6, 9); #prior for xi
  
  target += gev_lpdf(y | mu, sigma, xi);
}
1 Like

Hi, sorry for not getting to you earlier. This is an excellent - although likely hard - question.

As a general rule, it is better for diagnostics to look at the parameters on the unconstrained scale. This can be achieved either by using the diagnostic file (cleanest, but somewhat cumbersome), or by manually applying the inverse transfrom in generated quantities or by actually performing the transform yourself (and adding the necessary Jacobians).

What I think happens in the xi = 0.1 case is that the sharp boundary is actually a consequence of the enforced bounds and the unconstrained space would not have it.

Other thing that looks weird is that you are only using max(y), but the bound can act on both directions, so why does not min(y) appear anywhere in the model?

A possible line of attack would be to check if you can fit the model reliably if you already know the sign of xi. If yes, then one could compose the full model by treating the sign as a discrete parameter to be marginalized out. I’d be happy to provide more pointers, but it can be a bit annoying and would be relevant only if the model with known sign is well-behaved.

Best of luck!

1 Like

Hi, apologies for the late reply–you caught me in the middle of summer vacation.

Thanks very much for a well thought-out answer, this gives us a way forward. Marginalizing out the sign of xi could be particularly useful for our application. A good suggestion!

Hopefully these paths forward are fruitful, I’ll update the post if any progress is made.

1 Like