Aberrant behavior in dynamically bounded parameters, reparameterized GEV distribution

The support of the GEV distribution changes depending on the value of the shape parameter:

\xi > 0: \hspace{1cm} y \in \left[\mu - \frac{\sigma}{\xi}, +\infty\right) \hspace{0.25cm} \implies \hspace{0.25cm} \min(y) \geq \mu - \frac{\sigma}{\xi}

\xi = 0: \hspace{1cm} y \in (-\infty, +\infty)

\xi < 0: \hspace{1cm} y \in \left(-\infty, \mu - \frac{\sigma}{\xi}\right] \hspace{0.25cm} \implies \hspace{0.25cm} \max(y) \leq \mu - \frac{\sigma}{\xi}
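All three cases reduce to the single condition 1 + \xi(y-\mu)/\sigma > 0. A quick Python sketch of the case analysis (illustrative parameter values only, not from the case study):

```python
# Illustration only: the three support cases of the GEV collapse into
# the single condition 1 + xi*(y - mu)/sigma > 0 (xi = 0 is unbounded).

def in_support(y, mu, sigma, xi):
    if xi == 0.0:
        return True
    return 1 + xi * (y - mu) / sigma > 0

mu, sigma = 0.0, 1.0
# xi > 0: support is [mu - sigma/xi, inf); points below the lower endpoint fail
assert in_support(5.0, mu, sigma, 0.5)
assert not in_support(-3.0, mu, sigma, 0.5)   # -3 < mu - sigma/0.5 = -2
# xi < 0: support is (-inf, mu - sigma/xi]; points above the upper endpoint fail
assert in_support(-3.0, mu, sigma, -0.5)
assert not in_support(5.0, mu, sigma, -0.5)   # 5 > mu - sigma/(-0.5) = 2
```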

The classic problem when implementing the GEV in Stan is how to account for this change in support without placing conditional bounds on the location parameter, since such conditionals can introduce discontinuities when the shape parameter equals zero.

Following Aki Vehtari's work with the generalized Pareto distribution, we reparameterize the GEV using the median. This allows us to place bounds directly on the shape parameter and avoid conditional statements in the bounds. This works quite well for a narrow range of parameter values but breaks down in odd ways for others, although Stan never diverges.
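For reference, the median-based reparameterization follows from setting the GEV CDF to 1/2 at the median \eta and solving:

\begin{equation} \eta = \mu + \sigma\,\frac{(\log 2)^{-\xi} - 1}{\xi} \quad (\xi \neq 0), \qquad \eta = \mu - \sigma \log(\log 2) \quad (\xi = 0) \end{equation}

Inverting this for \mu, with \sigma written in terms of \eta and an unconstrained scale parameter, gives the transformed parameters used in the model below.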

The problem is that the errors Stan displays are not consistent with its behavior. A small case study of this is shown here: https://rpubs.com/dbarna/774644

Clearly the combination of a dynamically bounded parameter and the Lambert W function in the bounds of the shape parameter creates some numerically challenging situations, but I would be curious whether anyone has an explanation for this odd behavior in Stan.

The model is printed below; a full explanation of the scenario with simulated data can be found in the rpubs link above.

functions{
  
  // a function that returns the log density of the GEV
  real gev_lpdf(vector y, real mu, real sigma, real xi){
    
    int N = rows(y);
    real min_y = min(y);
    real max_y = max(y);
    vector[N] lpdf;
    vector[N] b;
    vector[N] a;

    a = (y - mu) / sigma;
    b = 1 + a * xi;

    // print statements to check whether we're outside the support
    if (xi > 0 && min_y < mu - sigma/xi){
      print("xi>0 and min(y)<mu-sigma/xi outside support");
    }
    if (xi < 0 && max_y > mu - sigma/xi){
      print("xi<0 and max(y)>mu-sigma/xi outside support");
    }
    if (sigma<=0)
      print("sigma<=0; found sigma =", sigma);

    if (fabs(xi) > 1e-15){ // xi != 0
      for(i in 1:N)
      lpdf[i] = -log(sigma) - 
                (1 + 1/xi)*log(b[i]) - pow(b[i],(-1/xi));
    } else{
      for(i in 1:N)
      lpdf[i] = -log(sigma) - a[i] - exp(-a[i]);
    }
    return sum(lpdf);
  }
  
  // function for the median
  real median(vector y){
    int N = rows(y);
    vector[N] sorted_vec = sort_asc(y);
    real val;
    
    if (N % 2 == 0){ // even-length vector
      val = (sorted_vec[N/2] + sorted_vec[N/2+1])/2;
    } else{ // odd-length vector
      val = sorted_vec[N/2+1];
    }
    return val;
  }
}

data {
  int<lower=0> N;
  vector[N] y;
}

transformed data {
  real q_ind = median(y);
  real y_max = max(y);
  real a = q_ind/(y_max - q_ind);
  real L = log(log(2));
}

parameters {
  real<upper=log( -1 / (a*L*e()) )> beta;
  real<lower=lambert_w0(-a*exp(beta)*L)/L,
       upper=lambert_w0(a*exp(beta)*L)/L> xi;
}

transformed parameters {
  real sigma = q_ind*exp(beta);
  real mu = fabs(xi) > 1e-15 ? 
            q_ind - sigma*(pow(log(2),-xi)-1)/xi : 
            q_ind + sigma*log(log(2));
}

model {
  
  target += normal_lpdf(beta | -2.17, 2); // prior for beta
  target += beta_lpdf(-xi + 0.5 | 6, 9);  // prior for xi
  
  target += gev_lpdf(y | mu, sigma, xi);
}
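As a sanity check outside of Stan, the following Python sketch (the function names are mine, and the parameter values are made up) verifies that the transformed parameters block puts the GEV median exactly at q_ind:

```python
import math

# Check that the median-based reparameterization is self-consistent:
# with mu computed from q_ind as in the transformed parameters block,
# the GEV CDF at q_ind should equal 1/2.

def gev_cdf(y, mu, sigma, xi):
    """GEV CDF; xi = 0 handled as the Gumbel limit."""
    z = (y - mu) / sigma
    if abs(xi) > 1e-15:
        return math.exp(-(1 + xi * z) ** (-1 / xi))
    return math.exp(-math.exp(-z))

def mu_from_median(q_ind, sigma, xi):
    """Location implied by median q_ind (mirrors the Stan transform)."""
    if abs(xi) > 1e-15:
        return q_ind - sigma * (math.log(2) ** (-xi) - 1) / xi
    return q_ind + sigma * math.log(math.log(2))

q_ind, beta = 3.0, -1.5          # arbitrary illustrative values
sigma = q_ind * math.exp(beta)
for xi in (-0.3, 0.0, 0.2):
    mu = mu_from_median(q_ind, sigma, xi)
    assert abs(gev_cdf(q_ind, mu, sigma, xi) - 0.5) < 1e-10
```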

Hi, sorry for not getting to you earlier. This is an excellent (although likely hard) question.

As a general rule, it is better for diagnostics to look at the parameters on the unconstrained scale. This can be achieved by using the diagnostic file (cleanest, but somewhat cumbersome), by manually applying the inverse transform in generated quantities, or by actually performing the transform yourself (and adding the necessary Jacobians).

What I think happens in the xi = 0.1 case is that the sharp boundary is actually a consequence of the enforced bounds, and the unconstrained space would not have it.

Another thing that looks weird is that you are only using max(y), but the bound can act in both directions, so why doesn't min(y) appear anywhere in the model?

A possible line of attack would be to check whether you can fit the model reliably if you already know the sign of xi. If yes, then one could compose the full model by treating the sign as a discrete parameter to be marginalized out. I'd be happy to provide more pointers, but it can be a bit annoying and would be relevant only if the model with a known sign is well behaved.
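For concreteness, marginalizing a binary sign indicator amounts to a log-sum-exp over the two sub-models. A minimal Python sketch of the arithmetic (the branch log densities here are placeholders, not real GEV evaluations):

```python
import math

# Marginalizing a discrete sign indicator:
# log p(y) = logsumexp(log w + log p(y | xi > 0), log w + log p(y | xi < 0))

def log_sum_exp(a, b):
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

lp_pos, lp_neg = -10.2, -11.7   # hypothetical branch log densities
log_w = math.log(0.5)           # equal prior weight on each sign
target = log_sum_exp(log_w + lp_pos, log_w + lp_neg)
# The marginal log density lies between the weighted and unweighted
# best branch: log(w) + lp_pos <= target <= lp_pos.
assert log_w + lp_pos < target < lp_pos
```

In Stan the same computation would typically be written with `log_mix` or `log_sum_exp` inside the model block.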

Best of luck!


Hi, apologies for the late reply; you caught me in the middle of summer vacation.

Thanks very much for a well-thought-out answer; this gives us a way forward. Marginalizing out the sign of xi could be particularly useful for our application. A good suggestion!

Hopefully these paths forward are fruitful; I'll update the post if any progress is made.


I just saw this on Twitter and it looks highly relevant: [2106.13110] Practical strategies for GEV-based regression models for extremes


Thank you for this! It is highly relevant.

We ended up coming back to this problem nearly a year later and working out a solution, so I thought I'd post it here in case it's useful to someone.

The support of the GEV depends on all three parameters and the data:

\begin{equation} \{ y : 1 + \xi(y-\mu)/\sigma > 0 \} \end{equation}

where -\infty < \mu < \infty, \sigma > 0, and -\infty < \xi < \infty. The trick to dealing with this is to use a quantile-based reparameterization of the distribution such that we can place additional assumptions on the parameters.

This is what we were trying last year, but in my previous post we made the incorrect assumption that the bounds on the \xi parameter would be symmetric and so ended up with mysterious boundaries and wacky Stan behavior.

However:

In my particular application (statistical modeling of flood events) we can assume that (1) all data points are positive and (2) \xi lies within the range (-0.5, 0.5). A quantile-based reparameterization then allows us to further assume:

\begin{equation} 0 \leq y_{min} < \eta < y_{max} \end{equation}

where \eta is, for example, the median.

Using these three assumptions we can work out bounds on \xi and \beta.
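As a numeric cross-check (in Python rather than Stan, with made-up data values), one can verify that at the Lambert-W bounds on \xi the support constraint becomes exactly tight; lambert_w0 here is a small Newton solver standing in for Stan's built-in:

```python
import math

def lambert_w0(x, tol=1e-12):
    # Newton iteration for the principal branch of the Lambert W function
    w = 0.0
    for _ in range(100):
        e_w = math.exp(w)
        w_next = w - (w * e_w - x) / (e_w * (1.0 + w))
        if abs(w_next - w) < tol:
            return w_next
        w = w_next
    return w

# Hypothetical data summaries (not from the case study)
y_min, y_max, qind, beta = 1.0, 5.0, 2.0, -1.0
L = math.log(math.log(2.0))
sigma = qind * math.exp(beta)

# Upper bound on xi: the support must still contain y_min
xi_hi = lambert_w0((qind / (qind - y_min)) * math.exp(beta) * L) / L
# Lower bound on xi: the support must still contain y_max
xi_lo = lambert_w0((qind / (qind - y_max)) * math.exp(beta) * L) / L

for xi, y in ((xi_hi, y_min), (xi_lo, y_max)):
    mu = qind - sigma * (math.log(2.0) ** (-xi) - 1.0) / xi
    # At the bound, y sits exactly on the edge of the support:
    assert abs(1.0 + xi * (y - mu) / sigma) < 1e-9
```

Note that the two bounds are not symmetric about zero, which is exactly the incorrect assumption made in the original post.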

The Stan model would then be (requires Stan version 2.25 or greater):

functions {
  real IF_gev_lpdf(vector y, real mu, real sigma, real xi){
      int N = rows(y);
      vector[N] lpdf;
      vector[N] b;
      vector[N] a;
      
      // ------ likelihood --------
      a = (y - mu) / sigma;
      b = 1 + a * xi;
      if(fabs(xi) > 1e-15){ // xi != 0
      for(i in 1:N)
        lpdf[i] = -log(sigma) - (1 + 1/xi)*log(b[i]) - pow(b[i],(-1/xi));
      } else{
      for(i in 1:N)
        lpdf[i] = -log(sigma) - a[i] - exp(-a[i]);
      }
      
      // check support
      if (sigma<=0)
        print("sigma<=0; found sigma =", sigma);
      if(xi*min((y-mu)/sigma) <= -1){
        print("outside support");
        print("found xi = ", xi);
        print("found mu = ", mu);
        print("found sigma = ", sigma);
        print("for lpdf = ", lpdf);
      }
      return sum(lpdf);
  }
}

data {
  int<lower=0> N;
  vector[N] y;
}

transformed data {
  real y_min = min(y);
  real y_max = max(y);
  real L = log(log(2));
}

parameters {
  real<lower = y_min, upper = y_max> qind;
  real<upper = log((y_min-qind)/(qind*exp(1)*L))> beta;
  real<lower=lambert_w0((qind/(qind-y_max))*exp(beta)*L)/L,
       upper=lambert_w0((qind/(qind-y_min))*exp(beta)*L)/L> xi; 
}

transformed parameters {
 real sigma = qind * exp(beta);
 real mu;
  if(fabs(xi) > 1e-15){
    mu = qind - sigma * ( pow(log(2),-xi) - 1 )/xi;
  } else{
    mu = qind + sigma * log(log(2));
  }
}

model {
  target += normal_lpdf(qind | 40, 100);
  target += normal_lpdf(beta | 0, 100);
  target += beta_lpdf(-xi + 0.5 | 6, 9);
  target += IF_gev_lpdf(y | mu, sigma, xi);
}
