Reparameterizing GEV Distribution

I’m working on simulating many sequences of streamflow from a “known” (synthetic) model and fitting a GEV distribution to each. Generally the model converges quite well, but sometimes it converges to a result where mu is very large and sigma very small (using the notation from Wikipedia). When it does this, unsurprisingly, the model also takes a very long time to fit.

As a hack, I can reduce this problem by putting more informative priors on my parameters (i.e. based on the posterior distributions from the sequences that fit well), but this is neither intellectually defensible nor very robust. I’d like to reparameterize the GEV to address this properly but don’t have a good candidate reparameterization in mind. If anyone has any resources they would like to point me towards, it would be much appreciated.
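For concreteness, here is one way synthetic sequences like this could be generated and fit with scipy (a sketch of my own, not the original poster's code; all parameter values are made up). One detail that matters throughout this thread: `scipy.stats.genextreme` uses shape `c = -xi` relative to the Wikipedia parameterization.

```python
# Hypothetical illustration: simulate one synthetic "streamflow" sequence
# from a known GEV and recover the parameters with scipy.
import numpy as np
from scipy.stats import genextreme

rng = np.random.default_rng(1)
mu, sigma, xi = 500.0, 250.0, 0.1  # made-up "true" GEV parameters

# scipy's genextreme uses shape c = -xi relative to the Wikipedia convention
y = genextreme.rvs(c=-xi, loc=mu, scale=sigma, size=100, random_state=rng)

# Maximum-likelihood fit returns (c, loc, scale); xi_hat = -c_hat
c_hat, mu_hat, sigma_hat = genextreme.fit(y)
print(f"xi_hat={-c_hat:.3f}, mu_hat={mu_hat:.1f}, sigma_hat={sigma_hat:.1f}")
```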

/*
Fit a GEV distribution in stan
Code is original but heavily based on previous work at:
http://bechtel.colorado.edu/~bracken/tutorials/stan/
http://discourse.mc-stan.org/t/troubles-in-rejecting-initial-value/1827
*/
functions{
  real gev_lpdf(vector y, real mu, real sigma, real xi) {
    vector[rows(y)] z;
    vector[rows(y)] lp;
    int N;
    N = rows(y);
    for(n in 1:N){
      z[n] = 1 + (y[n] - mu) * xi / sigma;
      lp[n] = (1 + (1 / xi)) * log(z[n]) + pow(z[n], -1/xi);
    }
    return -sum(lp) - N * log(sigma);
  }
}
data {
  int<lower=0> N;
  vector[N] y;
}

transformed data {
  real min_y;
  real max_y;
  real sd_y;
  min_y = min(y);
  max_y = max(y);
  sd_y = sd(y);
}
parameters {
  real<lower=-0.5,upper=0.5> xi;
  real<lower=0> sigma;
  // location has upper/lower bounds depending on the value of xi
  real<lower=(xi > 0 ? min_y : negative_infinity()),
       upper=(xi > 0 ? positive_infinity() : max_y)> mu;
}
model {
  // Priors -- these can be modified depending on your context
  real xi_plus_half;
  xi_plus_half = xi + 0.5;
  xi_plus_half ~ beta(3, 12); // somewhat informative: expectation -0.3 per Martins2000
  sigma ~ normal(0, 100); // half-normal via the <lower=0> constraint
  mu ~ normal(0, 250);
  // Data Model
  y ~ gev(mu, sigma, xi);
}
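As a sanity check (my addition, not part of the original post), the `gev_lpdf` above can be ported to NumPy and compared against `scipy.stats.genextreme.logpdf`, again remembering that scipy's shape parameter is `c = -xi`:

```python
import numpy as np
from scipy.stats import genextreme

def gev_lpdf(y, mu, sigma, xi):
    # Direct port of the Stan gev_lpdf above (valid where z > 0, xi != 0)
    z = 1 + (y - mu) * xi / sigma
    lp = (1 + 1 / xi) * np.log(z) + z ** (-1 / xi)
    return -np.sum(lp) - len(y) * np.log(sigma)

y = np.array([300.0, 450.0, 800.0, 1200.0])  # made-up values
mu, sigma, xi = 500.0, 250.0, 0.1

ours = gev_lpdf(y, mu, sigma, xi)
ref = genextreme.logpdf(y, c=-xi, loc=mu, scale=sigma).sum()
print(ours, ref)  # the two should agree to floating-point precision
```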

To better visualize the problem, consider these two images. To generate them, I created 100 synthetic data sets and fit each in Stan using the above model with 1000 posterior draws. The first shows the posterior distribution of mu and sigma (for all 100 × 1000 draws). For the second, I took the posterior mean of each parameter for each sequence (independently; this is just for illustration) and plotted the resulting generalized extreme value CDF (scipy.stats.genextreme) for each of the 100 sequences. Apologies that it’s hard to see, but there are two CDFs which are almost horizontal at around y=3000 and y=200, corresponding to the outlying dots in the first plot.

[Figures: posterior draws of mu and sigma (posterior_mu_sigma); posterior CDFs for the 100 sequences (posterior_cdf)]

An example of the data I would fit is below (in Python). The data is actually floats, but not all of the digits are printing to screen.

array([ 1447.,  1939.,  1100.,   720.,  1316.,   349.,   528.,   781.,   359.,
         508.,  1338.,   379.,   524.,   353.,   485.,   279.,   295.,   541.,
         345.,   484.,   490.,   230.,   265.,   242.,   300.,  1381.,  1493.,
         768.,   406.,   364.,   484.,   473.,   330.,   328.,   355.,   388.,
         814.,   617.,   453.,   465.,   258.,   896.,   873.,  1028.,   841.,
         916.,   358.,   652.,   753.,  1095.,   301.,   302.,   273.,   251.,
         561.,   999.,  1410.,   751.,   802.,   662.,  1324.,   923.,   669.,
        1046.,  1311.,   391.,   396.,   915.,   457.,   264.,   626.,   407.,
         411.,   724.,   441.,   503.,   428.,   359.,   539.,   405.,   727.,
         522.,   265.,   390.,   392.,   546.,   319.,   301.,   364.,   299.,
         555.,   714.,  1306.,   397.,   458.,   464.,   406.,   445.,   405.,
         318.])

N.B.: the above results were actually obtained using the informative (but not justifiable) priors discussed in the original post, rather than the priors shown in the stan code.

It’s unlikely that reparameterizing would help. Usually reparameterization is used when HMC/NUTS is unable to go somewhere. Here your problem is that HMC/NUTS is sometimes going where you think it shouldn’t. You need informative priors.

Instead of a hack, you should use informative priors based on what you know. That would be intellectually defensible.

If you know that sigma shouldn’t be small, then use a prior which has smaller density for small values (not the half-normal).

I haven’t read this in detail, but it might give you hints for how to choose a prior for the GEV: [1712.00685] Bayesian prior elicitation and selection for extreme values


Thanks, this is both useful and helps me avoid spending too much time with pencil & paper!

Managed to get something that works, so I’ll post the answer in case anyone’s googling this in the future (though it obviously isn’t the “perfect” solution). Basically the key was to get rid of the bounds on mu and, probably more importantly, to put a prior on the ratio mu / sigma. For the models that were “blowing up” (see the dots in the picture above) the ratio mu/sigma was >2000, while for the other dots it was around 2.

functions{
  real gev_lpdf(vector y, real mu, real sigma, real xi) {
    vector[rows(y)] t;
    vector[rows(y)] lp;
    int N;
    N = rows(y);
    for(n in 1:N){
      // xi == 0 is the Gumbel limit; for a continuous parameter this branch
      // is effectively never taken, but it keeps the density well defined
      t[n] = xi == 0 ? exp((mu - y[n]) / sigma)
                     : pow(1 + xi * ((y[n] - mu) / sigma), -1/xi);
      lp[n] = -log(sigma) + (xi + 1) * log(t[n]) - t[n];
    }
    return sum(lp);
  }
}
data {
  int<lower=0> N;
  vector[N] y;
}
parameters {
  real xi;
  real<lower=0> sigma;
  real mu;
}
model {
// Priors -- these can be modified depending on your context
  real mu_sigma;
  mu_sigma = mu / sigma;
  mu_sigma ~ normal(2, 0.5); // prior on mu/sigma
  xi ~ normal(0, 0.125); // sorta-weakly informative
  // Data Model
  y ~ gev(mu, sigma, xi);
}
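A quick numeric check (my addition, not part of the thread) of why this prior suppresses the degenerate fits: under normal(2, 0.5), a ratio near 2 sits at the mode of the prior, while the ratios above 2000 seen in the blown-up fits receive vanishingly small prior density.

```python
from scipy.stats import norm

# Log prior density of mu/sigma under normal(2, 0.5)
print(norm.logpdf(2.0, loc=2.0, scale=0.5))     # about -0.23 (at the mode)
print(norm.logpdf(2000.0, loc=2.0, scale=0.5))  # about -8e6: effectively ruled out
```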

This either needs a Jacobian adjustment, or, better, declare mu_sigma in the parameters block and put real mu = mu_sigma * sigma; in the model block.


Thanks. What’s the rationale behind putting mu_sigma in the parameters block and mu in the model block rather than vice versa? Is it because mu_sigma and sigma are (we hope) nearly uncorrelated, which helps the sampler’s proposals?

It was mostly to avoid having to do a Jacobian correction to get the posterior distribution Stan draws from to be the posterior distribution you intended.
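For reference, the standard change-of-variables result behind that advice (my derivation, not from the thread): if mu stays a parameter and the sampling statement targets theta = mu / sigma, the implied log density on mu picks up the log absolute Jacobian:

```latex
\theta = \frac{\mu}{\sigma}, \qquad
\left|\frac{\partial\theta}{\partial\mu}\right| = \frac{1}{\sigma}, \qquad
\log p(\mu \mid \sigma) = \log p_{\theta}\!\left(\frac{\mu}{\sigma}\right) - \log\sigma
```

So keeping mu in the parameters block would require adding target += -log(sigma); next to the sampling statement, whereas declaring mu_sigma as the parameter and defining real mu = mu_sigma * sigma; in the model block needs no adjustment at all.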
