Is it possible to build a multi-level extreme value analysis?

I’ve built an extreme value analysis model using the “point over threshold” method, which utilizes a generalized pareto distribution. It’s similar to @avehtari’s case study, except I am using the POT package in R, which uses maximum likelihood to estimate model parameters.

More specifically to my research, I’m studying 22 different buildings and looking at energy use in each of them under extreme weather conditions. So I’ve actually built 22 different models, estimated 22 different sets of parameters, etc. (The CDF for all 22 models is below, with the dark line representing the mean of the parameters. It is not multi-level.)

Rplot01

I’m wondering whether it’s possible to do something very similar to Aki’s model, but to do so in a multi-level framework, thereby estimating hyper-parameters and having a single “average” estimate of parameters.

Is this possible? Could I build off of Aki’s code? If someone says it’s possible, I’d like to take it on in this thread and work through it with y’all.

Many thanks.

2 Likes

Yes, it’s possible. Couple student groups have done this as part of their project in my Bayesian data analysis course at DTU. Sorry, I don’t have their code and I can’t remember the names of the students, but making the model is feasible.

1 Like

Hi, I’ve fit the generalized Pareto PDF with a multi-level structure. I’ve included the Stan code and data. Note the data has a ragged structure because there are varying observations per group. There are some divergences but everything converges nicely. Data and r script included.

functions {
  real gpareto_lpdf(vector y, real ymin, real k, real sigma, real s) {
    // generalised Pareto log pdf 
    real N = s;
    real inv_k = inv(k);
    if (k<0 && max(y-ymin)/sigma > -inv_k)
      reject("k<0 and max(y-ymin)/sigma > -1/k; found k, sigma =", k, sigma)
    if (sigma<=0)
      reject("sigma<=0; found sigma =", sigma)
    if (fabs(k) > 1e-15)
      return -(1+inv_k)*sum(log1p((y-ymin) * (k/sigma))) -N*log(sigma);
    else
      return -sum(y-ymin)/sigma -N*log(sigma); // limit k->0
      }
}
data {
	//threshold
  real<lower=0> ymin;
   // number of observations
  int<lower=1> N;
  // number of groups
  int<lower=-1> K;
  //observations
  vector<lower=ymin>[N] y;
  // group sizes 
  int<lower=0> s[K];
  // ymax
  vector<lower=ymin>[K] ymax;
}

parameters {
//overall mean
real k_mu;
//overall mean's sd
real<lower=0> k_sigma;
//sigma mu
real sigma_mu;
//overall sd
real<lower=0>sigma_sigma;
//y-specific
vector<lower=0>[K] sigma; 
vector<lower=-sigma[K]/(ymax[K]-ymin)>[K] k; 

}
model {
	int pos;
	pos = 1;
	
  //priors
  k_mu ~ normal(0, 3);
  k_sigma ~ normal(0, 3);
  sigma_mu ~ normal(0,3);
  sigma_sigma ~ normal(0, 3);
  k ~ normal(k_mu, k_sigma);
  sigma ~ normal(sigma_mu, sigma_sigma);
  
  for(i in 1:K){
  segment(y, pos, s[i]) ~ gpareto(ymin, k[i], sigma[i], s[i]);
  pos = pos + s[i];
  }
}

rstan script.R (1.3 KB)
gpareto.stan (1.4 KB)
data (ccl.csv) (740.9 KB)

3 Likes