Too more divergent transitions and too large R-hat in bayesian hierarchical generalized extreme value model

jing · March 17, 2025, 4:39am

Here is my Stan code.

functions{
  real gev_lpdf(real y,real mu,real sigma,real xi){
    real z;
    z=1+xi*(y-mu)/sigma;
    if (xi!=0){
      if (z<=0){
        return negative_infinity();//超出支持域，返回负无穷
      }else{
        return -log(sigma)-(1+1/xi)*log(z)-pow(z,-1/xi);
      }
    }else{
      //当xi=0时，退化为Gumbel分布
      return -log(sigma)-(y-mu)/sigma-exp(-(y-mu)/sigma);
    }
  }
}


data {
  int<lower=0> N; //总样本量=305
  int<lower=0> J; //地点数量=5
  int<lower=1,upper=J> location[N];
  int<lower=0> K_mu;
  //int<lower=0> K_sigma;
  matrix[N,K_mu] X_mu;
  //matrix[N,K_sigma] X_sigma;
  real y[N];
}


parameters {
  //位置参数
  real alpha_mu_0;
  real epsilon_mu[J];
  vector[K_mu] alpha_mu;
  
  //尺度参数
  real alpha_sigma_0;
  real epsilon_sigma[J];
  //vector[K_sigma] alpha_sigma;
  
  //形状参数
  real alpha_xi_0;
  real epsilon_xi[J];
  
  //先验层超参数
  real<lower=0> sigma_epsilon_mu; //epsilon_mu的标准差
  real<lower=0> sigma_epsilon_sigma; //epsilon_sigma的标准差
  real<lower=0> sigma_epsilon_xi; //epsilon_xi的标准差
}

transformed parameters {
  real<lower=0> sigma2_epsilon_mu; //计算方差的倒数，即精度
  sigma2_epsilon_mu=1/square(sigma_epsilon_mu);
  real<lower=0> sigma2_epsilon_sigma;
  sigma2_epsilon_sigma=1/square(sigma_epsilon_sigma);
  real<lower=0> sigma2_epsilon_xi;
  sigma2_epsilon_xi=1/square(sigma_epsilon_xi);
}

model {
  //地点层级的随机参数先验
  epsilon_mu ~ normal(0,sigma_epsilon_mu);
  epsilon_sigma ~ normal(0,sigma_epsilon_sigma);
  epsilon_xi ~ normal(0,sigma_epsilon_xi);
  sigma2_epsilon_mu ~ gamma(0.01,0.01);
  sigma2_epsilon_sigma ~ gamma(0.01,0.01);
  sigma2_epsilon_xi ~ gamma(0.01,0.01);
  
  //常数项的弱信息先验
  alpha_mu_0 ~ normal(0,1000);
  alpha_sigma_0 ~ normal(0,1000);
  alpha_xi_0 ~ uniform(-1.0,1.0);
  
  //协变量系数的弱信息先验
  alpha_mu ~ normal(0,1000);
  //alpha_sigma ~ normal(0,1000);
  
  //似然函数（GEV分布）
  for (i in 1:N) {
    real mu = alpha_mu_0 + X_mu[i]*alpha_mu  +epsilon_mu[location[i]];
    real log_sigma = alpha_sigma_0 + epsilon_sigma[location[i]];
    real sigma = exp(log_sigma);
    real xi = alpha_xi_0 + epsilon_xi[location[i]];
    target += gev_lpdf(y[i] | mu,sigma,xi);
  }
}

Rstan provided a warning message:

My model clearly hasn’t converged,what could be the problem? If anyone could provide me with some help, I would be very grateful!

avehtari · March 17, 2025, 5:52pm

You have defined alpha_xi_0 without constraints

and then assign a prior with compact support

This is really bad for HMC. It would be better to define
real<lower=-1,upper=1> alpha_xi_0;
and drop the uniform prior

jing · March 19, 2025, 2:38am

Thank you very much for your reply! I have defined alpha_xi_0 with constraints, but the Rhat value is still not ideal. Here are my revised Stan code and R code, could you please help me take a look again?
stan code

functions{
  real gev_lpdf(real y,real mu,real sigma,real xi){
    real z;
    z=1+xi*(y-mu)/sigma;
    if (xi!=0){
      if (z<=0){
        return negative_infinity();//超出支持域，返回负无穷
      }else{
        return -log(sigma)-(1+1/xi)*log(z)-pow(z,-1/xi);
      }
    }else{
      //当xi=0时，退化为Gumbel分布
      return -log(sigma)-(y-mu)/sigma-exp(-(y-mu)/sigma);
    }
  }
}


data {
  int<lower=0> N; //总样本量=305
  int<lower=0> J; //地点数量=5
  array[N] int<lower=1,upper=J> location;
  int<lower=0> K_mu;
  //int<lower=0> K_sigma;
  matrix[N,K_mu] X_mu;
  //matrix[N,K_sigma] X_sigma;
  array[N] real y;
}


parameters {
  //位置参数
  real alpha_mu_0;
  array[J] real epsilon_mu;
  vector[K_mu] alpha_mu;
  
  //尺度参数
  real alpha_sigma_0;
  array[J] real epsilon_sigma;
  //vector[K_sigma] alpha_sigma;
  
  //形状参数
  real<lower=-1,upper=1> alpha_xi_0;
  array[J] real epsilon_xi;
  
  //先验层超参数
  real<lower=0> sigma2_epsilon_mu; //epsilon_mu的精度
  real<lower=0> sigma2_epsilon_sigma; //epsilon_sigma的精度
  real<lower=0> sigma2_epsilon_xi; //epsilon_xi的精度
}

transformed parameters {
  real<lower=0> sigma_epsilon_mu; //计算标准差
  sigma_epsilon_mu=sqrt(1/sigma2_epsilon_mu);
  real<lower=0> sigma_epsilon_sigma;
  sigma_epsilon_sigma=sqrt(1/sigma2_epsilon_sigma);
  real<lower=0> sigma_epsilon_xi;
  sigma_epsilon_xi=sqrt(1/sigma2_epsilon_xi);
}

model {
  //地点层级的随机参数先验
  epsilon_mu ~ normal(0,sigma_epsilon_mu);
  epsilon_sigma ~ normal(0,sigma_epsilon_sigma);
  epsilon_xi ~ normal(0,sigma_epsilon_xi);
  sigma2_epsilon_mu ~ gamma(0.01,0.01);
  sigma2_epsilon_sigma ~ gamma(0.01,0.01);
  sigma2_epsilon_xi ~ gamma(0.01,0.01);
  
  //常数项的弱信息先验
  alpha_mu_0 ~ normal(0,1000);
  alpha_sigma_0 ~ normal(0,1000);
  
  //协变量系数的弱信息先验
  alpha_mu ~ normal(0,1000);
  //alpha_sigma ~ normal(0,1000);
  
  //似然函数（GEV分布）
  for (i in 1:N) {
    real mu = alpha_mu_0 + X_mu[i]*alpha_mu  +epsilon_mu[location[i]];
    real log_sigma = alpha_sigma_0 + epsilon_sigma[location[i]];
    real sigma = exp(log_sigma);
    real xi = alpha_xi_0 + epsilon_xi[location[i]];
    target += gev_lpdf(y[i] | mu,sigma,xi);
  }
}

R code

#数据预处理
data_list <- list(
  N=305,
  J=5,
  location= data$location,
  K_mu=3,
  X_mu=as.matrix(data[,c('num_rear_end_conflict','pov','average_v')]),
  y=data$y
)

#设置初始值
init_list<-list(
  list(alpha_mu_0=0.05,
       alpha_mu=c(0,0,0),
       alpha_sigma_0=0.07,
       alpha_xi_0=0.5,
       sigma2_epsilon_mu=1,
       sigma2_epsilon_sigma=1,
       sigma2_epsilon_xi=1,
       epsilon_mu=c(0,0,0,0,0),
       epsilon_sigma=c(0,0,0,0,0),
       epsilon_xi=c(2,2,2,2,2)),
  list(alpha_mu_0=0.04,
       alpha_mu=c(0,0,0),
       alpha_sigma_0=0.06,
       alpha_xi_0=0.8,
       sigma2_epsilon_mu=4,
       sigma2_epsilon_sigma=4,
       sigma2_epsilon_xi=4,
       epsilon_mu=c(0.01,0.01,0.01,0.01,0.01),
       epsilon_sigma=c(0.01,0.01,0.01,0.01,0.01),
       epsilon_xi=c(1.6,1.6,1.6,1.6,1.6))
)



#编译并运行模型
mod<-cmdstan_model("C:/Users/Administrator/Desktop/Rstan代码/BHGEV(1,0,0).stan")
fit <- mod$sample(
  data=data_list,
  chains=2,
  parallel_chains=2,
  #opencl_ids=c(0,0),
  init = init_list,
  iter_sampling=10000,
  iter_warmup=10000,
  seed=123,
  thin=1,
  #threads_per_chain=2,
  sig_figs = 6,
  refresh=100,
  adapt_delta=0.99,
  max_treedepth = 15) #确定抽样方法，提高HMC稳定性

And below are my parameter estimation results:

However, the divergence situation seems to have improved after the correction.But it warns that 19907 of 20000 transitions hit the maximum treedepth limit of 15, should I continue to increase the maximum treedepth?

Thank you very much!

avehtari · March 20, 2025, 12:29pm

As the next step it would be good to check that your gev_lpdf implementation is correct and numerically stable.

This is theoretically correct, but due to the limitations of floating point presentation you need to do something else. Here’s a plot with y=1, mu=0, sigma=1, and varying xi, using your current code

The computation fails with small but non-zero xi, so this should be something like

    if (abs(xi)>1e-12){

After changing this, you can test that the implementation otherwise is correct by following the case study Extreme value analysis and user defined probability functions in Stan

It is possible that your implementation is correct, but the posterior is multimodal which is likely in case of very thick tailed distribution and Rhats bigger than 2 indicates that you. You could look at the pairs plot (see Scatterplots of MCMC draws — MCMC-scatterplots • bayesplot).

jing · March 24, 2025, 6:52am

Hello! I have carefully read the article Extreme value analysis and user defined probability functions in Stan.Because I want to study the influence of some covariates on the parameters(like mu, sigma) of the GEV distribution, for the sake of convenience, I have now removed all covariates and only retained the constant and residual terms. Among them, the residual terms are different at different locations and the same at the same location.
Based on your article, I revised my gev_lpdf.

functions{
  real gev_lpdf(real y,real mu,real sigma,real xi){
    if (xi<0 && (y-mu)/sigma>= -1/xi){
      reject("xi<0 and (y-mu)/sigma>= -1/xi;found xi,sigma,mu=",xi,",",sigma,",",mu);
      //return negative_infinity();
    }
    if (xi>0 && (y-mu)/sigma<= -1/xi){
      reject("xi>0 and (y-mu)/sigma<= -1/xi;found xi,sigma,mu=",xi,",",sigma,",",mu);
      //return negative_infinity();
    }
    if (sigma<=0){
      reject("sigma<=0;found sigma=",sigma);
    }
    if (abs(xi)>1e-12){
        return -log(sigma)-(1+1/xi)*log(1+xi*(y-mu)/sigma)-pow(1+xi*(y-mu)/sigma,-1/xi);
    }else{
      //当xi=0时，退化为Gumbel分布
      return -log(sigma)-(y-mu)/sigma-exp(-(y-mu)/sigma);
    }
  }
}

After running, I found that a large number of samples were rejected. I suddenly realized that it was likely due to my lack of constraints on xi(=alpha_xi_0+epsilon_xi) that many samples did not meet the conditions and were rejected, resulting in a low ESS and high R-hat.
So I calculated the lower limit of xi and constrained it. Because the lower limit value of xi is equal to sigma/(mu-y) assuming xi is less than 0, and mu and sigma vary with covariates, the lower limit value also varies.Therefore, I calculate the lower limit value in the transformed parameters module, and xi can only be defined in this module.

  **transformed parameters**
  real<lower=lower1,upper=0> xi_1=alpha_xi_1+epsilon_xi[1];

**model**
alpha_xi_1~uniform(lower1+1,0);
epsilon_xi ~ normal(0,sigma_epsilon_xi);

I know that uniform doesn’t work well in Stan, but I can only do this to constrain the constant term within this range. In order to reduce the accuracy error caused by calculation, I added 1 to lower.
After doing these things, I was puzzled why the value of xi still exceeded the range and was still rejected in large numbers.

And you can see that they were all rejected with only a little difference from the lower limit value. I do not understand why this is happening.
I am looking forward to your reply. Thank you very much!

avehtari · March 26, 2025, 5:26pm

Based on your gev_lpdf, xi can be also larger than 0, so why do you have constrain upper=0?. It seems your xi is the same as k in my case study. In my case study I have

  real<lower=-sigma / (ymax - ymin)> k;

and there is no need for upper constraint.
Additionally, the _lpdf has

    real inv_k = inv(k);
    if (k < 0 && max(y - ymin) / sigma > -inv_k) {
      reject("k<0 and max(y-ymin)/sigma > -1/k; found k, sigma =", k, sigma);
    }

that is we need to check the constrain only if k<0 (and we should not reach this branch ever if we have that lower constrain in parameters block)’

You need to define the constrain for the parameter in the parameters block in order to Stan use a transformation behind the scene that maps the constrained parameter to an unconstrained parameter. If you add the constrain in the transformed parameter, you just add sharp edge in the posterior in the same way as the uniform prior.

I don’t understand why you have alpha_xi_1+epsilon_xi

danibarna · May 2, 2025, 9:48am

Hi, I think this discussion and Stan code for the GEV may be useful to you.

We discuss the implementation in a little more detail in this Rpubs and this brms issue report.

Broadly speaking, the trick, as @avehtari made elegant use of in the case study with the generalized pareto, is to reparameterize the distribution such that support constraints can be forced onto the shape parameter. This allows the bounds to be expressed as a continuously differentiable function of the other parameters (and the data). The quantile-based reparameterization used by Castro-Camilo et al., 2022 is particularly useful for the GEV.

Hope this is helpful

jing · May 5, 2025, 7:03am

Thank you very much for your reply! I have already known how to solve it. I forget to reply because I haven’t logged in for a long time. Many thanks for your help!

jing · May 5, 2025, 7:04am

Thank you for your answer!

Topic		Replies	Views
Divergent Transitions with R_hat < 1.1 Modeling fitting-issues , performance , shinystan	14	2372	June 13, 2019
Divergent transitions with hierarchical model Modeling	5	733	July 15, 2019
Help with all iterations ended with a divergence and high R-hat Modeling rstan	5	766	June 23, 2020
Persistent divergent transitions in simple model Modeling fitting-issues	5	541	April 17, 2020
Divergence and treedepth issues in multilevel threshold autoregressive model estimation Modeling	2	382	April 10, 2023

Too more divergent transitions and too large R-hat in bayesian hierarchical generalized extreme value model

Related topics