Too more divergent transitions and too large R-hat in bayesian hierarchical generalized extreme value model

Thank you very much for your reply! I have defined alpha_xi_0 with constraints, but the Rhat value is still not ideal. Here are my revised Stan code and R code, could you please help me take a look again?
stan code

functions{
  real gev_lpdf(real y,real mu,real sigma,real xi){
    real z;
    z=1+xi*(y-mu)/sigma;
    if (xi!=0){
      if (z<=0){
        return negative_infinity();//超出支持域,返回负无穷
      }else{
        return -log(sigma)-(1+1/xi)*log(z)-pow(z,-1/xi);
      }
    }else{
      //当xi=0时,退化为Gumbel分布
      return -log(sigma)-(y-mu)/sigma-exp(-(y-mu)/sigma);
    }
  }
}


data {
  int<lower=0> N; //总样本量=305
  int<lower=0> J; //地点数量=5
  array[N] int<lower=1,upper=J> location;
  int<lower=0> K_mu;
  //int<lower=0> K_sigma;
  matrix[N,K_mu] X_mu;
  //matrix[N,K_sigma] X_sigma;
  array[N] real y;
}


parameters {
  //位置参数
  real alpha_mu_0;
  array[J] real epsilon_mu;
  vector[K_mu] alpha_mu;
  
  //尺度参数
  real alpha_sigma_0;
  array[J] real epsilon_sigma;
  //vector[K_sigma] alpha_sigma;
  
  //形状参数
  real<lower=-1,upper=1> alpha_xi_0;
  array[J] real epsilon_xi;
  
  //先验层超参数
  real<lower=0> sigma2_epsilon_mu; //epsilon_mu的精度
  real<lower=0> sigma2_epsilon_sigma; //epsilon_sigma的精度
  real<lower=0> sigma2_epsilon_xi; //epsilon_xi的精度
}

transformed parameters {
  real<lower=0> sigma_epsilon_mu; //计算标准差
  sigma_epsilon_mu=sqrt(1/sigma2_epsilon_mu);
  real<lower=0> sigma_epsilon_sigma;
  sigma_epsilon_sigma=sqrt(1/sigma2_epsilon_sigma);
  real<lower=0> sigma_epsilon_xi;
  sigma_epsilon_xi=sqrt(1/sigma2_epsilon_xi);
}

model {
  //地点层级的随机参数先验
  epsilon_mu ~ normal(0,sigma_epsilon_mu);
  epsilon_sigma ~ normal(0,sigma_epsilon_sigma);
  epsilon_xi ~ normal(0,sigma_epsilon_xi);
  sigma2_epsilon_mu ~ gamma(0.01,0.01);
  sigma2_epsilon_sigma ~ gamma(0.01,0.01);
  sigma2_epsilon_xi ~ gamma(0.01,0.01);
  
  //常数项的弱信息先验
  alpha_mu_0 ~ normal(0,1000);
  alpha_sigma_0 ~ normal(0,1000);
  
  //协变量系数的弱信息先验
  alpha_mu ~ normal(0,1000);
  //alpha_sigma ~ normal(0,1000);
  
  //似然函数(GEV分布)
  for (i in 1:N) {
    real mu = alpha_mu_0 + X_mu[i]*alpha_mu  +epsilon_mu[location[i]];
    real log_sigma = alpha_sigma_0 + epsilon_sigma[location[i]];
    real sigma = exp(log_sigma);
    real xi = alpha_xi_0 + epsilon_xi[location[i]];
    target += gev_lpdf(y[i] | mu,sigma,xi);
  }
}

R code

#数据预处理
data_list <- list(
  N=305,
  J=5,
  location= data$location,
  K_mu=3,
  X_mu=as.matrix(data[,c('num_rear_end_conflict','pov','average_v')]),
  y=data$y
)

#设置初始值
init_list<-list(
  list(alpha_mu_0=0.05,
       alpha_mu=c(0,0,0),
       alpha_sigma_0=0.07,
       alpha_xi_0=0.5,
       sigma2_epsilon_mu=1,
       sigma2_epsilon_sigma=1,
       sigma2_epsilon_xi=1,
       epsilon_mu=c(0,0,0,0,0),
       epsilon_sigma=c(0,0,0,0,0),
       epsilon_xi=c(2,2,2,2,2)),
  list(alpha_mu_0=0.04,
       alpha_mu=c(0,0,0),
       alpha_sigma_0=0.06,
       alpha_xi_0=0.8,
       sigma2_epsilon_mu=4,
       sigma2_epsilon_sigma=4,
       sigma2_epsilon_xi=4,
       epsilon_mu=c(0.01,0.01,0.01,0.01,0.01),
       epsilon_sigma=c(0.01,0.01,0.01,0.01,0.01),
       epsilon_xi=c(1.6,1.6,1.6,1.6,1.6))
)



#编译并运行模型
mod<-cmdstan_model("C:/Users/Administrator/Desktop/Rstan代码/BHGEV(1,0,0).stan")
fit <- mod$sample(
  data=data_list,
  chains=2,
  parallel_chains=2,
  #opencl_ids=c(0,0),
  init = init_list,
  iter_sampling=10000,
  iter_warmup=10000,
  seed=123,
  thin=1,
  #threads_per_chain=2,
  sig_figs = 6,
  refresh=100,
  adapt_delta=0.99,
  max_treedepth = 15) #确定抽样方法,提高HMC稳定性

And below are my parameter estimation results:


However, the divergence situation seems to have improved after the correction.But it warns that 19907 of 20000 transitions hit the maximum treedepth limit of 15, should I continue to increase the maximum treedepth?
image
Thank you very much!