Non-centered hierarchical skew normal implementation

Hello Stan modellers,

I am adding myself to the queue for the warnings about divergences and non-centered implementation. The model is based on the skew normal distribution, it is hierarchical and looks like this:


data {
  int<lower=0> N;    //sample size
  vector[N] ratio;    //one predictor variable
  vector[N] shape;    //another predictor variable
  int<lower=1, upper=4> N_shore;    //labels one categorical variable
  int<lower=1, upper=4> N_eco;    //labels another categorical variable
  int<lower=1, upper=N_shore> shore[N];    //one categorical variable
  int<lower=1, upper=N_eco> ref[N];    //another categorical variable
  int<lower=0, upper=1> y[N];    //dependent variable
}

parameters {
  real<lower=0, upper=1> level;
  real<lower=-10, upper=10> preference;
  real<lower=0, upper=10> choosiness;
  real<lower=-10, upper=10> asymmetry;
  
  vector[N_shore] lambda1;
  real<lower=0> lambda1_sd;
  vector[N_eco] lambda2;
  real<lower=0> lambda2_sd;
}

transformed parameters {
  vector[N] scale_logit;
  vector<lower=0, upper=1>[N] scale;
  vector[N] y_hat;
  for (i in 1:N) {
    scale_logit[i] = lambda1[shore[i]] + lambda2[ref[i]] * shape[i];
    scale[i] = inv_logit(scale_logit[i]);
    y_hat[i] = level + scale[i] * exp(-0.5 * ((ratio[i] - preference) / choosiness)^2) * (1 + erf(asymmetry * (ratio[i] - preference) / (1.414214 * choosiness)));
  }
}

model { 
  lambda1 ~ normal(0, lambda1_sd);
  lambda1_sd ~ normal(0, 5);
  lambda2 ~ normal(0, lambda2_sd);
  lambda2_sd ~ normal(0, 5);
  
  y ~ bernoulli_logit(logit(y_hat));
}

The Stan code is run in R using the following commands:

dat = list(N = nrow(CZ_data), y = CZ_data$mountYNcontact, ratio = CZ_data$size_ratio, 
               shape = CZ_data$shape, N_shore = length(unique(CZ_data$shore)),
               N_eco = length(unique(CZ_data$ref_eco_sex)),
               shore = CZ_data$shore, ref = CZ_data$ref_eco_sex)

gaus_skew = rstan::stan(file = STANFILE, data = dat, iter = 8000, warmup = 2000,
                                        chains = 4, refresh = 8000,
                                        control = list(adapt_delta = 0.90, max_treedepth = 15))

This model is a simplified version because I am calculating only one hyperparameter in the transformed parameters block (scale). Nevertheless, the remaining parameters (level, preference, choosiness and asymmetry) follow the same logic.
The issue is divergent transitions although R_hat is less than 1.01 (funnel geometry).

Are there any obvious mistakes that can be solved with a re-parametrisation of the model?
Are there other ways to define the parameters in the transformed block?
Any comments or suggestions are warmly welcome because I am quite stuck with this. I am willing to share the data too if somebody considered it useful. Just comment below.