Non-centered hierarchical skew normal implementation

Hello Stan modellers,

I am adding myself to the queue for the warnings about divergences and non-centered implementation. The model is based on the skew normal distribution, it is hierarchical and looks like this:

``````data {
int<lower=0> N;    //sample size
vector[N] ratio;    //one predictor variable
vector[N] shape;    //another predictor variable
int<lower=1, upper=4> N_shore;    //labels one categorical variable
int<lower=1, upper=4> N_eco;    //labels another categorical variable
int<lower=1, upper=N_shore> shore[N];    //one categorical variable
int<lower=1, upper=N_eco> ref[N];    //another categorical variable
int<lower=0, upper=1> y[N];    //dependent variable
}

parameters {
real<lower=0, upper=1> level;
real<lower=-10, upper=10> preference;
real<lower=0, upper=10> choosiness;
real<lower=-10, upper=10> asymmetry;

vector[N_shore] lambda1;
real<lower=0> lambda1_sd;
vector[N_eco] lambda2;
real<lower=0> lambda2_sd;
}

transformed parameters {
vector[N] scale_logit;
vector<lower=0, upper=1>[N] scale;
vector[N] y_hat;
for (i in 1:N) {
scale_logit[i] = lambda1[shore[i]] + lambda2[ref[i]] * shape[i];
scale[i] = inv_logit(scale_logit[i]);
y_hat[i] = level + scale[i] * exp(-0.5 * ((ratio[i] - preference) / choosiness)^2) * (1 + erf(asymmetry * (ratio[i] - preference) / (1.414214 * choosiness)));
}
}

model {
lambda1 ~ normal(0, lambda1_sd);
lambda1_sd ~ normal(0, 5);
lambda2 ~ normal(0, lambda2_sd);
lambda2_sd ~ normal(0, 5);

y ~ bernoulli_logit(logit(y_hat));
}

``````

The Stan code is run in R using the following commands:

``````dat = list(N = nrow(CZ_data), y = CZ_data\$mountYNcontact, ratio = CZ_data\$size_ratio,
shape = CZ_data\$shape, N_shore = length(unique(CZ_data\$shore)),
N_eco = length(unique(CZ_data\$ref_eco_sex)),
shore = CZ_data\$shore, ref = CZ_data\$ref_eco_sex)

gaus_skew = rstan::stan(file = STANFILE, data = dat, iter = 8000, warmup = 2000,
chains = 4, refresh = 8000,
control = list(adapt_delta = 0.90, max_treedepth = 15))
``````

This model is a simplified version because I am calculating only one hyperparameter in the transformed parameters block (scale). Nevertheless, the remaining parameters (level, preference, choosiness and asymmetry) follow the same logic.
The issue is divergent transitions although R_hat is less than 1.01 (funnel geometry).

Are there any obvious mistakes that can be solved with a re-parametrisation of the model?
Are there other ways to define the parameters in the transformed block?
Any comments or suggestions are warmly welcome because I am quite stuck with this. I am willing to share the data too if somebody considered it useful. Just comment below.