Hello Stan modellers,
I am adding myself to the queue for the warnings about divergences and non-centered implementation. The model is based on the skew normal distribution, it is hierarchical and looks like this:
data {
int<lower=0> N; //sample size
vector[N] ratio; //one predictor variable
vector[N] shape; //another predictor variable
int<lower=1, upper=4> N_shore; //labels one categorical variable
int<lower=1, upper=4> N_eco; //labels another categorical variable
int<lower=1, upper=N_shore> shore[N]; //one categorical variable
int<lower=1, upper=N_eco> ref[N]; //another categorical variable
int<lower=0, upper=1> y[N]; //dependent variable
}
parameters {
real<lower=0, upper=1> level;
real<lower=-10, upper=10> preference;
real<lower=0, upper=10> choosiness;
real<lower=-10, upper=10> asymmetry;
vector[N_shore] lambda1;
real<lower=0> lambda1_sd;
vector[N_eco] lambda2;
real<lower=0> lambda2_sd;
}
transformed parameters {
vector[N] scale_logit;
vector<lower=0, upper=1>[N] scale;
vector[N] y_hat;
for (i in 1:N) {
scale_logit[i] = lambda1[shore[i]] + lambda2[ref[i]] * shape[i];
scale[i] = inv_logit(scale_logit[i]);
y_hat[i] = level + scale[i] * exp(-0.5 * ((ratio[i] - preference) / choosiness)^2) * (1 + erf(asymmetry * (ratio[i] - preference) / (1.414214 * choosiness)));
}
}
model {
lambda1 ~ normal(0, lambda1_sd);
lambda1_sd ~ normal(0, 5);
lambda2 ~ normal(0, lambda2_sd);
lambda2_sd ~ normal(0, 5);
y ~ bernoulli_logit(logit(y_hat));
}
The Stan code is run in R using the following commands:
dat = list(N = nrow(CZ_data), y = CZ_data$mountYNcontact, ratio = CZ_data$size_ratio,
shape = CZ_data$shape, N_shore = length(unique(CZ_data$shore)),
N_eco = length(unique(CZ_data$ref_eco_sex)),
shore = CZ_data$shore, ref = CZ_data$ref_eco_sex)
gaus_skew = rstan::stan(file = STANFILE, data = dat, iter = 8000, warmup = 2000,
chains = 4, refresh = 8000,
control = list(adapt_delta = 0.90, max_treedepth = 15))
This model is a simplified version because I am calculating only one hyperparameter in the transformed parameters block (scale). Nevertheless, the remaining parameters (level, preference, choosiness and asymmetry) follow the same logic.
The issue is divergent transitions although R_hat is less than 1.01 (funnel geometry).
Are there any obvious mistakes that can be solved with a re-parametrisation of the model?
Are there other ways to define the parameters in the transformed block?
Any comments or suggestions are warmly welcome because I am quite stuck with this. I am willing to share the data too if somebody considered it useful. Just comment below.